An Exercise in Analytics: Using Microsoft SandDance to Visualize Trends at CraftConf

I’ve been lucky to attend every CraftConf Budapest since its inception in 2014. It has always been a mind-expanding experience, with a healthy mix of established and newcomer speakers from the tech industry worldwide. Although Craft, as the name suggests, is focused on Software Craftsmanship, the specific topics of talks vary each year as industry trends fluctuate. I was interested in taking a deeper look at how technologies and paradigms become more or less popular over time, and in finding those, if any, whose popularity has remained more or less constant.

While doing some research into data analytics for a project at work, I came across Microsoft SandDance. My interest in the CraftConf data was the perfect opportunity to teach myself SandDance (and some Python as well). So I put together a quick experiment: using Martin’s excellent tutorial on Web Scraping with Python as a reference, I wrote a Python script that scrapes CraftConf’s talk descriptions (using archives from 2014, 2015 and 2016) and produces a CSV file of the most frequently occurring words. Common English words such as pronouns, days of the week, and so on are ignored. What we’re left with is a list of the top 100 words for each year, along with their frequencies of occurrence, which can be visualized in SandDance.
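
The counting step can be sketched roughly as below. This is not the original script: it skips the download and HTML parsing (done with Beautiful Soup in the original) and starts from plain text. The stop-word list is a tiny placeholder, and the "Year" column is an assumption; "Keyword" and "Frequency" match the column names used in the SandDance steps further down.

```python
import csv
import io
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption); a real run needs a longer one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "we",
              "you", "it", "on", "for", "with", "how", "into"}

def top_words(text, limit=100):
    """Return the `limit` most frequent words, ignoring stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    # Words of one or two letters are dropped along with the stop words.
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return counts.most_common(limit)

def to_csv(rows_by_year):
    """Serialise {year: [(word, count), ...]} as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Year", "Keyword", "Frequency"])
    for year, rows in sorted(rows_by_year.items()):
        for word, count in rows:
            writer.writerow([year, word, count])
    return out.getvalue()

# Made-up sample text standing in for one year's scraped talk descriptions.
sample = ("Microservices and the monolith: how we split the monolith "
          "into microservices at scale")
print(to_csv({2016: top_words(sample, limit=2)}))
```

Writing one row per year and keyword keeps the file flat, which is the shape a tool like SandDance expects from a CSV with a header line.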

Although this approach is quick and more or less effective, the limitation is that it may not accurately reflect trends for phrases like “Agile Methodology” — the word frequency of “agile” may not be the same as that of “methodology”. But that’s something that can be worked on later. So although I would take this as a good indicator (which meets my purpose), I wouldn’t use the analysis outcomes as a serious reference.
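
One possible way to work on that later is to count adjacent word pairs instead of single words. The sketch below is only an illustration of the idea, not part of the original script; the function name and the sample sentence are made up, and stop words are not filtered here.

```python
import re
from collections import Counter

def top_phrases(text, limit=10):
    """Count adjacent word pairs (bigrams) within each sentence-like chunk."""
    counts = Counter()
    # Split on sentence punctuation so a pair never spans two sentences.
    for chunk in re.split(r"[.!?;:]", text.lower()):
        words = re.findall(r"[a-z]+", chunk)
        counts.update(" ".join(pair) for pair in zip(words, words[1:]))
    return counts.most_common(limit)

sample = "Agile methodology works. We tried agile methodology; agile teams liked it."
print(top_phrases(sample, limit=1))  # [('agile methodology', 2)]
```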

What the Data Tells Us

Here are some interesting findings from the first pass (150 of the most frequent words selected, out of those 100 produced after filtering out common and punctuated words):

  • “Product” showed up 26 times in 2015 and nearly doubled to 51 in 2016 (a growing trend: no talk, some talk, twice the talk…)
  • “Functional” [programming] appeared 16 times in 2014, and not in the other 2 years (something that’s coming and going?)
  • Similarly, “Architecture” showed up respectively 15, 37 and 23 times (up and down)
  • “DevOps” was an equally hot trend in 2014 and 2015 but didn’t show up in 2016 (presumably because the hype is over)
  • “Microservices” appears 29 times in 2016, but didn’t show up in the previous years (so there is a recent spike in popularity)
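
A comparison like the one above can be sketched in a few lines. The counts are the ones quoted in the list; the row layout and the labels are assumptions made for illustration. Note that a 0 only means the word did not make that year's top list, not that it never occurred.

```python
from collections import defaultdict

YEARS = (2014, 2015, 2016)

def pivot(rows):
    """Turn (year, keyword, frequency) rows into {keyword: [freq per year]}."""
    table = defaultdict(lambda: [0] * len(YEARS))
    for year, keyword, freq in rows:
        table[keyword][YEARS.index(year)] = freq
    return dict(table)

def describe(series):
    """Label a per-year series as flat, rising, falling or mixed."""
    steps = list(zip(series, series[1:]))
    if all(a == b for a, b in steps):
        return "flat"
    if all(a <= b for a, b in steps):
        return "rising"
    if all(a >= b for a, b in steps):
        return "falling"
    return "mixed"

# Counts quoted in the findings above; absent years default to 0.
rows = [
    (2015, "product", 26), (2016, "product", 51),
    (2014, "functional", 16),
    (2014, "architecture", 15), (2015, "architecture", 37), (2016, "architecture", 23),
    (2016, "microservices", 29),
]
for keyword, series in pivot(rows).items():
    print(keyword, series, describe(series))
```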

 

Tim Steigert’s Closing Keynote at CraftConf 2016

See for Yourself

As a fun exercise in data visualization and trend analysis, I encourage you to try it out for yourself, using the CSV file produced by my script. To start with:

  • Load the dataset: Dataset > Web > CSV file (Keep “First line is header” checked)
  • Set the URL to the CSV file link above, and click Load
  • View as: Column
  • X Axis: Keyword
  • Sum by, Facet by: None
  • Color by: Keyword
  • Sort by: Frequency
  • Set the X axis bins to: 100

The typical way to drill down using SandDance would be:

  • Select a keyword (say “lean”)
  • Click Isolate. Everything else gets filtered out (note the “Filtered” count increased from 0 to 2)
  • Now you can check “Details”, or “Facet by…”, for example
  • To go back, simply click “Filtered” to clear the selection

Isolating the keyword “Architecture” in SandDance

Isolating the “Other” keyword will reveal a whole lot of keywords that don’t show up in the first 100 bins. You can also take the SandDance tour (by clicking on Tour) and discover many other interesting ways of playing with SandDance.

You can find the source code in my Git Repository WordFreqCount. If you find it useful, please feel free to reuse, derive from or improve it. Note that credit goes to Martin for the original code on web scraping using Python and Beautiful Soup, which I largely adapted. And of course, thanks to Microsoft for making the elegant and powerful SandDance available for free!

Craft Conf 2015, Day 3

Continued from Day 2, here is a summary of talks I attended on Day 3:

From the Monolith to Microservices: Lessons from Google and eBay

By Randy Shoup | Video | Slides

Another eye-opening presentation with valuable insights, such as the fact that [a big organization like Google] doesn’t need architects, it just needs standardized communication and standardized interfaces. And that one of the biggest mistakes people make with microservices is reflecting the provider’s model instead of the consumer’s model. I highly recommend his talk, because it is based on the analysis of several Silicon Valley giants, successful either in the past or the present.

Interaction Driven Design

By Sandro Mancuso | Video | Slides

Sandro’s presentation was full of real examples rather than just theory. I had never heard of the Walking Skeleton before. It was an interesting intersection of DDD (Domain Driven Design), MVC-type architectures and SOLID principles, leading up to a pragmatic way of structuring and packaging software projects. Other advice from Sandro included modeling behavior rather than state, and not necessarily representing repositories as first-class citizens.

WebSocket for the Real-Time Web and the Internet of Things

By Peter Moskovits | Video | Slides

Not only was it an amazing presentation with live demos, Peter was also fully prepared with a backup plan for everything – including a PDF version of his presentation. After a historical perspective & technical explanation of how WebSockets work, he jumped into Kaazing demos which you can also experience online here. The most interesting was a kind of MVP for disseminating airline telemetry data (here).

Why Is An API Like a Puppy?

By Ade Oshineye | Video | Slides

RESTful APIs are not the solution to all of the world’s problems: Ade was short, succinct and insightful. The title of his talk reflected the fact that an API is an expensive long-term commitment; it’s not just about the initial cost of software development. He got a lot of attention when he revealed that Google’s most successful API is AdWords, and it’s SOAP, not REST. Although REST is theoretically good, it doesn’t usually fit well with the real-world consumer’s way of thinking. Another one of his gems was that if your [public] API is not being spammed/abused, then either no one is using it, or it’s happening and you’re not aware of it.

Implementing the Saga Pattern

By Caitie McCaffrey | Video | Slides

There wasn’t anything interesting to me during this time slot, so I decided to go with this one just for the Halo reference. There was just one picture of Halo. And a lot of “so” and “like”.

Techniques and Tools For a Coherent Discussion About Performance in Complex Architectures

By Theo Schlossnagle | Video | Slides

Theo decided it would be a good idea to plaster all his slides with huge pictures of steak. Anyway, after establishing that User Experience is measured in milliseconds, and that performance is also about the time spent between service layers, he covered distributed tracing systems such as Dapper and Zipkin.

Great Engineering, Failed Product

By Marty Cagan | Video | Slides

Marty drew on decades of experience in Silicon Valley to summarize why great products and companies fail over and over again. I highly recommend watching his inspiring and insightful talk. Some of the things he touched upon while comparing successful and poorly performing teams:

  • Customers and company executives are a bad source of product ideas, because they don’t know what’s technically achievable
  • Developers are a good source, and so is Data (analytics, metrics, usage)
  • Multi-billion dollar projects are not based on a Business Case accurately predicting future revenue
  • Roadmaps are not a good indicator because Customers have other options available to them
  • Think Time to Money, not Time to Market – which means more than one iteration is involved
  • Product Managers are not mere [user] story writers – they need to have a deep understanding of the business, industry, customers and constraints
  • Most teams work in a way that gives them probably 20% of the benefit of Agile Methodologies
  • Value outcomes over output; think in terms of results, not projects
  • Successful teams run as many as 20 MVP experiments in a week – even if it involves hardware
  • Successful companies use an OKR approach to measure progress
  • The four product development questions:
    1. Will the customers choose it? (Customer Validation)
    2. Will they be able to use it? (User Experience)
    3. Can we build it? (Feasibility)
    4. Can our stakeholders support it? (e.g. Legality)

______________________________________________

Craft Conf 2015, Day 2

I had the privilege of attending the second year of CRAFT, a tech conference in Budapest focused on software craftsmanship. Last year’s event (the first time it was held) had completely blown my mind. A year later I still keep referring back to the talks and haven’t finished fully absorbing them and putting all those inspiring ideas into practice.

In short…

Craft Conf 2014 was better. The speakers came from a more diverse background, the talks spanned a multitude of unrelated topics and I remember it being very, very hard to choose from talks happening in parallel. Each minute spent there was a revelation.

This year, though, many of the talks seemed to be a plug for a company or a product in disguise. Certainly there were brilliant takeaways, but not at the same scale as the previous year.

In my opinion, 2014 was also held in a better venue, although the 2015 venue was outstanding too, as far as tech conference venues go. But the rooms were spread too far apart (the map was inaccurate), the acoustics were bad everywhere except the Main Room and, unlike in 2014, the WiFi was not flawless. Lastly, there were far fewer food choices, longer queues and no bottled water (even for the speakers), and therefore a lot of glasses clanking.

On the positive side, the schedule was followed down to the minute, the live video streaming was smooth and considering the scale of the event (1300 attendees), everything was beautifully organized. I’m not complaining – it’s just that the first CRAFT had set a pretty high standard.

(Video and Slides links will be updated by next weekend, when they become available)

Agile Engineering in a Safety-Critical World

By Nancy Van Schooenderwoert | Video | Slides

“Instead of freezing the ocean, learn to ride the waves” – Nancy’s talk was mainly about how our need for predictability for effective coordination is at odds with our need for fast learning to handle unknowns. She pointed out that in the agile context, “Architecture is any design decision that you cannot easily change”.

There was the customary reference to WikiSpeed to dispel the myth that hardware changes can’t fit within 2-4 week iterations. And an interesting one to a paper called TIR45 from AAMI: Guidance on the use of AGILE practices in the development of medical device software.

Coding Culture

By Sven Peters | Video | Slides

Sven’s talk was both informative and inspiring. Some of the key takeaways:

  • Innovation needs time
  • Stop and celebrate wins, however small they may be
  • Balance your passion for code with your passion for customers
  • Turn your passion into product
  • Value trust, autonomy and transparency (Atlassian achieves this by using chat over other communication means)
  • Products come and go, culture stays

Take a look at Atlassian’s Mood App and Stash Reviewer Suggester.

Building Reliable Distributed Data Systems

By Jeremy Edberg | Video | Slides

This one was good, until we went deep diving into the NetFlix Simian Army, which was also good but could have been summarized in just one slide. One thing that stood out from Jeremy’s advice was to “build for three”, because if you can overcome problems there then the solution can be [more] easily scaled up to n.

Don’t forget to check out NetFlix Open Source Software Center.

Oh! You Pretty Tools

By Andrew Bayer | Video | Slides

Andrew gave an interesting talk about the role of internal tools and their developers in the organization, covering both the pros and the cons. For example, while making the build-or-buy decision, consider the fact that people are more expensive than software. There were also some thoughtful insights, like how Integration Tests can double up as the roadmap to your tool’s usage. He also revealed that Cloudera runs ~2000 Jenkins CI builds every day(!)

The rest of it was basically about CloudCat and the lessons learned from it.

Testing and Integration (The Remix)

By Ines Sombra | Video | Slides

Ines entertainingly summarized everything we know so far, and topped it off with new insights for good measure. She emphasized:

  • The importance of lightweight short-lived branches so that CI is not overlooked
  • The more likely a test is to fail, the sooner you should run it
  • The testing of provisioning systems, such as Chef Recipes, too
  • How test setup time and parallelization are the key factors in minimizing the testing cycle time
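
The second point, running the likeliest failures first, can be sketched as a simple sort. This is an illustration rather than anything shown in the talk: the test names, timings and pass/fail histories are hypothetical, and breaking ties by duration is my own assumption.

```python
def failure_rate(history):
    """Fraction of recorded runs that failed (history holds True for a pass)."""
    if not history:
        return 0.0  # no data yet: treat the test as never failing
    return history.count(False) / len(history)

def run_order(tests):
    """Likeliest failures first; among equals, the quicker test goes first."""
    return sorted(tests, key=lambda t: (-failure_rate(t["history"]), t["seconds"]))

# Hypothetical suite: names, timings and histories are made up.
suite = [
    {"name": "test_login", "seconds": 2.0, "history": [True, True, True, True]},
    {"name": "test_checkout", "seconds": 9.0, "history": [True, False, False, True]},
    {"name": "test_search", "seconds": 1.5, "history": [False, True, True, True]},
]
for test in run_order(suite):
    print(test["name"], failure_rate(test["history"]))
```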

She recommended this talk about the Google Build System.

Her punchline was that CI is a predictor of professional maturity at the organizational and individual level, and she ended with a “rantifesto” about building a culture of quality.

Beyond Features: Rethinking Agile Planning and Tracking

By Dan North | Video | Slides

From Cutting to Curing: Dan presented the powerful and inspiring idea that maybe software engineering is more like surgery than the civil engineering principles that we currently use to manage it. Agile methodologies essentially optimize for predictability, and this is not necessarily a good thing. He mused on how a 2-week sprint is just enough time for a mini-waterfall, and thus we are all basically whitewater rafting.

After reviewing where the Agile Manifesto has brought us, he set an ambitious new goal to sustainably minimize the lead time to business impact. 

He ended with:

  • The role of Features, Delivery and Kaizen
  • Schedule, Measure, Track, Showcase
  • How Value Stream Mapping can reveal surprises, such as a piece of work typically spending up to 90% of its time waiting for dependencies

How To Save Innovation From Itself

By Alf Rehn | Video | Slides

For me, Alf’s talk was the highlight of the event. It was so good and so inspiring that I won’t even summarize it here. Go watch it!

______________________________________________

The day ended with a party thrown by EPAM, which included free beer, a DJ-saxophone duo and a surprise flashmob.

You may also want to read my review of Day 3 of Craft Conf 2015.