An Exercise in Analytics: Using Microsoft SandDance to Visualize Trends at CraftConf

I’ve been lucky to attend every CraftConf Budapest since its inception in 2014. It has always been a mind-expanding experience, with a healthy mix of established and new speakers from the tech industry worldwide. Although Craft, as the name suggests, is focused on Software Craftsmanship, the specific topics of talks vary each year as industry trends fluctuate. I wanted to take a closer look at which technologies and paradigms gain or lose popularity over time, and to find those, if any, whose popularity has stayed roughly constant.

While doing some research into data analytics for a project at work, I came across Microsoft SandDance. My interest in the CraftConf data was the perfect opportunity to teach myself SandDance (and some Python as well). So I put together a quick experiment: using Martin’s excellent tutorial on Web Scraping with Python as a reference, I wrote a Python script that scrapes CraftConf’s talk descriptions (using archives from 2014, 2015 and 2016) and produces a CSV file of the most frequently occurring words. Common English words such as pronouns and days of the week are ignored. What we’re left with is a list of the top 100 words for each year, with their frequencies of occurrence, which can be visualized in SandDance.
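The counting step described above can be sketched roughly as follows. This is a minimal illustration rather than the actual script: the stop-word list, the `top_words` and `to_csv` names, the `Year` column and the sample text are all assumptions made for the example (only `Keyword` and `Frequency` appear as field names in the SandDance steps further down), and the scraping itself is left out.

```python
import csv
import io
import re
from collections import Counter

# Illustrative stop-word list (an assumption; the real script's list is not shown here).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "we", "you", "it", "on", "monday"}

def top_words(text, limit=100):
    """Return the `limit` most frequent words in `text`, skipping stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(limit)

def to_csv(rows_by_year):
    """Serialize {year: [(word, count), ...]} as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Year", "Keyword", "Frequency"])
    for year, rows in sorted(rows_by_year.items()):
        for word, count in rows:
            writer.writerow([year, word, count])
    return out.getvalue()

# Hypothetical text standing in for one year's scraped talk descriptions.
sample = "Microservices and the architecture of microservices in a lean product"
print(to_csv({2016: top_words(sample, limit=3)}))
```

Keeping the year as its own column means a single CSV file can hold all three years.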

Although this approach is quick and more or less effective, its limitation is that it may not accurately reflect trends for phrases like “Agile Methodology”: the word frequency of “agile” may not be the same as that of “methodology”. But that’s something that can be worked on later. So although I would take this as a good indicator (which meets my purpose), I wouldn’t use the analysis outcomes as a serious reference.
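One possible way to address the phrase limitation, shown here only as a sketch and not something the script currently does, is to count adjacent word pairs (bigrams) in addition to single words. The `bigram_counts` name and the sample sentence are made up for illustration.

```python
import re
from collections import Counter

def bigram_counts(text):
    """Count adjacent word pairs so a phrase is tallied as one unit.

    Note: this simple version also pairs words across sentence boundaries.
    """
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(zip(words, words[1:]))

# Hypothetical sample text.
sample = "Agile methodology works. Agile teams like agile methodology."
counts = bigram_counts(sample)
print(counts[("agile", "methodology")])
```

Here “agile” occurs three times but the pair (“agile”, “methodology”) only twice, which is exactly the gap that single-word counts hide.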

What the Data Tells Us

Here are some interesting findings from the first pass (the 150 most frequent words, selected from the per-year top-100 lists produced after filtering out common and punctuated words):

  • “Product” showed up 26 times in 2015 and nearly doubled to 51 in 2016 (a growing trend: no talk, some talk, twice the talk…)
  • “Functional” [programming] appeared 16 times in 2014, and not in the other two years (something that’s coming and going?)
  • Similarly, “Architecture” showed up 15, 37 and 23 times in 2014, 2015 and 2016 respectively (up and down)
  • “DevOps” was an equally hot trend in 2014 and 2015 but didn’t show up in 2016 (presumably because the hype is over)
  • “Microservices” appeared 29 times in 2016, but didn’t show up in the previous years (so there is a recent spike in popularity)
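Comparisons like these are easier to make once the per-year rows are pivoted into one entry per keyword. The sketch below assumes rows of (year, keyword, frequency) and uses only the figures quoted in the list above; the `pivot` helper is hypothetical. A 0 here means the word did not make that year’s list, not necessarily that it never occurred.

```python
from collections import defaultdict

YEARS = (2014, 2015, 2016)

# (year, keyword, frequency) rows, using the figures quoted in the findings above.
rows = [
    (2014, "functional", 16),
    (2014, "architecture", 15),
    (2015, "architecture", 37),
    (2016, "architecture", 23),
    (2015, "product", 26),
    (2016, "product", 51),
    (2016, "microservices", 29),
]

def pivot(rows):
    """Map each keyword to its per-year counts, with 0 for years without an entry."""
    table = defaultdict(lambda: dict.fromkeys(YEARS, 0))
    for year, keyword, freq in rows:
        table[keyword][year] = freq
    return dict(table)

for keyword, by_year in pivot(rows).items():
    print(keyword, [by_year[y] for y in YEARS])
```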



Tim Steigert’s Closing Keynote at CraftConf 2016

See for Yourself

As a fun exercise in data visualization and trend analysis, I encourage you to try it out for yourself, using the CSV file produced by my script. To start with:

  • Load the dataset: Dataset > Web > CSV file (Keep “First line is header” checked)
  • Set the URL to the CSV file link above, and click Load
  • View as: Column
  • X Axis: Keyword
  • Sum by, Facet by: None
  • Color by: Keyword
  • Sort by: Frequency
  • Set the X axis bins to: 100

The typical way to drill down using SandDance would be:

  • Select a keyword (say “lean”)
  • Click Isolate. Everything else gets filtered out (note the “Filtered” count increased from 0 to 2)
  • Now you can check “Details”, or “Facet by…”, for example
  • To go back, simply click “Filtered” to clear the selection

Isolating the keyword “Architecture” in SandDance

Isolating the “Other” keyword will reveal a whole lot of keywords that don’t show up in the first 100 bins. You can also take the SandDance tour (by clicking on Tour) and discover many other interesting ways of playing with SandDance.

You can find the source code in my Git Repository WordFreqCount. If you find it useful, please feel free to reuse, derive from or improve it. Note that credit goes to Martin for the original code on web scraping using Python and Beautiful Soup, from which I adapted most of my script. And of course, thanks to Microsoft for making the elegant and powerful SandDance available for free!


June 2013

Something I wrote to teach myself Perl and the fundamentals of REST APIs, in this case Twitter’s. It essentially retrieves the list of users in each of my Twitter Lists.



This started out as a side project between a couple of us at work to sharpen our C#/XML skills and generally do something challenging. A C#.NET Application + Service that enables browsing files on disk by Tag (as opposed to Path). Typical usage:

  • A Tag Cloud for your hard disk
  • Add/remove tag(s) to selected file(s)
  • List & Filter files by tag(s)
  • Monitor specified folders (ex: My Documents) for new/deleted/renamed files
  • Portable XML Database

Useful for Researchers, Writers and Obsessive-Compulsive File Hoarders.
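The core idea behind browsing by tag can be sketched as an index from tags to file paths. The sketch is in Python for brevity rather than the C# of the actual application, and the `TagIndex` class and its methods are hypothetical, not the original design. Folder monitoring and the XML database are left out.

```python
from collections import defaultdict

class TagIndex:
    """In-memory tag-to-files index (hypothetical; not the original C# design)."""

    def __init__(self):
        self._files_by_tag = defaultdict(set)

    def add(self, path, *tags):
        """Attach one or more tags to a file."""
        for tag in tags:
            self._files_by_tag[tag].add(path)

    def remove(self, path, *tags):
        """Detach one or more tags from a file."""
        for tag in tags:
            self._files_by_tag[tag].discard(path)

    def filter(self, *tags):
        """Return the files that carry every one of the given tags."""
        if not tags:
            return set()
        sets = [self._files_by_tag.get(tag, set()) for tag in tags]
        return set.intersection(*sets)

    def cloud(self):
        """Return {tag: number of files}, the raw data behind a tag cloud."""
        return {tag: len(paths) for tag, paths in self._files_by_tag.items() if paths}

# Example paths and tags are made up.
index = TagIndex()
index.add("notes/thesis.docx", "research", "draft")
index.add("notes/survey.pdf", "research")
print(index.filter("research", "draft"))
```

Storing a set of paths per tag makes filtering by several tags a plain set intersection.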

DOPE: Distributed OPerating Environment

November 2003

I did this project with a friend around the time that Grid Computing was getting popular in the mainstream with SETI@Home (in its pre-BOINC avatar).

A Client-Server Java app that used Distributed Computing principles to:
a) Break down a given, computationally intensive task into chunks*
b) Distribute these chunks to clients on the network, favouring those with lower latency
c) Re-assemble the results returned by clients into a single unit
d) Re-send chunks for which no result was received within the stipulated time

*Limitations apply: this was not a general-purpose implementation
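Steps a) to d) can be illustrated with a small single-process simulation. This is a sketch in Python, not the original Java code: clients are plain functions paired with an assumed latency figure, a lost or late result is simulated by returning None, and all names (`split`, `run`, `reliable`, `make_flaky`) are invented for the example.

```python
def split(data, size):
    """a) Break a list into chunks of at most `size` items."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def run(data, size, clients, work, max_rounds=5):
    """Distribute chunks, re-send missing ones, and re-assemble results in order.

    `clients` is a list of (latency, call) pairs; `call(work, chunk)` returns a
    list of results, or None to simulate a result that did not arrive in time.
    """
    chunks = split(data, size)
    results = {}
    by_latency = sorted(clients, key=lambda c: c[0])  # b) lowest latency first
    for _ in range(max_rounds):
        pending = [i for i in range(len(chunks)) if i not in results]
        if not pending:
            break
        # d) every chunk without a result is sent (or re-sent) in this round
        for n, i in enumerate(pending):
            _, call = by_latency[n % len(by_latency)]
            outcome = call(work, chunks[i])
            if outcome is not None:
                results[i] = outcome
    if len(results) != len(chunks):
        raise RuntimeError("some chunks never returned a result")
    # c) re-assemble in the original chunk order
    return [item for i in range(len(chunks)) for item in results[i]]

def reliable(work, chunk):
    return [work(x) for x in chunk]

def make_flaky():
    """Build a client whose first result is 'lost'."""
    state = {"calls": 0}
    def flaky(work, chunk):
        state["calls"] += 1
        if state["calls"] == 1:
            return None
        return [work(x) for x in chunk]
    return flaky

clients = [(0.20, make_flaky()), (0.05, reliable)]
squares = run(list(range(6)), 2, clients, lambda x: x * x)
print(squares)
```

In this run the middle chunk goes to the slower client, is lost, and is re-sent in the next round to the lowest-latency client.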


In this section I’ll write about some of my side projects, specifically the ones that I have managed to bring to completion (without getting distracted by other shiny things), or interesting ones that I’m currently working on.