An Exercise in Analytics: Using Microsoft SandDance to Visualize Trends at CraftConf

I’ve been lucky to attend every CraftConf Budapest since its inception in 2014. It has always been a mind-expanding experience, with a healthy mix of established and newcomer speakers from the tech industry worldwide. Although Craft, as the name suggests, is focused on Software Craftsmanship, the specific topics of talks vary each year as industry trends fluctuate. I wanted to take a deeper look at how technologies and paradigms become more or less popular over time, and to find those, if any, whose popularity has stayed roughly constant.

While doing some research into data analytics for a project at work, I came across Microsoft SandDance. My interest in the CraftConf data was the perfect opportunity to teach myself SandDance (and some Python as well). So I put together a quick experiment: using Martin’s excellent tutorial on Web Scraping with Python as a reference, I wrote a Python script that scrapes CraftConf’s talk descriptions (using archives from 2014, 2015 and 2016) and produces a CSV file of the most frequently occurring words. Common English words such as pronouns and days of the week are ignored. What we’re left with is a list of the top 100 words for each year, and their frequencies of occurrence, which can be visualized in SandDance.
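To give an idea of the counting step, here is a minimal sketch. It is not the actual WordFreqCount script: the function names, the tiny stop-word list and the `Year` column are my own assumptions for illustration (only `Keyword` and `Frequency` are used in the SandDance steps further down), and the sample text is made up instead of being scraped.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real run needs a much longer one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "for", "monday"}

def top_words(text, limit=100):
    """Return the `limit` most frequent non-stop-words as (word, count) pairs."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(limit)

def to_csv(rows_by_year):
    """Serialize {year: [(word, count), ...]} to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Year", "Keyword", "Frequency"])  # column names are assumptions
    for year, rows in sorted(rows_by_year.items()):
        for word, count in rows:
            writer.writerow([year, word, count])
    return buffer.getvalue()

# Made-up sample standing in for one year's scraped talk descriptions.
sample = "Microservices and the architecture of microservices on Monday"
print(to_csv({2016: top_words(sample)}))
```

Keeping the counting separate from the scraping means the same function can be fed text from any year's archive.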

Although this approach is quick and more or less effective, the limitation is that it may not accurately reflect trends for phrases like “Agile Methodology” — the word frequency of “agile” may not be the same as that of “methodology”. But that’s something that can be worked on later. So although I would take this as a good indicator (which meets my purpose), I wouldn’t use the analysis outcomes as a serious reference.
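One possible way to work on the phrase problem later would be to count adjacent word pairs as well as single words. The sketch below only illustrates that idea; the function name and the sample sentence are assumptions, and it naively pairs words across sentence boundaries.

```python
import re
from collections import Counter

def bigram_counts(text):
    """Count adjacent word pairs; note this also pairs words across sentence ends."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(zip(words, words[1:]))

# Made-up sample: "agile" occurs three times, but the phrase only twice.
sample = "Agile methodology works. Agile teams like agile methodology."
single = Counter(re.findall(r"[a-z]+", sample.lower()))
pairs = bigram_counts(sample)

print(single["agile"], single["methodology"], pairs[("agile", "methodology")])
```

Here the single-word counts for “agile” and “methodology” differ, while the pair count reflects how often the phrase itself was used.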

What the Data Tells Us

Here are some interesting findings from the first pass (the 100 most frequent words selected, out of the 150 produced after filtering out common and punctuated words):

  • “Product” showed up 26 times in 2015 and nearly doubled to 51 in 2016 (a growing trend: no talk, some talk, twice the talk…)
  • “Functional” [programming] appeared 16 times in 2014, and not in the other 2 years (something that’s coming and going?)
  • Similarly, “Architecture” showed up respectively 15, 37 and 23 times (up and down)
  • “DevOps” was an equally hot trend in 2014 and 2015 but didn’t show up in 2016 (presumably because the hype is over)
  • “Microservices” appears 29 times in 2016, but didn’t show up in the previous years (so there is a recent spike in popularity)

 

[Image: Tim Steigert’s Closing Keynote at CraftConf 2016]

See for Yourself

As a fun exercise in data visualization and trend analysis, I encourage you to try it out for yourself, using the CSV file produced by my script. To start with:

  • Load the dataset: Dataset > Web > CSV file (Keep “First line is header” checked)
  • Set the URL to the CSV file link above, and click Load
  • View as: Column
  • X Axis: Keyword
  • Sum by, Facet by: None
  • Color by: Keyword
  • Sort by: Frequency
  • Set the X axis bins to: 100

The typical way to drill down using SandDance would be:

  • Select a keyword (say “lean”)
  • Click Isolate. Everything else gets filtered out (note the “Filtered” count increased from 0 to 2)
  • Now you can check “Details”, or “Facet by…”, for example
  • To go back, simply click “Filtered” to clear the selection

[Image: Isolating the keyword “Architecture” in SandDance]

Isolating the “Other” keyword will reveal a whole lot of keywords that don’t show up in the first 100 bins. You can also take the SandDance tour (by clicking on Tour) and discover many other interesting ways of playing with SandDance.

You can find the source code in my Git Repository WordFreqCount. If you find it useful, please feel free to reuse, derive from or improve it. Note that credit goes to Martin for the original code on web scraping using Python and Beautiful Soup, which I largely adapted. And of course, thanks to Microsoft for making the elegant and powerful SandDance available for free!

“Agile Architecture”: Is Your Product’s Architecture Simply a Reflection of its Release Schedule?

After recently reading George Fairbanks’ Just Enough Software Architecture and obtaining 90% in my TOGAF 9.1 Foundation Certification exam in Enterprise Architecture, I’ve naturally been thinking about Software Architecture a lot. The other day a friend and I were talking about a typical scenario, involving large scale products with multiple sprint teams on a tight delivery schedule. I will over-simplify and over-generalize it here:

The Scenario

Let’s say you’re building a product that needs a feature for exporting a snapshot to a PNG file. This is typically what would happen:

  • Product Owner: “As a user, I want to export snapshot to PNG, so that I can email it as an attachment to the Finance Department”
  • Story is added to Product Backlog, prioritized during the Sprint Grooming and estimated in the Sprint Planning
  • Developer Dinesh starts working on it and adds the Export to PNG feature. It is demo’d at the end of the Sprint and everyone is happy

The feature and product iteration is shipped, customers start using it, the product team goes back to juggling new features and bugs. Priorities and sprint teams change. Somewhere down the line, another requirement comes up:

  • Product Owner: “As a user, I want to export snapshot to PDF, so that I can archive it in the Document Management system” (Don’t ask why the two systems can’t work with the same format).
  • Story is added, prioritized and estimated. This time though, Dinesh isn’t around (he’s either on vacation, promoted, no longer in the company or simply more focused on another aspect of the product)
  • Developer Gilfoyle starts working on this feature. From this point on, one of two things can happen:
    • Rearchitecture: An experienced developer or architect recognizes the potential of code reuse/refactor/rearchitecture between the two implementations. A common interface emerges, common functionality is moved up and specific functionality is moved to concrete classes. Note that:
      • Code is refactored
      • Some previously existing code is even thrown away
    • Technical Debt is accrued: Gilfoyle has limited time to implement the new feature; on the UI side the “Export to…” menu selections show up as users would expect, but the internal implementation is disjoint. The code is not clean (but does the job), the Export functionality is not unified into a single interface and there is code duplication.
      • In some of the stakeholders’ minds, code has been “reused” from the previous implementation, but in reality it has been copy-pasted (imitation being the best form of flattery towards Dinesh)
      • The team and PO might even recognize the Technical Debt and add it to the backlog, to be dealt with in the future at a lower priority (although that tends to lead towards a Bottomless Backlog)

In both cases, layers emerge in software over time as features are added and the product evolves. However, the first case has architectural layers and the second exhibits feature layers: In extreme cases, you might even be able to infer the Product Backlog by looking at the order in which features were added.
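To make the rearchitecture case concrete, here is a rough sketch of what the common interface might look like. All class and function names are hypothetical, and the “encoding” is a placeholder rather than real PNG or PDF output; the point is only where the shared and the format-specific code end up.

```python
from abc import ABC, abstractmethod

class SnapshotExporter(ABC):
    """Common interface: shared steps live here, format specifics in subclasses."""

    extension = ""

    def export(self, snapshot, basename):
        # Common functionality "moved up": validation and file naming.
        if not snapshot:
            raise ValueError("nothing to export")
        return f"{basename}.{self.extension}", self.encode(snapshot)

    @abstractmethod
    def encode(self, snapshot):
        """Turn the snapshot into bytes for one concrete format."""

class PngExporter(SnapshotExporter):
    extension = "png"

    def encode(self, snapshot):
        # Placeholder only, not real PNG encoding.
        return b"PNG:" + snapshot.encode("utf-8")

class PdfExporter(SnapshotExporter):
    extension = "pdf"

    def encode(self, snapshot):
        # Placeholder only, not real PDF encoding.
        return b"PDF:" + snapshot.encode("utf-8")

# The "Export to…" menu maps straight onto the registered exporters.
EXPORTERS = {"png": PngExporter(), "pdf": PdfExporter()}
name, data = EXPORTERS["pdf"].export("Q3 totals", "snapshot")
print(name, data)
```

In the Technical Debt case, by contrast, the validation and naming logic would simply be copy-pasted into a second, unrelated export function.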

The Analysis

Some thoughts:

  • “The Technical Debt Trap”: Doc Norton explained it very well at CraftConf 2016 (video link): (a) Most of what we have come to call “Technical Debt” is actually code cruft, and (b) “clean, testable code is a pre-requisite to being able to pay back Technical Debt”.
  • Show me the $: While a Product Owner may have limited influence over where budget is spent, a Product Manager may be in a better position to limit the long term negative effects of decisions made within the scope of individual sprints
  • One size does not fit all: Complex products or long development cycles must be overseen by an experienced Architect or Development Lead, who advises the PO, PM and other decision makers
  • Stakeholders pay for the product, not the code: For stakeholders coming from a more traditional software engineering background, it may seem like throwing away (or refactoring) code is a questionable decision, because money was spent on producing it. This thinking must be erased by clarifying the benefits of a more robust and maintainable design.

The Solution (Maybe)

In summary: Features, as seen by the business, are not always the same as features, as seen by the developers. I tend to think the solution might be to maintain two backlogs: a business-oriented Product Backlog, and an architecture-oriented Technical Backlog. The latter would go into the details of how the business needs will be met at a technical level, driven by the objectives of overall cohesiveness, maintainability and constant reduction of Technical Debt. Here’s how I imagine it:

[Image: 2xBacklogs]

Note that the technical effort to deliver a given set of business features is more than what it would be if the team worked off a single backlog. This is simply because the Technical Backlog takes into account the additional effort of refactoring/rewriting, which is normally not covered in the typical Product Backlog.

I’m curious about what you think. Please leave a comment below or get in touch on Twitter: @survivalcrziest.

(PS: Here’s a tip: VCS-based Software Analysis by Adam Tornhill)

Update: Prioritization of Items in the Technical Backlog

Based on some of the readers’ feedback, I would like to clarify how I intend the system to be used:

  1. The top of the Technical Backlog is prioritized / groomed based on (a) “Technical Value” and (b) budget and schedule constraints (“effort”).
    • Technical Value here is defined as the value delivered by the technical solution towards meeting the needs of the [prioritized] Product Backlog.
  2. Items may be added to the Technical Backlog not just based on the insertion of new features in the Product Backlog, but also based on technical reviews / retrospectives where it is identified that an architectural change is required (e.g. for purposes of maintainability or compliance).
  3. The items pushed to the bottom of the Technical Backlog thus represent Technical Debt.
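As a rough illustration of the three points above, the grooming could be sketched like this. The value-per-effort ranking, the capacity number and the item titles are all assumptions of mine for the example; the points above only say that both value and effort feed into the prioritization.

```python
from dataclasses import dataclass

@dataclass
class TechItem:
    title: str
    technical_value: int  # value towards the prioritized Product Backlog (made-up scale)
    effort: int           # budget/schedule cost (made-up scale)

def groom(items, capacity):
    """Rank by value per unit of effort (an assumed rule), then fill the capacity.

    Returns (planned, debt): whatever does not fit sinks to the bottom and
    represents Technical Debt.
    """
    ranked = sorted(items, key=lambda i: i.technical_value / i.effort, reverse=True)
    planned, debt = [], []
    remaining = capacity
    for item in ranked:
        if item.effort <= remaining:
            planned.append(item)
            remaining -= item.effort
        else:
            debt.append(item)
    return planned, debt

backlog = [
    TechItem("Unify export behind one interface", technical_value=8, effort=5),
    TechItem("Add PDF exporter", technical_value=9, effort=3),
    TechItem("Remove duplicated PNG code", technical_value=3, effort=4),
]
planned, debt = groom(backlog, capacity=8)
print([i.title for i in planned], [i.title for i in debt])
```

Any other ranking rule would work just as well here; the part that matters is that the leftover list is visible as debt instead of disappearing.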

 

Innovation is not a Linear Phenomenon: The Faraday FF Zero 1 Example

Innovation ‘R’ Us

Every company is trying to “innovate” these days… no matter how large or small. Some of the bigger ones are virtually pleading with their multinational workforce through challenges, awards and incentives to come up with the magic pill that will help the company sail through stormy waters (as Alf Rehn summarized it [1], “in April we innovate, in May we fire people”).

The smaller ones… well, there are companies based entirely on nothing else but “an innovative solution” to do something you could already do before (but this time in a JavaScript framework). Looking at it from the Lean Startup perspective, I find it a bit weak when a whole business is based on the USP of “innovative”. Your solution could be based on quality attributes [2] that make it faster, scalable, interoperable, customizable, streamlined, or really 100 other ways that could maximize value… the means to achieve these had better be innovative, because that’s the very least your customer expects from you!

Have you ever heard a Formula 1 driver call himself fast? Or a firefighter call him/herself brave? Or a surgeon boast about how precise she is? No, because these are attributes that are inherently expected of them. If they weren’t fast or brave or precise, they wouldn’t last very long in their line of work. Similarly, today all technology companies are expected to be innovative in order to survive. You know who boosts their own ego publicly? Pop stars:

 

Innovation is… Not Where You Think it is

Now, about Faraday. Earlier this year at CES I picked up this leaflet from the Faraday FF Zero 1 booth. Since then, I have thought often of the part marked in red below:

[Image: leaflet from the Faraday FF Zero 1 booth, with one passage marked in red]

“SVP of R&D … spotted a drawing of a racecar on a designer’s desk and thought …”

BOOM. Innovation happened. Did the designer have a mandate to come up with a supercar? No. Was the SVP in an offsite innovation workshop, brainstorming with other employees? No. Was there an innovation competition or challenge going on in the company, with an award at the end of it? Probably not. This is possibly the best example that innovation does not happen in an institutionalized manner. When successful innovation happens, it comes from the most unexpected places, more often than not driven by synergy, and it opens up a non-linear value proposition:

[Image: innovation graph]

Original image credit: [3]

 

An Indicator of Innovation

These days some people are solving more problems with a Raspberry Pi over a weekend than during a whole week of work in front of a corporate laptop. What can leaders do to harness this immense creative potential? I think the answer is to build an organization conducive to innovation, geared up to quickly change course when an innovation potential appears on the horizon, and… basically get out of the way. Easier said than done, you say… there are risks, budgets, stakeholders, possibly even (shudder) committees… no way this is going to work.

Which brings me to the final point: how deeply is trust rooted in the company’s culture? In Faraday’s example, the SVP trusted that something produced by one of his designers was potentially a big deal. It did not come up through a chain of committees and approvals. It happened through synergy. And while structure can stifle synergy, trust can help it thrive.

I therefore argue that the amount of trust in a company is a solid indicator of innovation potential. How much employees trust the leadership’s direction, how much coworkers trust each other (even across borders and timezones) and how much the leadership believes in the people they hired: these factors determine how likely it is that synergistic events will be recognized and nurtured into products or solutions that are called “innovative” by customers and competition, not just by the companies themselves.


[1] Alf Rehn, “How to Save Innovation From Itself”. Craft Conf 2016 talk.

[2] George Fairbanks, “Just Enough Software Architecture: A Risk-Driven Approach”. InfoQ interview and excerpt.

[3] MintViz.

Prezi’s 7 Year Itch?

This has been a really hard post for me to write, because some of the brilliant engineers and designers at Prezi are my professional acquaintances and/or good friends. And that’s the very reason I’m writing it: real friends tell it like it is. I’ve been putting this off since I first started using Prezi almost a year ago. But now that I have experienced it in free, Edu and Pro flavors, and created some pretty complex Prezis, the initial tiny niggles seem to have, let’s say, zoomed in (pun intended). So Prezi people: please consider this well-intended motivation.

Prezi was founded in 2009, and I imagine at that time it must have felt something like Wolfenstein 3D: technologically ground-breaking and in some circles, controversial (you may recall Microsoft tried to acquire id Software). PowerPoint needed to be killed, because it enforced linear thinking, and Prezi was the answer. Yet, 6 years later, linear thinking is still popular:

With 60 mn users (and 160 mn -/+ 18% Prezis created, depending on which part of their website you refer to), obviously they’ve got something right. Right enough that they are able to sustain an international presence on a single product. That rarely happens at this scale, and I suppose it is something to be admired.

The Platform

The in-browser experience is brilliant. It works well and virtually eliminates the entry barrier. I tried using the desktop variant, Prezi for Mac, once. Just once. I found that in terms of usability, it didn’t offer any major advantages over the website. But the real problem was that it kept false-flagging sync conflicts between the “local” and “cloud” versions. Eventually I decided the fake stress wasn’t worth it and switched back to the browser.

You may not have this dilemma if you would like to use the free power of Linux to share your ideas – Prezi doesn’t support it. (To be fair, they more than make up for this by actively promoting the tech and open source community in Budapest, by regularly organizing/hosting events, meetups and conferences).

There is an iPad version that “enables users to pan, and pinch to zoom in or out of their media” (That’s perfect, because that’s pretty much what an iPad is good for, anyway).

What I really wish for is the ability to resize the slide view pane on the left, so I can identify and jump to specific slides. And multi-select, especially for reordering (use the Force of the Shift Key). I would also like to mark some sections so I can focus on certain areas for preparation. Or even skip straight to one depending on how the conversation is flowing.

I mentioned stress earlier. You can quite easily recreate it by using the two-finger zoom gesture on a Mac. It’s not very precise and often zooms a lot further than you’d expect. I once had to spend hours fixing my slides when a select-move-zoom operation didn’t go as planned. I learnt the hard way to stick to the zoom buttons and not use gestures.

And finally, it all runs on Flash. Yes, even now.

Features

The undo/redo feature also has a bit of a mind of its own. For simple operations it usually works fine, but when you’re in the flow and make a mistake, it often does the thing you’d least expect. Infinite undo/redo shouldn’t be that hard to implement either, considering everything is an object with attributes, anyway. What would be really nice, though, is some kind of checkpoints — major versions that you can roll back to, branch out of, or share with someone (e.g. numbered drafts). Because, 2015.

The guides when moving stuff around are hard to see. Too often they blend into the background, especially e.g. if the background has a grid. Get those ants marching again!

I also miss the ability to add notes, although what I would really expect is a more modern way of representing “prompts”, in a way that would be intuitive for presenters.

Warning: Brutal Honesty Ahead

Try this test, Prezi folks: create a new Prezi, single slide only, and export it to PDF.

  • Number of pages expected: 1
  • Number of pages found: 2. The first page is always duplicated during export, regardless of number of slides.
  • Test result: FAIL.

Really, nobody noticed this in all these years?

Or that the image borders provided by the Aviary plugin are too thick and can’t be adjusted? (I trust you when you say my “private”, potentially proprietary/copyrighted images are truly private, even though there seems to be no obvious ToS or protection mechanism).

I’ve had Aviary and PDF export error out on me on a couple of occasions. I can live with that. But must you call them IO_ERRORs? That sounds more like the server ran out of disk space or something. And to non-programmers like my Dad, it probably looks like “ten errors” misspelt.

Nice to have: compressed PDF, with lowered quality. Try printing a 60 MB PDF some time, or re-uploading it — when just a link would have sufficed.

For that matter, why limit social sharing options to only Facebook? How about Twitter, LinkedIn and dare I say, SlideShare integration as well? (PS: Do people really share presentations on Facebook? How about a good old “Email this” shortcut?)

There is tremendous potential to use Prezi as a content generation tool if it allowed exporting to video, or directly to YouTube (and/or Vimeo). Think tutorials, troubleshooting guides, walkthroughs and just plain not having to talk over and over again — present it, record it, share it. Other people seem to have the same idea, too.

And if only there were a way to present to a second screen, so that in a presenter view (or just Edit mode) I could see what’s coming next and how I’m doing with my timing.

Use Your Imagination

Animated GIFs are still the quickest way of demonstrating processes in action, for example. Importing Google Spreadsheets and converting them into graphs within the editor. Powerful timelines like TimeGlider. Simple maps like OpenHeatMap. Leveraging the zooming and object hierarchy features to build a MindMap mode (which may even serve as a foundation for the final Prezi). And why limit the zooming and rotating to 2 dimensions only? I would love to make a Prezi with 6 slides, each represented on a face of a cube which can be rotated in 3D space. After all, the future of storytelling is going to be virtual and immersive.

(Here’s an idea: API and plugins).

Prezi is great, but that’s no reason to not be even greater.

(Or just release the Kraken: make it open source, let us fork it and watch how people use the platform to build even more clever ideas for telling their stories).