LEGO and Software – Part Roles

This is the fifth post in a series exploring LEGO® as a Metaphor for Software Reuse. A key consideration for reuse is the various roles that components can play when combined or re-combined in sets. Below we’ll explore how we can use data about LEGO parts and sets to understand the roles parts play in sets.

I open a number of lines of investigation, but this is just the start, rather than any conclusion, of understanding the roles parts play and how that influences outcomes including reuse. The data comes from the Rebrickable data sets, image content & API and the code is available at https://github.com/safetydave/reuse-metaphor.

Hero Parts

Which parts play the most important roles in sets? Which parts could we least easily substitute from other sets?

We could answer this question in the same way as we rank search results for relevance to a query, for instance with a technique called TFIDF (term frequency-inverse document frequency). We can find hero parts in sets with set frequency-inverse part frequency, which in the standard formulation requires a corpus of part “documents”, each listing a set “term” for every set that includes that part, as below.

part 10190: "10403-1 10403-1 10404-1 10404- ... "
part  3039: "003-1 003-1 003-1 003-1 021-1  ... "
part  3023: "021-1 021-1 021-1 021-1 021-1  ... "
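As a rough sketch of this formulation, the snippet below builds one "document" per part from a toy inventory and scores each (part, set) pair. The inventory quantities, the function name `hero_scores`, the `log(N / df) + 1` weighting and the per-document L2 normalisation are all assumptions made for illustration; the notebook in the repository may well use a library implementation with different smoothing.

```python
import math
from collections import Counter, defaultdict

# Toy inventory (hypothetical quantities): set -> {part: quantity}
inventory = {
    "60012-1": {"10190": 2, "3023": 4, "3039": 2},
    "021-1": {"3023": 6, "3039": 4},
    "003-1": {"3023": 3, "3039": 5},
    "10403-1": {"10190": 2, "3023": 1},
}

def hero_scores(inventory):
    """Score (part, set) pairs, treating parts as documents and sets as terms."""
    # One "document" per part: a count of set ids, weighted by quantity.
    docs = defaultdict(Counter)
    for set_id, parts in inventory.items():
        for part, qty in parts.items():
            docs[part][set_id] += qty
    n_docs = len(docs)
    # Document frequency of a set "term" = number of distinct parts in that set.
    df = {set_id: len(parts) for set_id, parts in inventory.items()}
    scores = {}
    for part, counts in docs.items():
        # Assumed weighting: tf * (log(N / df) + 1), then L2-normalise per part.
        raw = {s: tf * (math.log(n_docs / df[s]) + 1.0) for s, tf in counts.items()}
        norm = math.sqrt(sum(w * w for w in raw.values()))
        for s, w in raw.items():
            scores[(part, s)] = w / norm
    return scores

scores = hero_scores(inventory)
ranked = sorted(inventory["60012-1"], key=lambda p: scores[(p, "60012-1")], reverse=True)
print(ranked)
```

With the normalisation step, a part that appears in many sets spreads its weight across a long document, so its score for any one set shrinks. In this toy data that is enough to put the flipper at the top of the ranking for 60012-1.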

Inverse part frequency is closely related to the inverse of the reuse metric from part 4, hence we can expect it will find the least reused parts. Considering again our sample set 60012-1, LEGO City Coast Guard 4×4, (including 4WD, trailer, dinghy, and masked and flippered diver), we find the following “hero” parts.

Gallery of hero parts from LEGO Coast Guard set (60012-1) including stickers, 4WD tyres, a dinghy, flippers and mask

This makes intuitive sense. These “hero parts” are about delivering on the specific nature of the set. It’s much harder to substitute (or reuse other parts) for these hero parts – you would lose something essential to the set as it is designed. On the other hand, as you might imagine, the least differentiating parts (easiest to substitute or reuse alternatives) overlap significantly with the top parts from part 4. Note while mechanically – in a sense of connecting parts together – it may not be possible to replace these parts, these parts don’t do much to differentiate the set from other sets.

Gallery of least differentiated parts from LEGO Coast Guard set (60012-1) including common parts like plates, tiles, blocks and slopes.

Above, we consider sets as terms (words) in a document for each part. We can also reverse this by considering a set as a document, and included parts as terms in that document. Computing this part frequency-inverse set frequency measure across all parts and sets gives us a sparse matrix.

Visualisation of TFIDF or part-frequency-inverse-set-frequency as a sparse 2D matrix for building a search engine for sets based on parts

This can be used as a search engine to find the sets most relevant to particular parts. For instance, if we query the parts "2431 2412b 3023" (you can see these in Recommended Parts below), the top hit is the Marina Bay Sands set, which again makes intuitive sense – all those tiles, plates, and grilles are the essence of the set.
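A minimal sketch of such a search engine, treating each set as a document and each part as a term, might look like the following. The set labels, quantities and function names (`build_index`, `search`) are hypothetical, and the cosine-similarity ranking is an assumed choice rather than a description of the notebook.

```python
import math
from collections import Counter

# Toy inventory (hypothetical labels and quantities): set -> {part: quantity}
set_parts = {
    "marina-bay": {"2431": 40, "2412b": 30, "3023": 50},
    "coast-guard": {"10190": 2, "3023": 4, "3039": 2},
    "roof-pack": {"3039": 20, "3023": 2},
}

def build_index(set_parts):
    """Return (idf per part, L2-normalised TF-IDF vector per set)."""
    n_sets = len(set_parts)
    set_count = Counter(p for parts in set_parts.values() for p in parts)
    idf = {p: math.log(n_sets / c) + 1.0 for p, c in set_count.items()}
    vectors = {}
    for set_id, parts in set_parts.items():
        raw = {p: qty * idf[p] for p, qty in parts.items()}
        norm = math.sqrt(sum(w * w for w in raw.values()))
        vectors[set_id] = {p: w / norm for p, w in raw.items()}
    return idf, vectors

def search(query, idf, vectors):
    """Rank sets by cosine similarity to a whitespace-separated part query."""
    q_raw = {p: n * idf[p] for p, n in Counter(query.split()).items() if p in idf}
    q_norm = math.sqrt(sum(w * w for w in q_raw.values())) or 1.0
    hits = []
    for set_id, vec in vectors.items():
        score = sum(w / q_norm * vec.get(p, 0.0) for p, w in q_raw.items())
        hits.append((score, set_id))
    return sorted(hits, reverse=True)

idf, vectors = build_index(set_parts)
results = search("2431 2412b 3023", idf, vectors)
print(results[0][1])
```

The per-set vectors are stored as dictionaries holding only the parts present, which is a small-scale stand-in for the sparse matrix described above.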

Recommended Parts

Given a group of parts, how might we add to the group for various outcomes including reuse? For instance, if a new set design is missing one part that is commonly included with other parts in that design, could we consider redesigning the set to include that part to promote greater reuse?

A common recommendation technique for data in the shape of our set-part data is Association Rule Learning (aka “Basket Analysis”), which will recommend parts that are usually found together in sets (like items in baskets).

An association rule in this case is an implication of the form {parts} -> {parts}. Multiple of these rules form a directed graph, which we can visualise. I used the Efficient Apriori package to learn rules. In the first pass, this gives us some reasonable-looking recommendations for many of the top parts we saw in part 4.

Visualisation of discovered association rules as a directed graph showing common parts

You can read this as the presence of 2431 in a set implies (recommends) the presence of 3023, as does 2412b, which also implies 6141. We already know these top parts occur in many sets, so it’s likely they occur together, but we do see some finer resolution in this view. The association rules for less common parts might also be more insightful; this too may come in a future post.
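To make the idea of a rule concrete, here is a small brute-force sketch that computes support and confidence for single-part rules. It is not the Efficient Apriori package and would not scale to the full data; the baskets, thresholds and the function name `pair_rules` are assumptions chosen for illustration.

```python
from itertools import permutations

# Toy "baskets" (hypothetical data): the distinct parts in each set
baskets = [
    {"2431", "3023", "2412b", "6141"},
    {"2431", "3023"},
    {"2412b", "3023", "6141"},
    {"3023", "3039"},
    {"3039"},
]

def pair_rules(baskets, min_support=0.4, min_confidence=1.0):
    """Return rules (antecedent, consequent, support, confidence) between single parts."""
    n = len(baskets)
    parts = sorted(set().union(*baskets))
    rules = []
    for a, b in permutations(parts, 2):
        n_a = sum(1 for basket in baskets if a in basket)
        n_ab = sum(1 for basket in baskets if a in basket and b in basket)
        support = n_ab / n        # fraction of sets containing both parts
        confidence = n_ab / n_a   # fraction of sets with a that also contain b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules

rules = pair_rules(baskets)
for a, b, support, confidence in rules:
    print(f"{a} -> {b} (support {support:.1f}, confidence {confidence:.1f})")
```

Each rule is a directed edge from antecedent to consequent, which is how a list of rules turns into a graph like the one pictured. Note that the rules are not symmetric: in this toy data 2431 implies 3023, but 3023 does not imply 2431.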

Relationships Between Parts

How can we discover more relationships between parts that might support better outcomes including reuse?

We can generalise the part reuse analysis from part 4 and the techniques above by capturing the connections between sets and parts as a bipartite graph. The resultant graph contains about 63,000 nodes – representing both parts and sets – and about 633,000 edges – representing instances of parts included in sets. A small fragment of the entire graph, based on the flipper part 10190, the sets that include this part, and all other parts included in these sets, is shown below.

Visualisation of set neighbours (count 79) of flipper 10190 and their part neighbours (count 1313) as two parallel rows of nodes with many connections between them
Visualisation of selected set neighbours (count 3) of flipper 10190 and selected of their part neighbours (count 14) as two parallel rows of nodes with some connections between them

This bipartite representation allows us to find parts related by their inclusion in LEGO sets using a projection, which is a derived graph that only includes part nodes, linked by edges if they share a set. In this projection, our flipper is directly linked to the 1312 other parts with which it shares any set.
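The projection can be sketched in a few lines of plain Python. The toy edges and the function name `part_projection` are assumptions for illustration; a graph library would be a more natural choice at the scale of the real data.

```python
from collections import defaultdict
from itertools import combinations

# Toy set -> parts edges of the bipartite graph (hypothetical data)
set_parts = {
    "60012-1": {"10190", "3023", "3039"},
    "10403-1": {"10190", "3023"},
    "021-1": {"3023", "3039", "3001"},
}

def part_projection(set_parts):
    """Project the bipartite set-part graph onto parts: link parts that share a set."""
    neighbours = defaultdict(set)
    for parts in set_parts.values():
        for a, b in combinations(sorted(parts), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours

projection = part_projection(set_parts)
# Degree of a part = number of other parts with which it shares at least one set
degree = {part: len(linked) for part, linked in projection.items()}
print(sorted(projection["10190"]))
print(sorted(degree.items(), key=lambda kv: kv[1], reverse=True))
```

Because every set contributes an edge between each pair of its parts, a single large set adds many edges, which is one reason the real projection is so dense.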

Visualisation of 1312 immediate neighbours of flipper 10190 in the part projection of the set-part graph, shows only 1% of connections but this is very dense nonetheless

You can see this is a very densely connected set of parts, and more so on the right side, from 12 o’clock around to 6 o’clock. We could create a similar picture for each part, but we can also see the overall picture by plotting degree (number of connections to parts with shared sets) for all parts, with a few familiar examples.

Degree of nodes in part projection, with plate 1x2, slope 45 2x2 and flipper highlighted. Steep drop-off from maximum and long flat tail

This is the overall picture of immediate neighbours, and it shows the familiar traits of a small number of highly connected parts, and a very long tail of sparsely connected parts. We can also look beyond immediate neighbours to the path(s) through the projection graph between parts that don’t directly share a set, but are connected by common parts that themselves share a set. Below is one of the longest paths, from the flipper 10190 to multiple Duplo parts.

Visualisation of a path through the part connection graph spanning 7 nodes, with some neighbouring nodes also shown and parts drawn on

With a projection graph like this, we could infer that parts that are designed to be used together are closer together. We could use this information to compile groups of parts to specific ends. Given some group of parts, we could (1) add “nearby” missing parts to that group to create flexible foundational groups that could be used for many builds, or we could (2) add “distant” parts that could allow us to specialise builds in particular directions that we might not have considered previously. In these cases, “nearby” and “distant” are measured in terms of the path length between parts. There are many other ways we could use this data to understand part roles.

(When I first plotted this, I thought I had made a mistake, but it turns out there are indeed sets including regular and Duplo parts, in this case this starter kit.)
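Path length between parts, the measure of “nearby” and “distant” used above, can be found with a breadth-first search over the projection. The sketch below uses a tiny hypothetical projection and an illustrative function name, `shortest_path`.

```python
from collections import deque

# Toy part projection (hypothetical data): part -> parts sharing at least one set
projection = {
    "10190": {"3023"},
    "3023": {"10190", "3001"},
    "3001": {"3023", "duplo-brick"},
    "duplo-brick": {"3001"},
}

def shortest_path(graph, source, target):
    """Breadth-first search; returns the parts on one shortest path, or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None

path = shortest_path(projection, "10190", "duplo-brick")
print(path, len(path) - 1)
```

Here the path length is the number of edges, so two parts that share a set are at distance 1, and each extra hop means another intermediate part that bridges two sets.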

The analysis above establishes some foundational concepts, but doesn’t give us a lot of new insight into the roles played by parts. The next step I’d like to explore here is clustering and/or embedding the nodes of the part graph, to identify groups of similar parts, which may come in a future post.

Lessons

As I said above, there are no firm conclusions in this post regarding reuse in LEGO or how this might influence our view of and practices around reuse in software. However, if we have data about our software landscape in a similar form to the set-part data we’ve explored here, we might be able to conduct similar analyses to understand the roles that reusable software components play in different products, and, as a result, how to get better outcomes overall from software development.

Coming Next

I think the next post, and it might just be the last in this series, is going to be about extending these views of the relationships between parts and understanding how they might drive or be driven by the increase in variety and specialisation discussed in part 2.

LEGO® is a trademark of the LEGO Group of companies which does not sponsor, authorize or endorse this site.

LEGO and Software – Part Reuse

This is the fourth post in a series exploring LEGO® as a Metaphor for Software Reuse. The story is evolving as I go because I keep finding interesting things in the data. I’ll tie it all up with the key things I’ve found at some point.

In this post we’re looking from the part perspective – the reusable component – at how many sets or products it’s [re]used in, and how this has changed over time. All the analysis from this post is available at https://github.com/safetydave/reuse-metaphor.

Inventory Data

The parts inventory data from the Rebrickable data sets, image content & API gives us which parts appear in which sets, in which quantities and colours. For instance, if we look at the minifigure flipper from part 3, we can chart its inclusion in sets as below.

The parts inventory data includes minifigures. I may later account for the effect of including or excluding minifigures in these various analyses. We can also tell if a part in a set is a spare; according to the data, 2% of LEGO parts in sets are spares.

Parts Included in Sets

The most reused parts are included in thousands of sets. Here is a gallery of the top 25 parts over all time – from number 1 Plate 1 x 2 (which is included in over 6,200 sets), to our favourite from last time, the venerable Slope 45° 2 x 2 (over 2,700 sets).

These parts appear in a significant fraction of sets, but we can already see a 50% reduction in the number of sets in just the top 25 used parts. Beyond this small sample, we can plot the number of sets that include a given part (call it part set count), as below.

This time we have used log scales on both axes, due to the extremely uneven distribution of part set counts. In contrast to the thousands of sets including top parts, a massive 60% of parts are included in only one set and only 10% of parts are included in more than ten sets. The piecewise straight line fit indicates a power law applies, for instance approximately count = 10000 / sqrt(rank) for the top 100 ranked parts.
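As a quick check on what that approximation means, the sketch below evaluates it at a few ranks and recovers the slope of the line on log-log axes. The function names are illustrative, and the counts are generated from the quoted formula rather than taken from the data.

```python
import math

def approx_set_count(rank):
    """Approximation quoted in the text for the top 100 ranked parts."""
    return 10000 / math.sqrt(rank)

def loglog_slope(ranks, counts):
    """Least-squares slope of log10(count) against log10(rank)."""
    xs = [math.log10(r) for r in ranks]
    ys = [math.log10(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

ranks = [1, 4, 25, 100]
counts = [approx_set_count(r) for r in ranks]
slope = loglog_slope(ranks, counts)
print(counts, round(slope, 3))
```

A power law with exponent -1/2 is a straight line of slope -0.5 on log-log axes. The fit is only approximate at the very top: it gives 10,000 at rank 1, against the 6,200 or so sets that include Plate 1 x 2.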

Uneven distributions are often expressed in terms of the Pareto principle or 80/20 rule. If we define reuse instances as every time a part is included in a set, after the first set, then we can plot the contribution of each part to total reuse instances and see if this is more or less uneven than the 80/20 rule.

This shows us that reuse of LEGO parts in sets is much more uneven than the 80/20 rule. While the 80/20 rule says 80% of reuse would be due to 20% of parts, in fact we find by this definition that 80% of reuse is due to only 3% of parts, and 20% of parts account for 98% of reuse!
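The calculation behind this comparison can be sketched as follows, using the definition of reuse instances given above. The set counts are made up for illustration, and `reuse_share` is a hypothetical helper name.

```python
def reuse_share(set_counts, top_fraction):
    """Share of reuse instances contributed by the top fraction of parts.

    A part included in n sets contributes n - 1 reuse instances.
    """
    reuse = sorted((n - 1 for n in set_counts), reverse=True)
    total = sum(reuse)
    top_n = max(1, round(len(reuse) * top_fraction))
    return sum(reuse[:top_n]) / total

# Toy set counts (hypothetical data): one very common part, a few moderately
# reused parts, and a tail of parts that appear in a single set.
set_counts = [1000, 50, 20, 5, 2] + [1] * 15
print(f"{reuse_share(set_counts, 0.20):.3f}")
```

Parts that appear in only one set contribute no reuse instances at all under this definition, which is a large part of why the real distribution is so much more skewed than 80/20.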

We find a similar phenomenon if we consider the quantities of parts included in sets, rather than just the count of sets per part. We could repeat the whole analysis based on quantities (and the notebook has some options for doing this), but I was fairly satisfied the results would be similar given the degree of correlation we find, below.

I was intrigued, though, by the part that appeared many times in a single set, then never again (the uppermost point of the lower left column). It turns out it is a Window 1 x 2 x 1 (old type) with Extended Lip included in a windows set from 1966 that probably looked a bit like this one.

This brings us neatly to the “long tail” to round out this view of reuse as parts included in multiple sets. As per the distribution above, the tail (of parts that belong to only one set) is really long and full of curiosities. The tail is over 22,000 parts long, though these belong to only about 10,000 unique sets. The parts belong about 60/40 to sets proper vs minifigure packs. Here’s a tail selection – you’ll see they are fairly specialised items like stickers, highly custom parts, minifigure items with unique designs, and even trading cards!

There’s even, perhaps my nemesis, a minifigure set called “Chainsaw Dave”!

Parts Included in Sets Over Time

Part reuse might vary with time, and might be dependent on time. In previous posts we’ve seen an exponential increase in new parts and sets over time and an exponential decay in part lifetimes. We can plot part reuse (set count) against lifespan, as below.

This shows some correlation – which we might expect as longevity is due to reuse and vice versa – but I was also intrigued by the many relatively short-lived parts with high set counts (in the mid-top left region). Colouring points by the year released shows that these are relatively recent parts (at the yellow end of the spectrum). This shows that, as well as long-lived parts, more recent parts also appear in many sets, which is good news for reuse.

However, it’s hard to determine the distribution of set count from the scatter plot, and hence how significant the reuse is. We can see the distribution better with a violin plot, which shows the overall range (‘T’ ends), the distribution of common values (shading), and the median (cross-bar), much like the box plot.

We see that although many parts released in the last few decades are reused in 100s or even 1000s of sets, the median or typical part appears in only a handful of sets. With 100s to 1000s of sets released each year recently, the sets are reusing both old and new parts, but the vast majority of parts are not significantly reused.

Top Parts for Reuse Over Time

Above, we introduced the top 25 parts by set count, based on all time set count, but how has this varied over time? We can chart – for each of the top 25 parts – how many sets they appeared in each year. However, as the number of sets released each year has increased exponentially, we see a clearer pattern if we chart the proportion of sets released each year including the top 25 parts, as below.
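The proportional view can be sketched as below: for each release year, divide the number of sets that include a part by the number of sets released that year. The toy sets and the function name `yearly_proportion` are assumptions for illustration.

```python
from collections import defaultdict

# Toy data (hypothetical): set -> (release year, distinct parts in the set)
sets = {
    "a": (1999, {"3023", "3039"}),
    "b": (1999, {"3023"}),
    "c": (2000, {"3039"}),
    "d": (2000, {"x"}),
    "e": (2000, {"3023", "x"}),
    "f": (2000, {"x"}),
}

def yearly_proportion(sets, part):
    """Proportion of the sets released each year that include the given part."""
    total = defaultdict(int)
    including = defaultdict(int)
    for year, parts in sets.values():
        total[year] += 1
        if part in parts:
            including[year] += 1
    return {year: including[year] / total[year] for year in sorted(total)}

proportions = yearly_proportion(sets, "3023")
print(proportions)
```

Normalising by the yearly total removes the effect of the growing number of sets, so a falling line means the part really is in a smaller share of that year's releases.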

This shows variation in proportional set representation for the top parts from about 10% to 50% over the last 40 years. However, there was a very noticeable drop around the year 2000 to a maximum of only 20% of sets including top reused parts. Interestingly, this corresponds to a documented period of financial difficulty for The LEGO Group, but further research would be required to demonstrate any relationship. Prior to 1980, the maximum proportional representation was even higher, indicating a major intention of sets was to provide reusable parts.

From 2000 onwards, the all time ranking becomes more apparent, as the lines generally spread to match the rankings. We can also see this resolution of ranks in a bump chart for the top 4 parts over the time period from just before 2000 to the present.

Lessons for Software

As in previous posts, here’s where I speculate on what this data-directed investigation of LEGO parts and sets over the years might mean for software development. This is only on the presumption that – as often cited informally in my experience – LEGO products are a good metaphor for software. I take this as a given for this series, and as an excuse to play with some LEGO data, but I don’t really test it.

Given the reuse of LEGO parts across sets and time, we might expect that for software:

  • Most reuse will likely come from a small number of components, and this may be far more extreme than the 80/20 heuristic
  • If this is the case, then teams should build for use before reuse, or design very simple foundational components (see the top 25) for widespread reuse
  • Building for use before reuse means optimising for fast development and retirement of customised components that won’t be reusable
  • Reuse may vary significantly over time depending on your product strategy
  • It’s possible to introduce new reusable components at any time, but their impact might not be very noticeable with a product strategy that drives many customised components

Next Time

I plan to look further into the relationship between parts and sets, and how this has evolved over time.


Why the Australian COVIDSafe App Failed

It was a pleasure to collaborate with some of my colleagues on this article in The Australian newspaper, which I was able to put my name to. The article is titled Why the COVIDSafe App Failed, and may need a subscription to access.

The tl;dr

The COVIDSafe experience has been an education and if one thing is clear, it is that if we are going to pin all our hopes on a piece of public health technology, it must be built on sound health evidence and a solid platform of trust for it to have any real value in protecting the communities it serves.

LEGO and Software – Lifespans

This is the third post in a series exploring LEGO as a Metaphor for Software Reuse through data (part 1 & part 2).

In this post, we’ll look at reuse through the lens of LEGO® part lifespans. Not how long before the bricks wear out, are chewed by your dog, or squashed painfully underfoot in the dark, but for what period each part is included in sets for sale.

This is a minor diversion from looking further into the reduction in sharing and reuse themes from part 2, but lifespan is a further distinct concept related to reuse, worthy I think of its own post. All the analysis from this post, which uses the Rebrickable API, is available at https://github.com/safetydave/reuse-metaphor.

Ages in a Sample Set

To understand sets and parts in the Rebrickable API, I first raided our collection of LEGO build instructions for a suitable set to examine, and I came up with 60012-1, LEGO City Coast Guard 4×4.

Picture of LEGO build instructions for Coast Guard set
The sample set

In the process I discovered parts data contained year_from and year_to attributes, which would enable me to chart the ages of each part in the set when it was released, as a means of understanding reuse.
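A minimal sketch of that age calculation, assuming the age of a part is the set's release year minus the part's year_from, and grouping ages into 5-year brackets. The year values and the function name `age_brackets` are hypothetical.

```python
from collections import Counter

def age_brackets(set_year, parts_year_from, width=5):
    """Count parts by age bracket (in years) at the time the set was released."""
    counts = Counter()
    for year_from in parts_year_from:
        age = set_year - year_from
        lower = (age // width) * width
        counts[(lower, lower + width)] += 1
    return dict(sorted(counts.items()))

# Hypothetical year_from values for the parts of a set released in 2013
year_from_values = [2013, 2012, 2010, 1999, 1980, 1960]
brackets = age_brackets(2013, year_from_values)
print(brackets)
```

Each key is a half-open bracket, so `(0, 5)` covers parts aged 0 to 4 years when the set was released.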

Histogram of ages of parts in set 60012-1, with 12 parts of <1 year age, gradually decreasing to 7 parts 50-55 years of age

In line with the exponential increase of new parts we’ve already seen, the most common age bracket is 0-5 years, but a number of parts in this set from 2013 were 50-55 years old when it was released! Let’s see some examples of new and old parts…

Image of a new flipper (0-1 yrs age) and sloping brick (55 years age)

The new flipper is pretty cool, but is it even worthy to be moulded from the same ABS plastic as Slope 45° 2 x 2? The sloping brick surely deserves a place in the LEGO Reuse Hall of Fame. It has depth and range – from computer screen in moon base, to nose cone of open wheel racer, to staid roof tile, to minifigure recliner in remote lair. Contemplating this part took me back to my childhood, all those marvellous myriad uses.

And yet I also recalled the slightly unsatisfactory stepped profile that resulted from trying to build a smooth inclined surface with a stack of these parts. As such, this part captures the essence and the paradox of reuse in LEGO products; a single part can do a lot, but it can’t do everything. Let’s look at lifespan of parts more broadly.

Lifespans Across All Parts

The distribution of lifespan across LEGO parts is very uneven.

Distribution of lifespans of LEGO parts - histogram

The vast majority of parts are in use less than 1 year, and only a small fraction are used for more than 10 years. Note I calculate lifespan = year_to + 1 - year_from, and this is using strict parts data, rather than also including minifigures.
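The lifespan formula translates directly into code. The part records below are hypothetical placeholders that only borrow the `year_from` and `year_to` attribute names.

```python
from collections import Counter

# Toy part records (hypothetical data) with year_from / year_to attributes
parts = [
    {"part_num": "part-a", "year_from": 1960, "year_to": 2020},
    {"part_num": "part-b", "year_from": 2013, "year_to": 2020},
    {"part_num": "part-c", "year_from": 2013, "year_to": 2013},
    {"part_num": "part-d", "year_from": 2015, "year_to": 2015},
]

def lifespan(part):
    """Lifespan in years, counting both the first and last year of use."""
    return part["year_to"] + 1 - part["year_from"]

lifespans = {p["part_num"]: lifespan(p) for p in parts}
histogram = Counter(lifespans.values())
print(lifespans)
print(sorted(histogram.items()))
```

The `+ 1` means a part released and retired in the same calendar year has a lifespan of 1 rather than 0.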

Distributions of lifespans of LEGO parts, pie chart with ranges

Exponential Decay

This distribution looks like exponential decay. To understand more clearly, it’s back to the logarithmic domain, where we can fit approximations in two regimes; for the first five years (>80% of parts) and then for the remaining life.

LEGO part lifespans, counts plotted on vertical log scale

The half-life of parts in the first 5 years is about 1 year over that period. So, in each of the first five years, only about half the parts live to the next year. However, the first year remains an outlier, as only 34% of parts make it to their second year and beyond. After 5 years, just under 1,000 survive compared to the 25,000 at 1 year, and from that point the half-life extends to about 7 years. So, while some parts like Slope 45° 2 x 2 live long lives, the vast majority are winnowed out much earlier.
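The half-life figure can be sanity-checked from the two counts quoted above, assuming simple exponential decay between them. The function name `half_life` is illustrative, and the counts are the rounded figures from the text rather than exact data.

```python
import math

def half_life(count_start, count_end, elapsed_years):
    """Half-life implied by exponential decay between two survivor counts."""
    return elapsed_years * math.log(2) / math.log(count_start / count_end)

# Rounded figures from the text: about 25,000 parts at 1 year,
# just under 1,000 at 5 years, so 4 years elapsed.
early = half_life(25_000, 1_000, elapsed_years=4)
print(round(early, 2))
```

This comes out a little under one year, consistent with the "about 1 year" read off the fitted line.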

Churn

We can also look at the count of parts released (year_from) and the count of parts retired (year_to + 1) each year.

LEGO parts released and retired each year - line chart

As expected, parts released shows exponential growth, but parts retired also grows, and almost in synchrony, so the net change is small compared to the total parts in and out. By summing up the difference each year, we can chart the number of active parts over time.

Active LEGO parts by year and change by year - line and column plot

Active parts are a small proportion of all parts released to date; they represent about one seventh of all parts released, approximately 5,500 of 36,500. Comparing total changes to the active set size each year also shows a high and increasing rate of churn.
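One way to compute these series is sketched below. It assumes a part counts as retired in the year after its year_to, and it defines churn as (released + retired) divided by the active count at the end of that year, which is an assumed definition and may differ from the one used for the charts. The data and the function name `churn_by_year` are hypothetical.

```python
from collections import Counter

# Toy parts as (year_from, year_to) pairs (hypothetical data)
parts = [
    (2000, 2004), (2000, 2000), (2001, 2001),
    (2001, 2003), (2002, 2002), (2002, 2004),
]

def churn_by_year(parts, first_year, last_year):
    """Return rows of (year, released, retired, active, churn)."""
    released = Counter(year_from for year_from, _ in parts)
    retired = Counter(year_to + 1 for _, year_to in parts)
    active = 0
    rows = []
    for year in range(first_year, last_year + 1):
        # Running total of net change gives the number of active parts
        active += released[year] - retired[year]
        changes = released[year] + retired[year]
        churn = changes / active if active else None
        rows.append((year, released[year], retired[year], active, churn))
    return rows

rows = churn_by_year(parts, 2000, 2005)
for row in rows:
    print(row)
```

Churn is undefined when no parts are active, so the sketch returns `None` for those years instead of dividing by zero.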

LEGO part churn by year - line chart

So even as venerable stalwarts such as Slope 45° 2 x 2 persist, in recent years about 80% of active LEGO parts have churned each year! Interestingly, the early 1980s to late 1990s was a period of much lower churn. Note also churn percentages prior to 1970 were high and varied widely (not shown before 1965 for clarity), probably reflective of a much smaller base of parts and maybe artefacts with older data.

Lifespans vs Year Release and Retired

We’ve got a lot of insight from just year_from and year_to. One last way to look at lifespans is how they have changed over time, such as lifespan vs year released or retired.

LEGO part lifespan scatter plots

Obvious in these charts is that we only have partial data on the lifespan of active parts (we don’t know how much longer until they’ll be retired), but as above, they are a small proportion. We can discern a little more by using a box plot.

LEGO part lifespan box plots

The plot shows, for each year, median lifespans (orange), the middle range (box), the typical range (whiskers) and lifespan outliers (smoky grey). We see here again that the 1980s and 1990s were a good period in relative terms for releasing long-lived parts that have only just been retired. However, with the huge volume of more short-lived parts being retired in recent years, we don’t see their impact in the late 2010s on the retired plot, except as outliers. In general, the retired (left) plot, like the later years of the released (right) plot shows lower lifespan distributions because the long-lived parts are overwhelmed by ever-increasing numbers of contemporaneous short-lived parts.

Lessons for Software Reuse

If LEGO products are to be a metaphor and baseline for reuse in software products, this analysis of part lifespans is consistent with the observations from part 1, while further highlighting:

  • Components that are heavily reused may be a minority of all components, and in an environment of frequent and increasing product releases, many components may have very short lifetimes, driven by acute needs.
  • There may be a “great filter” for reuse, such as the one year or five year lifespan for LEGO parts. This may also be interpreted as “use before reuse”, or that components must demonstrate exceptional performance in core functions, or support stable market demands, before wider reuse is viable.
  • Our impressions and expectations for reuse of software components may be anchored to particular time periods. We see that the 1980s and 1990s (when there were only ~10% of the LEGO parts released to 2020) were a period of much lower churn and the release of relatively more parts with longer lifespans. The same may be true for periods of software development in an organisation’s history.
  • Retirement of old components can be synchronised with introduction of new components, and in fact, this is probably essential to effectively manage complexity and to derive benefits of reuse without the burden of legacy.

Further Analysis

We’ll come back to the reduction in sharing and reuse theme, and find a lot more interesting ways to look at the Rebrickable data in future posts.


LEGO and Software – Variety and Specialisation

Since my first post on LEGO as a Metaphor for Software Reuse, I have done some more homework on existing analyses of LEGO® products, to understand what I could myself reuse and what gaps I could fill with further data analysis.

I’ve found three fascinating analyses that I share below. However, I should note that these analyses weren’t performed considering LEGO products as a metaphor or benchmark for software reuse. So I’ve continued to ask myself: what lessons can we take away for better management of software delivery? For this post, the key takeaways are market and product drivers of variety, specialisation and complexity, rather than strategies for reuse as such. I’m hoping to share more insight on reuse in LEGO in future posts, in the context of these market and product drivers.

I also discovered the Rebrickable downloads and API, which I plan to use for any further analysis – I do hope I need to play with more data!

Reuse Concepts

I started all this thinking about software reuse, which is not an aim in itself, but a consideration and sometimes an outcome in efficiently satisfying software product objectives. As we think about reuse and consider existing analyses, I found it helpful to define a few related concepts:

  • Variety – the number of different forms or properties an entity under consideration might take. We might talk about variety of themes, sets, parts, and colours, etc.
  • Specialisation – of parts in particular, where parts serve only limited purposes.
  • Complexity – the combinations or interactions of entities, generally increasing with increasing variety and specialisation.
  • Sharing – of parts between sets in particular, where parts appear in multiple sets. We might infer specialisation from limited sharing.
  • Reuse – sharing, with further consideration of time, as some reuse scenarios may be identified when a part is introduced, some may emerge over time, and some opportunities for future reuse may not be realised.

Considering these concepts, the first two analyses focus mainly on understanding variety and specialisation, while the third dives deeper into sharing and reuse.

Increase in Variety and Specialisation

The Colorful Lego

Visualisation of colours in use in LEGO sets over time
Analysis of LEGO colours in use over time. Source: The Colorful Lego Project

Great visualisations and analysis in this report and public dashboard from Edra Stafaj, Hyerim Hwang and Yiren Wang, driven primarily by the evolving colours found in LEGO sets over time, and considering colour as a proxy for complexity. Some of the key findings:

  • The variety of colours has increased dramatically over time, with many recently introduced colours already discontinued.
  • The increase in variety of colours is connected with growth of new themes. Since 2010, there has been a marked increase in co-branded sets (“cooperative” theme, eg, Star Wars) and new in-house branded sets (“LEGO commercial” theme, eg, Ninjago) as a proportion of all sets.
  • That specialised pieces (as modelled by Minifig Heads – also noted as the differentiating part between themes) make up the bulk of new pieces, compared to new generic pieces (as modelled by Bricks & Plates).

Colour is an interesting dimension to consider, as it may be argued an aesthetic, rather than mechanical, consideration for reuse. However, as noted in the diversification of themes, creating and satisfying a wider array of customer segments is connected to the increasing variety of colour.

So I see variety and complexity increasing, and more specialisation over time. The discontinuation of colours suggests reuse may be reducing over time, even while generic bricks & plates persist.

67 Years of Lego Sets

Visualisation of the LEGO, in the LEGO, for the LEGO people. Source 67 Years of Lego Sets

An engaging summary from Joel Carron of the evolution of LEGO sets over the years, including Python notebook code, and complete with a final visualisation made of LEGO bricks! Some highlights:

  • The number of parts in a set has in general increased over time.
  • The smaller sets have remained a similar size over time, but the bigger sets keep getting bigger.
  • As above, colours are diversifying, with minor colours accounting for more pieces, and themes developing distinct colour palettes.
  • Parts and sets can be mapped in a graph or network showing the degree to which parts are shared between sets in different themes. This shows some themes share a lot of parts with other themes, while some themes have a greater proportion of unique parts. Generally, smaller themes (with fewer total parts) share more than larger themes (with more total parts).

So here we add to variety and specialisation with learning about sharing too, but without the chronological view that would help us understand more about reuse – were sets with high degrees of part sharing developed concurrently or sequentially?

Reduction in Sharing and Reuse

LEGO products have become more complex

A comprehensive paper, with dataset and R scripts, analysing increasing complexity in LEGO products, with a range of other interesting-looking references to follow up on, though with the acknowledgement that scientific investigations on the development of the LEGO products remain scarce.

This needs a thorough review in its own post, with further analysis and commentary on the implications for software reuse and management. That will be the third post of this trilogy in N parts.

Lessons for Software Reuse

If we are considering LEGO products as a metaphor and benchmark for software reuse, we should consider the following.

Varied market needs drive variety and specialisation of products, which in turn can be expected to drive variety and specialisation of software components. Reuse of components here may be counter-productive from a product-market fit perspective (alone, without further technical considerations). However, endless customisation is also problematic and a well-designed product portfolio will allow efficient service of the market.

Premium products may also be more complex, with more specialised components. Simpler products, with lesser performance requirements, may share more components. The introduction of more premium products over time may be a major driver of increased variety and specialisation.

These market and product drivers provide context for reuse of software components.


No Smooth Path to Good Design

The path to good design is bumpy, as we will demonstrate with four teapots. (Yes, teapots. Teapots are a staple of computer science and philosophy.)

The path to good design matters, because if you are trying to build a design capability, the journey will be smoother if you understand that the path is bumpy.

Leaders who appreciate the bumpy path can facilitate far greater value creation and support a more engaged group of workers.

What is design?

Design is an activity, but also a result: the specification for a product (service), which determines how it is made or delivered.

Performance is a measure of how a product actually functions, for a given task in a given context. Performance in the broadest sense includes emotional responses, static and dynamic physical characteristics, service characteristics, etc. For simplicity, let’s measure performance in monetary terms; eg. lifetime economic value.

Design is important as an activity and a result, because it is the prime determinant of performance that is within your control.

The smooth path

Teapot by Norman[1]
Consider the distinctive teapot from the cover of Don Norman’s Design of Everyday Things, where the handle – instead of opposing – is aligned with the spout.

We know a thing or two about teapots, so we assume this design has very poor performance!

However, we also assume that a traditional design with handle opposed to the spout produces the best performance.

We can plot our smooth model of how performance varies as a function of the angle between spout and handle.

Performance of teapot design variants

And it’s pretty clear how to find the best design. The more opposing the handle and spout, the better the performance, the more value created, and hence the better the design.

The first bump in the path

Yokode Kyusu [2]
However, this model is broken. We can’t interpolate smoothly (linearly) between design points, as demonstrated by the Japanese yokode kyusu, which features a handle at right angles to its spout, to extract every last drop of tea.

With this new insight, and a further assumption that handles in between the points we’ve plotted (eg, 45 degrees) are much worse due to awkward twisting motions when pouring, we can draw a new model, which is already much less smooth.

Teapot performance with new information

What’s interesting about this landscape is that most design variants perform pretty poorly, and you must be close to a good design to find it. If you didn’t have the insight into teapot performance that we have assumed – if you had only tested performance at the awkward angles, and you had assumed smooth behaviour in between – you would likely miss the best designs and leave significant value on the table. (Note that the scale of this diagram should be greatly exaggerated to demonstrate the true size of value creation opportunities.)

Value created by exploration
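To make the sampling risk concrete, here is a minimal sketch of the landscape described above. The performance function, its peak heights and widths, and the tested angles are all hypothetical, chosen only to mimic the shape of the bumpy model; none of them are measurements of real teapots.

```python
import math

def performance(angle_deg):
    """Hypothetical bumpy landscape: narrow peaks at 90 and 180 degrees."""
    peaks = [(90.0, 0.9), (180.0, 1.0)]  # (angle, height): assumed values
    width = 10.0                          # assumed peak width in degrees
    return sum(h * math.exp(-((angle_deg - a) / width) ** 2) for a, h in peaks)

def interpolate(samples, angle_deg):
    """Linear interpolation between tested (angle, performance) points."""
    for (a0, p0), (a1, p1) in zip(samples, samples[1:]):
        if a0 <= angle_deg <= a1:
            t = (angle_deg - a0) / (a1 - a0)
            return p0 + t * (p1 - p0)
    raise ValueError("angle outside tested range")

# Test only at the aligned handle and the awkward angles,
# then assume smooth behaviour in between.
tested_angles = [0.0, 45.0, 135.0]
samples = [(a, performance(a)) for a in tested_angles]

predicted_at_90 = interpolate(samples, 90.0)  # what the smooth mindset expects
actual_at_90 = performance(90.0)              # what the side handle delivers
missed_value = actual_at_90 - predicted_at_90

print(f"predicted {predicted_at_90:.3f}, actual {actual_at_90:.3f}, "
      f"missed {missed_value:.3f}")
```

Under these assumptions, the interpolated prediction at 90 degrees is close to zero while the assumed performance there is close to its peak, so nearly all of the value of the side-handle design goes undiscovered.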

So, this is the first lesson of the bumpy path to good design. We need to explore the performance of multiple design variants, and understand that small changes in design can have enormous impacts on performance, to be confident we are approaching our potential to create value.

Teapot with handle on top [3]
So far, we have only explored the impact of one design variable, but for any product we have effectively infinitely many design variables (if we can just conceive them). For instance, the handle of a teapot could also be on top, but we could also consider the shape, material, fixtures, etc. Then we could move beyond the handle to the design of the rest of the teapot!

Now consider the design and delivery of digital products and services. Constraints do exist, but infinite design variants still exist within those constraints. Further, like the rolled up dimensions of string theory, there are extra dimensions of design that are easy to miss, but once discovered can be expanded and explored to create ever more value.

The first lesson

How do leaders get this wrong? By failing to encourage the exploration of a sufficient number of design variants, and by failing to encourage the exploration of minor changes that have outsize impact.

As a leader, you must be prepared to carve out time and space, embrace uncertainty and ambiguity, and bring creativity, compassion and patience to the exploration process. As important as this is to creating value, it is also key to maintaining the engagement of teams involved in or interacting with design.

I’m often told that exploration feels inefficient. Or, rather, felt inefficient. The distinction is important. Hindsight bias distorts the reality that before starting an exploration into a sufficiently bumpy landscape, we simply cannot know what we will find. So how do we measure efficiency of exploration? Certainly not by how quickly we arrive at a design, or by how many designs are discarded. Should we even measure efficiency of exploration? That is a better question. We should focus on net value creation, and do enough exploration to mitigate the risk that we are leaving significant value on the table.

This design sensibility, however, may not be apparent to the whole team. Designers will be frustrated being managed to a smooth path, while others who perceive the challenge to be simple may become frustrated when the bumpiness is allowed to surface. The team’s various activities may have different cadences that sometimes align, and sometimes don’t. This can create friction and dissatisfaction in teams. Some functional conflict is healthy in this regard, but as a leader, you must support and enable a team to focus on what it takes to create value.

The second bump in the path

I have used the word “assume” liberally and deliberately above. I have assumed a large number of things about the tasks that users of the teapots are seeking to achieve, and the broader contexts of use. I have further assumed that my readers share a traditional western notion of teapots and their use. I have done this to keep simple – I hope – the explanation of the first bump.

But “assume” is at the root of the second bump. During product development, we can’t assume performance; we must test designs with users engaged in a task in a context. We may take shortcuts by prototyping, simulating, etc, but we must test as objectively as possible, for a meaningful prediction of a product’s performance, and potential to create value.

In a bumpy design landscape, poor predictions of actual performance carry significant opportunity cost.

Value created by testing

(Note also that during the development of a typical digital product/service, we are usually discovering the task and the context iteratively, in parallel with the design.)

We assumed, with our teapots above, that a spout aligned with the handle would lead to poor performance, but we didn’t test it (with a minor tweak in a hidden dimension). If we’d tested this traditional oriental design (as UX Designer Mike Eng did), we would have discovered that, for the task of serving oneself, in a solitary context, the aligned handle actually produces superior performance.

Aligned handle teapot [4]
I was surprised to find this teapot design existed when I stumbled upon the post from above. I suspect this teapot design has a specific name or an interesting story behind it, but I haven’t been able to track it down. However, it serves as an excellent demonstration that the best design paths are bumpy.

The second lesson

The second lesson is that assumptions about performance, task and context hide the inherent bumpiness in design. As a leader, you must recognise and challenge assumptions, encourage the testing of designs under the correct conditions, and appreciate that our understanding of task and context may evolve with testing.

There are many resources that discuss lightweight and effective approaches to UX research and testing; you could do worse than to start here.

Conclusion

We have discussed two major value creation activities in design:

  • Exploration and consequent discovery of performant designs
  • Testing and consequent selection of more performant designs

But these activities are overlooked or de-prioritised with a smooth mindset. While there is uncertainty, ambiguity and friction along the path, and sometimes progress is difficult to discern, as a leader, you must embrace the bumps because – if you are in the business of creating value – there is no smooth path to good design.

Image credits
  1. http://www.amazon.co.uk/Design-Everyday-Things-Donald-Norman/dp/0262640376
  2. http://commons.wikimedia.org/wiki/File:JapaneseTeapot.jpg
  3. http://www.wilkinpottery.com/product/teapot-top-handle/
  4. Ebay listing from seller http://www.ebay.com/usr/mitch8670

Jetty to Jetty app

I released an app 🙂 – for iOS and Android.

It’s a self-guided audio tour of historic sites in Broome, Western Australia, including beautiful stories told by locals. Nyamba Buru Yawuru developed the concept, curated the media, engaged local stakeholders, and were product owners for the app.

Jetty to Jetty screenshots

This work was exciting for its value to the Broome and Yawuru community, but also because it was an opportunity to innovate under the constraint of building the simplest thing possible. The simplest thing possible was in stark contrast to the technical whizbangery (though lean delivery) of my previous app project – Fireballs in the Sky.

I had fun working on the interaction and visual design challenges under the constraints, and I think the key successes were:

  • Simplifying presentation of the real-world and in-app navigation as a hand-rolled map (drawn in Inkscape), showing all the sites, that scrolls in a single direction.
  • Hiding everything unnecessary during playback of stories, to allow the user to focus on the place and the story.
  • Playback control behaviour across sites and the main map.
  • Not succumbing to the temptation to add geo-location, background audio, or anything else that could have added to the complexity!

My colleague Nathan Jones laid the technical foundations – Phonegap/Cordova wrapping a static site built by Middleman and using CoffeeScript, knockout.js, HAML, Sass and HTML5/Cordova plugin for media. He later went on to extend and open-source (as Jila) this framework for the Yawuru Ngan-ga language app. Most of the development work by Nathan and me was done in early 2014.

While intended to be used in Broome (and yet another reason to visit Broome), the app and its beautiful stories can be enjoyed anywhere.

Health Hack Perth 2015

HealthHack is a three-day event bringing medical researchers and health practitioners together with software creators to prototype a new generation of health products.

Business News Western Australia covered the Perth 2015 event in: HealthHack – ailments, remedies in equal doses.

I helped organise this event with assistance from sponsors ThoughtWorks and Curtin University (among numerous other generous sponsors). It was a great event, with important and challenging problems presented, innovative solution concepts delivered, and new relationships formed between individuals and organisations in health and technology.

Health Hack summary

Please refer to the report and the catalogue of products for detailed information on this event, and resources for hackathons in general. Health Hack is an Open Knowledge Foundation Australia event, so is predicated on sharing open source deliverables.

Some Highlights and Lessons Learned

We focussed on curated problems for this event, approaching a large number of potential “problem owners” with a checklist to recruit those with the most appropriate challenges for the weekend hackathon format. We then worked with the problem owners to shape their challenges and pitches for the “ideas market”. This was a very substantial effort (primarily by the fabulous Diana Adorno) in the lead-up to the weekend, but the well-formed problems were key to the success of the hack.

Health Hack pitch posters

We attracted a diverse set of participants, with skills ranging from design, to software development, to data science, and these individuals organised themselves into teams around the problems most suited to their collective skill set. As organisers, we made only one substitution to balance teams.

We started with fewer participants than expected, because the drop-off rate from registrations was substantially higher (50%) than in previous years at other sites (30%). However, attrition over the weekend was virtually zero, as the participants were uniformly enthusiastic and energised by their challenges.

The ideas market built great energy around the challenges and the potential for the weekend. We posted the challenges around the room prior to the event. Then the problem owners took turns to pitch in just 2 minutes each from their challenge posters. The pitches were clear and concise, and the cumulative effect was really energising. When the pitches were done, participants had time to walk the room, seek more information from problem owners, and organise their own teams.

Coaching and regular check-ins on team progress helped keep the teams focussed on solving key problems and having a demonstrable product at the end of the weekend. No team failed to showcase. However, we had feedback that access to more coaching would have been valuable.

Health Hack showcase

The venue at Curtin University Chemistry Precinct was ideal, with team tables, breakout spaces and bean bags, and surrounded by gardens. However, it was the only Health Hack venue not in the CBD of the host city, and this may have presented transport challenges (though we didn’t collect any data on this). The plan at the time was to rotate the venue through various supporting institutions in future years.

Food trucks and coffee vans were a great way to service participants! Although it required some coordination ahead of the event, and may not be possible in CBD sites, it was very easy on the weekend, and lots of fun.

For more, see the full report.

Your Software is a Nightclub

Why a nightclub? Well, it’s a better model than a home loan. I’m talking here about technical debt, the concept that describes how retarding complexity (cost) builds up in software development and other activities, and how to manage this cost. A home loan is misleading because product development cost doesn’t spiral out of control due to missed interest payments over time. Costs blow out due to previously deferred or unanticipated socialisation costs being realised with a given change.

So what are socialisation costs? They are the costs incurred when you introduce a new element to an existing group: a new person to a nightclub, or a new feature into a product. Note that we can consider socialisation at multiple levels of the product – UX design, information architecture, etc – not just source code.

Why is socialisation so costly? Because in general you have to socialise each new element with all existing elements, and so you can expect each new element you add to cost more than the last. If you keep adding elements, and even if each pair socialises very cheaply, eventually socialisation cost dominates marginal cost and total cost.
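As a rough illustration of why each new element costs more than the last, the sketch below counts the pairs that may need socialising. The feature names are hypothetical, and treating every pair as needing socialisation is the worst-case assumption implied by "in general" above.

```python
from itertools import combinations

def socialisation_pairs(elements):
    """Every unordered pair of elements that may need to be socialised."""
    return list(combinations(elements, 2))

def pairs_added_by_next(n_existing):
    """A new element must be socialised with each existing element."""
    return n_existing

features = ["login", "search", "checkout", "reviews"]  # hypothetical features
pairs = socialisation_pairs(features)
total_pairs = len(pairs)  # n * (n - 1) / 2 for n elements

for n in (4, 10, 100):
    print(n, "elements:", n * (n - 1) // 2, "pairs;",
          "next element adds", pairs_added_by_next(n))
```

The number of pairs grows roughly with the square of the number of elements, which is why even a very small per-pair cost eventually dominates.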

What is the implication of poor socialisation? In a nightclub, this may be a fight, and consequent loss of business. In software, this may be delayed releases or operational issues or poor user experience, and consequent lack of business. If you build airplanes, it could cost billions of dollars.

What does this mean for software delivery, or brand management, or product management, or organisational change, or hiring people, or nightclub management, or any activity where there is continued pressure to add new elements, but accelerating cost of socialisation?

Well, consider that production (of stuff) achieves efficiencies of scale by shifting variable cost to fixed for a certain volume. But software delivery is not production, it is design, and continuous re-design in response to change in our understanding of business outcomes.

Change can be scaled by shifting socialisation costs to variable; we take a variable cost hit with each new element to reduce the likelihood we will pay a high price to socialise future elements. Then we can change and change again in a sustainable manner. We can also segment elements to ensure pairwise cost is zero between segments (architecture). But, ideally, we continue to jettison elements that aren’t adding sufficient value – this is the surest way to minimise marginal socialisation cost and preserve business agility. We can deliver a continuous MVP.

So what does this add to the technical debt discussion? All models are wrong; some are useful. Technical debt is definitely useful, and reaches some of the same management conclusions as above.

For me, the nightclub model is a better holistic model for product management, not just for coding. It is more dynamic and reflective of a messy reality. Further, with an economic model of marginal cost, we can assess whether the economics of marginal value stack up. Who do we want in our nightclub? How do we ensure the mix is good for business? Who needs to leave?

What do you think?

Postscript: The Economic Model

We write total cost (C) for N elements as the sum of fixed costs (f), a constant variable cost per unit (v) multiplied by N, and a socialisation term that grows with the square of N, scaled by a per-pair cost factor (s):

\[ C = f + vN + sN^2\]

Then marginal cost (M) may be written as:

\[ M = v + 2sN \]

Socialisation cost against fixed and variable costs
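The model is easy to play with in code. The sketch below implements the two equations as written; the parameter values and the `crossover` helper are my own illustrative assumptions, not figures from any real product.

```python
def total_cost(n, f, v, s):
    """C = f + v*N + s*N**2, as in the postscript."""
    return f + v * n + s * n ** 2

def marginal_cost(n, v, s):
    """M = dC/dN = v + 2*s*N."""
    return v + 2 * s * n

def crossover(v, s):
    """N at which the socialisation term 2*s*N equals the variable cost v."""
    return v / (2 * s)

# Hypothetical parameters, chosen only for illustration.
f, v, s = 100.0, 10.0, 0.5

n_star = crossover(v, s)  # beyond this N, socialisation dominates marginal cost
step_cost = total_cost(11, f, v, s) - total_cost(10, f, v, s)

print("crossover at N =", n_star)
print("marginal cost at N = 10:", marginal_cost(10, v, s))
print("cost of actually adding the 11th element:", step_cost)
```

Note that the cost of adding one whole element, C(N+1) - C(N) = v + s(2N + 1), is slightly higher than the derivative M, because M treats N as continuous. The difference is just s, so the conclusion is unchanged: marginal cost grows linearly with N, and past the crossover point it is mostly socialisation.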

Note: This post was originally published August 2014, and rebooted April 2015