LEGO and Software – Part Reuse

This is the fourth post in a series exploring LEGO® as a Metaphor for Software Reuse. The story is evolving as I go because I keep finding interesting things in the data. I’ll tie it all up with the key things I’ve found at some point.

In this post we’re looking from the part perspective – the reusable component – at how many sets or products it’s [re]used in, and how this has changed over time. All the analysis from this post is available at https://github.com/safetydave/reuse-metaphor.

Inventory Data

The parts inventory data from the Rebrickable data sets, image content & API gives us which parts appear in which sets, in which quantities and colours. For instance, if we look at the minifigure flipper from part 3, we can chart its inclusion in sets as below.

The parts inventory data includes minifigures. I may later account for the effect of including or excluding minifgures in these various analyses. We can also tell if a part in a set is a spare; according to the data, 2% of LEGO parts in sets are spares.

Parts Included in Sets

The most reused parts are included in thousands of sets. Here is a gallery of the top 25 parts over all time – from number 1 Plate 1 x 2 (which is included in over 6,200 sets), to our favourite from last time, the venerable Slope 45° 2 x 2 (over 2,700 sets).

These parts appear in a significant fraction of sets, but we can already see a 50% reduction in the number of sets in just the top 25 used parts. Beyond this small sample, we can plot the number of sets that include a given part (call it part set count), as below.

This time we have used log scales on both axes, due to the extremely uneven distribution of part set counts. In contrast to the thousands of sets including top parts, a massive 60% of parts are included in only one set and only 10% of parts are included in more than ten sets. The piecewise straight line fit indicates a power law applies, for instance approximately count = 10000 / sqrt(rank) for the top 100 ranked parts.

Uneven distributions are often expressed in terms of the Pareto principle or 80/20 rule. If we define reuse instances as every time a part is included in a set, after the first set, then we can plot the contribution of each part to total reuse instances and see if this is more or less uneven that the 80/20 rule.

This shows us that reuse of LEGO parts in sets is much more uneven than the 80/20 rule. While the 80/20 rule says 80% of reuse would be due to 20% of parts, in fact we find by this definition that 80% of reuse is due to only 3% of parts, and 20% of parts account for 98% of reuse!

We find a similar phenomenon if we consider the quantities of parts included in sets, rather than just the count of sets per part. We could repeat the whole analysis based on quantities (and the notebook has some options for doing this), but I was fairly satisfied the results would be similar given the degree of correlation we find, below.

I was intrigued, though, by the part that appeared many times in a single set, then never again (the uppermost point of the lower left column). It turns out it is a Window 1 x 2 x 1 (old type) with Extended Lip included in a windows set from 1966 that probably looked a bit like this one.

This brings us neatly to the “long tail” to round out this view of reuse as parts included in multiple sets. As per the distribution above, the tail (of parts that belong to only one set) is really long and full of curiosities. The tail is over 22,000 parts long, though these belong to only about 10,000 unique sets. The parts belong about 60/40 to sets proper vs minifigure packs. Here’s a tail selection – you’ll see they are fairly specialised items like stickers, highly custom parts, minifigure items with unique designs, and even trading cards!

There’s even, perhaps my nemesis, a minifigure set called “Chainsaw Dave”!

Parts Included in Sets Over Time

Part reuse might vary with time, and might be dependent on time. In previous posts we’ve seen an exponential increase in new parts and sets over time and an exponential decay in part lifetimes. We can plot part reuse (set count) against lifespan, as below.

This shows some correlation – which we might expect as longievity is due to reuse and vice versa – but I was also intrigued by the many relatively short-lived parts with high set counts (in the mid-top left region). Colouring points by the year released shows that these are relatively recent parts (at the yellow end of the spectrum). This shows that, as well as long-lived parts, more recent parts also appear in many sets, which is good news for reuse.

However, it’s hard to determine the distribution of set count from the scatter plot, and hence how significant the reuse is. We can see the distribution better with a violin plot, which shows the overall range (‘T’ ends), the distribution of common values (shading), and the median (cross-bar), much like the box plot.

We see that although many parts released in the last few decades are reused in 100s or even 1000s of sets, the median or typical part appears in only a handful of sets. With 100s to 1000s of sets released each year recently, the sets are reusing both old and new parts, but the vast majority of parts are not significantly reused.

Top Parts for Reuse Over Time

Above, we introduced the top 25 parts by set count, based on all time set count, but how has this varied over time? We can chart – for each of the top 25 parts – how many sets they appeared in each year. However, as the number of sets released each year has increased exponentially, we see a clearer pattern if we chart the proportion of sets released each year including the top 25 parts, as below.

This shows variation in proportional set representation for the top parts from about 10% to 50% over the last 40 years. However, there was a very noticeable drop around the year 2000 to a maximum of only 20% of sets including top reused parts. Interestingly, this corresponds to a documented period of financial difficulty for The LEGO Group, but further research would be required to demonstrate any relationship. Prior to 1980, the maximum proportional representation was even higher, indicating a major intention of sets was to provide reusable parts.

From 2000 onwards, the all time ranking becomes more apparent, as the lines generally spread to match the rankings. We can also see this resolution of ranks in a bump chart for the top 4 parts over the time period from just before 2000 to the present.

Lessons for Software

As in previous posts, here’s where I speculate on what this data-directed investigation of LEGO parts and sets over the years might mean for software development. This is only on the presumption that – as often cited informally in my experience – LEGO products are a good metaphor for software. I take this as a given for this series, and as an excuse to play with some LEGO data, but I don’t really test it.

Given the reuse of LEGO parts across sets and time, we might expect that for software:

  • Most reuse will likely come from a small number of components, and this may be far more extreme than the 80/20 heuristic
  • If this is the case, then teams should build for use before reuse, or design very simple foundational components (see the top 25) for widespread reuse
  • Building for use before reuse means optimising for fast development and retirement of customised components that won’t be reusable
  • Reuse may vary significantly over time depending on your product strategy
  • It’s possible to introduce new reusable components at any time, but their impact might not be very noticeable with a product strategy that drives many customised components

Next Time

I plan to look further into the relationship between parts and sets, and how this has evolved over time.

LEGO® is a trademark of the LEGO Group of companies which does not sponsor, authorize or endorse this site.

LEGO and Software – Lifespans

This is the third post in a series exploring LEGO as a Metaphor for Software Reuse through data (part 1 & part 2).

In this post, we’ll look at reuse through the lens of LEGO® part lifespans. Not how long before the bricks wear out, are chewed by your dog, or squashed painfully underfoot in the dark, but for what period each part is included in sets for sale.

This is a minor diversion from looking further into the reduction in sharing and reuse themes from part 2, but lifespan is a further distinct concept related to reuse, worthy I think of its own post. All the analysis from this post, which uses the the Rebrickable API, is available at https://github.com/safetydave/reuse-metaphor.

Ages in a Sample Set

To understand sets and parts in the Rebrickable API, I first raided our collection of LEGO build instructions for a suitable set to examine, and I came up with 60012-1, LEGO City Coast Guard 4×4.

Picture of LEGO build instructions for Coast Guard set
The sample set

In the process I discovered parts data contained year_from and year_to attributes, which would enable me to chart the ages of each part in the set when it was released, as a means of understanding reuse.

Histogram of ages of parts in set 60012-1, with 12 parts of <1 year age, gradually decreasing to 7 parts 50-55 years of age

In line with the exponential increase of new parts we’ve already seen, the most common age bracket is 0-5 years, but a number of parts in this set from 2013 were 50-55 years old when it was released! Let’s see some examples of new and old parts…

Image of a new flipper (0-1 yrs age) and sloping brick (55 years age)

The new flipper is pretty cool, but is it even worthy to be moulded from the same ABS plastic as Slope 45° 2 x 2? The sloping brick surely deserves a place in the LEGO Reuse Hall of Fame. It has depth and range – from computer screen in moon base, to nose cone of open wheel racer, to staid roof tile, to minifigure recliner in remote lair. Contemplating this part took me back to my childhood, all those marvellous myriad uses.

And yet I also recalled the slightly unsatisfactory stepped profile that resulted from trying to build a smooth inclined surface with a stack of these parts. As such, this part captures the essence and the paradox of reuse in LEGO products; a single part can do a lot, but it can’t do everything. Let’s look at lifespan of parts more broadly.

Lifespans Across All Parts

The distribution of lifespan across LEGO parts is very uneven.

Distribution of lifespans of LEGO parts - histogram

The vast majority of parts are in use less than 1 year, and only a small fraction are used for more than 10 years. Note I calculate lifespan = year_to + 1 - year_from, and this is using strict parts data, rather than also including minifigures.

Distributions of lifespans of LEGE parts, pie chart with ranges

Exponential Decay

This distribution looks like exponential decay. To understand more clearly, it’s back to the logarithmic domain, where we can fit approximations in two regimes; for the first five years (>80% of parts) and then for the remaining life.

LEGO part lifespans, counts plotted on vertical log scale

The half-life of parts in the first 5 years is about 1 year over that period. So each of the first five years, only about half the parts live to the next year. However, the first year remains an outlier, as only 34% of parts make it to their second year and beyond. After 5 years, just under 1,000 survive compared to the 25,000 at 1 year, and from that point the half-life extends to about 7 years. So, while some parts like Slope 45° 2 x 2 live long lives, the vast majority are winnowed out much earlier.

Churn

We can also look at the count of parts released (year_from) and the count of parts retired (year_to + 1) each year.

LEGO parts released and retired each year - line chart

As expected, parts released shows exponential growth, but parts retired also grows, and almost in synchrony, so the net change is small compared to the total parts in and out. By summing up the difference each year, we can chart the number of active parts over time.

Active LEGO parts by year and change by year - line and column plot

Active parts are a small proportion of all parts released to date; they represent about one seventh of all parts released, approximately 5,500 of 36,500. Comparing total changes to the active set size each year also shows a high and increasing rate of churn.

LEGO part churn by year - line chart

So even as venerable stalwarts such as Slope 45° 2 x 2 persist, in recent years about 80% of active LEGO parts have churned each year! Interestingly, the early 1980s to late 1990s was a period of much lower churn. Note also churn percentages prior to 1970 were high and varied widely (not shown before 1965 for clarity), probably reflective of a much smaller base of parts and maybe artefacts with older data.

Lifespans vs Year Release and Retired

We’ve got a lot of insight from just year_from and year_to. One last way to look at lifespans is how they have changed over time, such as lifespan vs year released or retired.

LEGO part lifespan scatter plots

Obvious in these charts is that we only have partial data on the lifespan of active parts (we don’t know how much longer until they’ll be retired), but as above, they are a small proportion. We can discern a little more by using a box plot.

LEGO part lifespan box plots

The plot shows, for each year, median lifespans (orange), the middle range (box), the typical range (whiskers) and lifespan outliers (smoky grey). We see here again that the 1980s and 1990s were a good period in relative terms for releasing long-lived parts that have only just been retired. However, with the huge volume of more short-lived parts being retired in recent years, we don’t see their impact in the late 2010s on the retired plot, except as outliers. In general, the retired (left) plot, like the later years of the released (right) plot shows lower lifespan distributions because the long-lived parts are overwhelmed by ever-increasing numbers of contemporaneous short-lived parts.

Lessons for Software Reuse

If LEGO products are to be a metaphor and baseline for reuse in software products, this analysis of part lifespans is consistent with the observations from part 1, while further highlighting:

  • Components that are heavily reused may be a minority of all components, and in an environment of frequent and increasing product releases, many components may have very short lifetimes, driven by acute needs.
  • There may be a “great filter” for reuse, such as the one year or five year lifespan for LEGO parts. This may also be interpreted as “use before reuse”, or that components must demonstrate exceptional performance in core functions, or support stable market demands, before wider reuse is viable.
  • Our impressions and expectations for reuse of software components may be anchored to particular time periods. We see that the 1980s and 1990s (when there were only ~10% of the LEGO parts released to 2020) were a period of much lower churn and the release of relatively more parts with longer lifespans. The same may be true for periods of software development in an organisation’s history.
  • Retirement of old components can be synchronised with introduction of new components, and in fact, this is probably essential to effectively manage complexity and to derive benefits of reuse without the burden of legacy.

Further Analysis

We’ll come back to the reduction in sharing and reuse theme, and find a lot more interesting ways to look at the Rebrickable data in future posts.

LEGO® is a trademark of the LEGO Group of companies which does not sponsor, authorize or endorse this site.

LEGO and Software – Variety and Specialisation

Since my first post on LEGO as a Metaphor for Software Reuse, I have done some more homework on existing analyses of LEGO® products, to understand what I could myself reuse and what gaps I could fill with further data analysis.

I’ve found three fascinating analyses that I share below. However, I should note that these analyses weren’t performed considering LEGO products as a metaphor or benchmark for software reuse. So I’ve continued to ask myself: what lessons can we take away for better management of software delivery? For this post, the key takeaways are market and product drivers of variety, specialisation and complexity, rather than strategies for reuse as such. I’m hoping to share more insight on reuse in LEGO in future posts, in the context of these market and product drivers.

I also discovered the Rebrickable downloads and API, which I plan to use for any further analysis – I do hope I need to play with more data!

Reuse Concepts

I started all this thinking about software reuse, which is not an aim in itself, but a consideration and sometimes an outcome in efficiently satisfying software product objectives. As we think about reuse and consider existing analyses, I found it helpful to define a few related concepts:

  • Variety – the number of different forms or properties an entity under consideration might take. We might talk about variety of themes, sets, parts, and colours, etc.
  • Specialisation – of parts in particular, where parts serve only limited purposes.
  • Complexity – the combinations or interactions of entities, generally increasing with increasing variety and specialisation.
  • Sharing – of parts between sets in particular, where parts appear in multiple sets. We might infer specialisation from limited sharing.
  • Reuse – sharing, with further consideration of time, as some reuse scenarios may be identified when a part is introduced, some may emerge over time, and some opportunities for future reuse may not be realised.

Considering these concepts, the first two analyses focus mainly on understanding variety and specialisation, while the third dives deeper into sharing and reuse.

Increase in Variety and Specialisation

The Colorful Lego

Visualisation of colours in use in LEGO sets over time
Analysis of LEGO colours in use over time. Source: The Colorful Lego Project

Great visualisations and analysis in this report and public dashboard from Edra Stafaj, Hyerim Hwang and Yiren Wang, driven primarily by the evolving colours found in LEGO sets of over time, and considering colour as a proxy for complexity. Some of the key findings:

  • The variety of colours has increased dramatically over time, with many recently introduced colours already discontinued.
  • The increase in variety of colours is connected with growth of new themes. Since 2010, there has been a marked increase in co-branded sets (“cooperative” theme, eg, Star Wars) and new in-house branded sets (“LEGO commercial” theme, eg, Ninjago) as a proportion of all sets.
  • That specialised pieces (as modelled by Minifig Heads – also noted as the differentiating part between themes) make up the bulk of new pieces, compared to new generic pieces (as modelled by Bricks & Plates).

Colour is an interesting dimension to consider, as it may be argued an aesthetic, rather than mechanical, consideration for reuse. However, as noted in the diversification of themes, creating and satisfying a wider array of customer segments is connected to the increasing variety of colour.

So I see variety and complexity increasing, and more specialisation over time. The discontinuation of colours suggests reuse may be reducing over time, even while generic bricks & plates persist.

67 Years of Lego Sets

Visualisation of the LEGO, in the LEGO, for the LEGO people. Source 67 Years of Lego Sets

An engaging summary from Joel Carron of the evolution of LEGO sets over the years, including Python notebook code, and complete with a final visualisation made of LEGO bricks! Some highlights:

  • The number of parts in a set has in general increased over time.
  • The smaller sets have remained a similar size over time, but the bigger sets keep getting bigger.
  • As above, colours are diversifying, with minor colours accounting for more pieces, and themes developing distinct colour palettes.
  • Parts and sets can be mapped in a graph or network showing the degree to which parts are shared between sets in different themes. This shows some themes share a lot of parts with other themes, while some themes have a greater proportion of unique parts. Generally, smaller themes (with fewer total parts) share more than larger themes (with more total parts).

So here we add to variety and specialisation with learning about sharing too, but without the chronological view of that would help us understand more about reuse – were sets with high degrees of part sharing developed concurrently or sequentially?

Reduction in Sharing and Reuse

LEGO products have become more complex

A comprehensive paper, with dataset and R scripts, analysing increasing complexity in LEGO products, with a range of other interesting-looking references to follow up on, though acknowledgement that scientific investigations on the development of the LEGO products remain scarce.

This needs a thorough review in its own post, with further analysis and commentary on the implications for software reuse and management. That will be the third post of this trilogy in N parts.

Lessons for Software Reuse

If we are considering LEGO products as a metaphor and benchmark for software reuse, we should consider the following.

Varied market needs drive variety and specialisation of products, which in turn can be expected to drive variety and specialisation of software components. Reuse of components here may be counter-productive from a product-market fit perspective (alone, without further technical considerations). However, endless customisation is also problematic and a well-designed product portfolio will allow efficient service of the market.

Premium products may also be more complex, with more specialised components. Simpler products, with lesser performance requirements, may share more components. The introduction of more premium products over time may be a major driver of increased variety and specialisation.

These market and product drivers provide context for reuse of software components.

LEGO® is a trademark of the LEGO Group of companies which does not sponsor, authorize or endorse this site.

LEGO as a Metaphor for Software Reuse – Does the Data Stack Up?

LEGO® products are often cited as a metaphor for software reuse; individual parts being composable in myriad ways.

I think this is a bit simplistic and may miss the point for software, but let’s assume we should aim to make software in components that are as reusable as LEGO parts. With that assumption, what level of reuse do we actually observe in LEGO sets over time?

I’d love to know if anyone else has done this analysis publicly, but a quick search didn’t reveal much. So, here’s a first pass from me. I discuss further analysis below.

The data comes from the awesome catalogue at Bricklink. I start with the basic catalogue view by year.

More New Parts Every Year

My first observation is that there are an ever-increasing number of new parts released every year.

Chart of Lego parts over time, showing 10x increase in parts from late 1980s to early 2020s

This trend in new parts has been exponential until just the last few years. The result is that there are currently 10 times the number of parts as when I was a kid – that’s why the chart is on a log scale! These new parts are reusable too.

Therefore, the existence of reusable parts doesn’t preclude the creation of new reusable parts. Nor should we expect the creation of new parts to reduce over time, even if existing parts are reusable. If LEGO products are our benchmark for software reuse, we should expect to be introducing new components all the time.

More New Parts in New Products

My second observation is that the new parts (by count) in new products have generally increased year by year.

The increase has been most pronounced from about 2000 to 2017, rising from about two to over five new parts per new set on average. The increase is observed because – even though new sets over time are increasing (below) – new parts are increasing faster (as above).

Chart showing new Lego sets over time

Therefore, the existence of an ever-increasing number of reusable parts doesn’t preclude an increase in the count of new parts in new sets. If LEGO products are our benchmark for software reuse, we should expect to continue introducing new components in new products, possibly at an increasing rate.

This is only part of the picture, though. I have been careful to specify count of new parts in new sets only, as that data is easy to come by (from the basic Bricklink catalogue view by year). The average count of new parts in new sets is simply the number of new parts each year divided by the number of new sets each year.

We also need to understand how existing parts are reused in new sets. For instance, is the total part count increasing in new products and are are new parts therefore a smaller component of new products? Which existing parts see the most reuse? We might also want to consider the varied nature and function of new and existing parts in new products, and the interaction between products. These deeper analyses could be the focus of future posts.

Provisional Conclusions

As a LEGO product aficionado, I’ve observed an explosion of new parts and sets since I was a kid, but I was surprised by the size of the increase revealed by the data.

There is more work to do on this data, and LEGO products may be a flawed metaphor for software engineering. We may make stronger claims about software reuse with other lines of reasoning and evidence. However, if LEGO products are to be used as a benchmark, the data dispels some misconceptions, and suggests that for functional software product development:

  • Reusable components don’t eliminate the need for new components
  • The introduction of new components may increase over time, even in the presence of reusable components
  • New products may contain more new components than existing products, even in the presence of reusable components

LEGO® is a trademark of the LEGO Group of companies which does not sponsor, authorize or endorse this site.

The Lockdown Wheelie Project, Part 3

In Melbourne’s COVID-19 lockdown, I’ve wheelied over 17km. Not all at once, though.

Over three months, I’ve spent 90 minutes with my front wheel raised. I’d like to keep it up, but as lockdown has gradually relaxed, and routines have changed, so have I landed the wheelie project, for now.

Read the full article over on Medium at The Lockdown Wheelie Project, Part 3.

More Sankey for Less Confusion?

Confusion Matrixes are essential for evaluating classifiers, but for some who are new to them, they can cause, well, confusion.

Sankey Diagrams are an alternative way of representing matrix data, and I’ve found some people – who are new to matrix data, like business domain experts who are not experienced data scientists – find them easier to understand. Also, some machine learning researchers find Sankey diagrams useful for analysing data and classifiers.

So, I have posted simple code for visualising classifier evaluation or comparisons as Sankey diagrams. Maybe it will be useful for others, as well as fun for me.

The code combines large portions of Plotly Sankey Diagrams with essence of scikit-learn confusion matrix and a lashings of list comprehension code golf.

The scenarios supported are:

  1. Evaluating a binary classifier against ground truth or as champion-challenger,
  2. Evaluating a multi-class classifier against ground truth or as champion-challenger,
  3. Comparing multiple stages of a decision process, or multiple versions of a binary classifier, for instance over time, or hyper-parameter sweeps, and
  4. Comparing multiple versions of a multi-class classifier.
Example confusion matrixes as Sankey diagrams

See the code on Github.

Maths Whimsy

Time to make for a home for those occasional mathematical coding curios. I’ve kicked off with an analysis, using various Numpy approaches, of the gravity field around a square (or cubic) planet, inspired by a project my children were working on.

If you’ve ever wondered, this is what gravity looks like on the surface of a square planet (20 length units long, arbitrary gravitational units) …

… even though the surface would appear visually flat, it would only feel level in the centre of the face. Near a corner, you would feel like you were standing on a 45 degree slope, and because the surface would be visually flat, it would look like you could slide off the far end of it – weird and cool.

I imagine I’ll add to this over time. The bulk of learning to code for me through high school involved mathematical simulations of all kinds: motion of planets under gravity, double pendulums, Mandelbrot sets, L-systems, 3D projections, etc, etc. All that BASIC (and some C) code lost now, but I’ll keep my eye out for more interesting problems and compile them here.

See also ThoughtWorks “Shokunin” coding problems of a mathematical nature that have piqued my interest over time:

Breadth first search and simulated annealing solvers for a task allocation problem

Scaling Change Spoiler

When software engineers think about scaling, they think in terms of the order of complexity, or “Big-O“, of a process or system. Whereas production is O(N) and can be scaled by shifting variable costs to fixed, I contend that change is O(N2) due to the interaction of each new change with all previous changes. We could visualise this as a triangular matrix heat map of the interaction cost of each pair of changes (where darker shading is higher cost).

Change heatmap
Change interaction heatmap

The thing about change being O(N2) is that the old production management heuristics of shifting variable cost to fixed no longer work, because the dominant mode is interaction cost. Instead we use the following management heuristics:

Socialise

Socialising change
Socialising change

We take a variable cost hit for each change to help it play more nicely with every other change. This reduces the cost coefficient but not the number of interactions (N2).

Screen

Screening change
Screening change

We only take in the most valuable changes. Screening half our changes (N/2) reduces change interactions by three quarters (N2/4).

Seclude

Secluding change
Secluding change

We arrange changes into separate spaces and prevent interaction between spaces. Using n spaces reduces the interactions to N2/n.

Surrender

Surrendering change
Surrendering change

Like screening, but at the other end. We actively manage out changes to reduce interactions. Surrendering half our changes (N/2) reduces change interactions by three quarters (N2/4).

Scenarios

Where do we see these approaches being used? Just some examples:

  • Start-ups screen or surrender changes and hence are more agile than incumbents because they have less history of change.
  • Product managers screen changes in design and seclude changes across a portfolio, for example the separate apps of Facebook/ Messenger/ Instagram/ Hyperlapse/ Layout/ Boomerang/ etc
  • To manage technical debt, good developers socialise via refactoring, better seclude through architecture, and the best surrender
  • In hiring, candidates are screened and socialised through rigorous recruitment and training processes
  • Brand architectures also seclude changes – Unilever’s Dove can campaign for real beauty while Axe/Lynx offends Dove’s targets (and many others).

See Also