This is the fourth post in a series exploring LEGO**®** as a Metaphor for Software Reuse. The story is evolving as I go because I keep finding interesting things in the data. I’ll tie it all up with the key things I’ve found at some point.

In this post we’re looking from the part perspective – the reusable component – at how many sets or products it’s [re]used in, and how this has changed over time. All the analysis from this post is available at https://github.com/safetydave/reuse-metaphor.

## Inventory Data

The parts inventory data from the Rebrickable data sets, image content & API gives us which parts appear in which sets, in which quantities and colours. For instance, if we look at the minifigure flipper from part 3, we can chart its inclusion in sets as below.

The parts inventory data includes minifigures. I may later account for the effect of including or excluding minifgures in these various analyses. We can also tell if a part in a set is a spare; according to the data, 2% of LEGO parts in sets are spares.

## Parts Included in Sets

The most reused parts are included in *thousands* of sets. Here is a gallery of the top 25 parts over all time – from number 1 *Plate 1 x 2* (which is included in over 6,200 sets), to our favourite from last time, the venerable *Slope 45° 2 x 2* (over 2,700 sets).

These parts appear in a significant fraction of sets, but we can already see a 50% reduction in the number of sets in just the top 25 used parts. Beyond this small sample, we can plot the number of sets that include a given part (call it *part set count*), as below.

This time we have used log scales on both axes, due to the extremely uneven distribution of part set counts. In contrast to the thousands of sets including top parts, a massive 60% of parts are included in only one set and only 10% of parts are included in more than ten sets. The piecewise straight line fit indicates a power law applies, for instance approximately `count = 10000 / sqrt(rank)`

for the top 100 ranked parts.

Uneven distributions are often expressed in terms of the Pareto principle or **80/20 rule**. If we define *reuse instances* as every time a part is included in a set, *after the first set*, then we can plot the contribution of each part to total reuse instances and see if this is more or less uneven that the 80/20 rule.

This shows us that reuse of LEGO parts in sets is much more uneven than the 80/20 rule. While the 80/20 rule says 80% of reuse would be due to 20% of parts, in fact we find by this definition that 80% of reuse is due to only 3% of parts, and 20% of parts account for 98% of reuse!

We find a similar phenomenon if we consider the quantities of parts included in sets, rather than just the count of sets per part. We could repeat the whole analysis based on quantities (and the notebook has some options for doing this), but I was fairly satisfied the results would be similar given the degree of correlation we find, below.

I was intrigued, though, by the part that appeared many times in a single set, then never again (the uppermost point of the lower left column). It turns out it is a *Window 1 x 2 x 1 (old type) with Extended Lip* included in a windows set from 1966 that probably looked a bit like this one.

This brings us neatly to the “long tail” to round out this view of reuse as parts included in multiple sets. As per the distribution above, the tail (of parts that belong to only one set) is really long and full of curiosities. The tail is over 22,000 parts long, though these belong to only about 10,000 unique sets. The parts belong about 60/40 to sets proper vs minifigure packs. Here’s a tail selection – you’ll see they are fairly specialised items like stickers, highly custom parts, minifigure items with unique designs, and even trading cards!

There’s even, perhaps my nemesis, a minifigure set called “Chainsaw Dave”!

## Parts Included in Sets Over Time

Part reuse might vary with time, and might be dependent on time. In previous posts we’ve seen an exponential increase in new parts and sets over time and an exponential decay in part lifetimes. We can plot part reuse (set count) against lifespan, as below.

This shows some correlation – which we might expect as longievity is due to reuse and vice versa – but I was also intrigued by the many relatively short-lived parts with high set counts (in the mid-top left region). Colouring points by the year released shows that these are relatively recent parts (at the yellow end of the spectrum). This shows that, as well as long-lived parts, more recent parts also appear in many sets, which is good news for reuse.

However, it’s hard to determine the distribution of set count from the scatter plot, and hence how significant the reuse is. We can see the distribution better with a violin plot, which shows the overall range (‘T’ ends), the distribution of common values (shading), and the median (cross-bar), much like the box plot.

We see that although many parts released in the last few decades are reused in 100s or even 1000s of sets, the median or typical part appears in only a handful of sets. With 100s to 1000s of sets released each year recently, the sets are reusing both old and new parts, but the vast majority of parts are not significantly reused.

## Top Parts for Reuse Over Time

Above, we introduced the top 25 parts by set count, based on all time set count, but how has this varied over time? We can chart – for each of the top 25 parts – how many sets they appeared in each year. However, as the number of sets released each year has increased exponentially, we see a clearer pattern if we chart the proportion of sets released each year including the top 25 parts, as below.

This shows variation in proportional set representation for the top parts from about 10% to 50% over the last 40 years. However, there was a very noticeable drop around the year 2000 to a maximum of only 20% of sets including top reused parts. Interestingly, this corresponds to a documented period of financial difficulty for The LEGO Group, but further research would be required to demonstrate any relationship. Prior to 1980, the maximum proportional representation was even higher, indicating a major intention of sets was to provide reusable parts.

From 2000 onwards, the all time ranking becomes more apparent, as the lines generally spread to match the rankings. We can also see this resolution of ranks in a **bump chart** for the top 4 parts over the time period from just before 2000 to the present.

## Lessons for Software

As in previous posts, here’s where I speculate on what this data-directed investigation of LEGO parts and sets over the years might mean for software development. This is only on the presumption that – as often cited informally in my experience – LEGO products are a good metaphor for software. I take this as a given for this series, and as an excuse to play with some LEGO data, but I don’t really test it.

Given the reuse of LEGO parts across sets and time, we might expect that for software:

- Most reuse will likely come from a small number of components, and this may be far more extreme than the 80/20 heuristic
- If this is the case, then teams should build for use before reuse, or design very simple foundational components (see the top 25) for widespread reuse
- Building for use before reuse means optimising for fast development and retirement of customised components that won’t be reusable
- Reuse may vary significantly over time depending on your product strategy
- It’s possible to introduce new reusable components at any time, but their impact might not be very noticeable with a product strategy that drives many customised components

## Next Time

I plan to look further into the relationship between parts and sets, and how this has evolved over time.

*LEGO® is a trademark of the LEGO Group of companies which does not sponsor, authorize or endorse this site*.