LEGO and Software – Variety and Specialisation

Since my first post on LEGO as a Metaphor for Software Reuse, I have done some more homework on existing analyses of LEGO® products, to understand what I could myself reuse and what gaps I could fill with further data analysis.

I’ve found three fascinating analyses that I share below. However, I should note that these analyses weren’t performed considering LEGO products as a metaphor or benchmark for software reuse. So I’ve continued to ask myself: what lessons can we take away for better management of software delivery? For this post, the key takeaways are market and product drivers of variety, specialisation and complexity, rather than strategies for reuse as such. I’m hoping to share more insight on reuse in LEGO in future posts, in the context of these market and product drivers.

I also discovered the Rebrickable downloads and API, which I plan to use for any further analysis – I do hope I need to play with more data!

Reuse Concepts

I started all this thinking about software reuse, which is not an aim in itself, but a consideration – and sometimes an outcome – in efficiently satisfying software product objectives. In thinking about reuse and reviewing the existing analyses, I found it helpful to define a few related concepts:

  • Variety – the number of different forms or properties an entity under consideration might take. We might talk about variety of themes, sets, parts, and colours, etc.
  • Specialisation – of parts in particular, where parts serve only limited purposes.
  • Complexity – the combinations or interactions of entities, generally increasing with increasing variety and specialisation.
  • Sharing – of parts between sets in particular, where parts appear in multiple sets. We might infer specialisation from limited sharing.
  • Reuse – sharing, with further consideration of time, as some reuse scenarios may be identified when a part is introduced, some may emerge over time, and some opportunities for future reuse may not be realised.

Considering these concepts, the first two analyses focus mainly on understanding variety and specialisation, while the third dives deeper into sharing and reuse.

Increase in Variety and Specialisation

The Colorful Lego

Analysis of LEGO colours in use over time. Source: The Colorful Lego Project

Great visualisations and analysis in this report and public dashboard from Edra Stafaj, Hyerim Hwang and Yiren Wang, driven primarily by the evolving colours found in LEGO sets over time, and considering colour as a proxy for complexity. Some of the key findings:

  • The variety of colours has increased dramatically over time, with many recently introduced colours already discontinued.
  • The increase in variety of colours is connected with the growth of new themes. Since 2010, there has been a marked increase in co-branded sets (the “cooperative” theme, e.g. Star Wars) and new in-house branded sets (the “LEGO commercial” theme, e.g. Ninjago) as a proportion of all sets.
  • Specialised pieces (as modelled by Minifig Heads – also noted as the differentiating part between themes) make up the bulk of new pieces, compared with new generic pieces (as modelled by Bricks & Plates).

Colour is an interesting dimension to consider, as it is arguably an aesthetic, rather than mechanical, consideration for reuse. However, as noted in the diversification of themes, the increasing variety of colour is connected to creating and satisfying a wider array of customer segments.

So I see variety and complexity increasing, and more specialisation over time. The discontinuation of colours suggests reuse may be reducing over time, even while generic bricks & plates persist.

67 Years of Lego Sets

Visualisation of the LEGO, in the LEGO, for the LEGO people. Source: 67 Years of Lego Sets

An engaging summary from Joel Carron of the evolution of LEGO sets over the years, including Python notebook code, and complete with a final visualisation made of LEGO bricks! Some highlights:

  • The number of parts in a set has in general increased over time.
  • The smaller sets have remained a similar size over time, but the bigger sets keep getting bigger.
  • As above, colours are diversifying, with minor colours accounting for more pieces, and themes developing distinct colour palettes.
  • Parts and sets can be mapped in a graph or network showing the degree to which parts are shared between sets in different themes. This shows some themes share a lot of parts with other themes, while some themes have a greater proportion of unique parts. Generally, smaller themes (with fewer total parts) share more than larger themes (with more total parts).
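As a sketch of how such a part-sharing network could be reproduced from the Rebrickable downloads mentioned above – file and column names follow the published Rebrickable schema, but check them against the files you actually download – something like the following builds a theme-to-theme graph weighted by the number of shared parts:

```python
# Minimal sketch: theme-to-theme part-sharing network from Rebrickable CSVs.
# Assumes the standard Rebrickable download schema; verify column names.
from itertools import combinations

import networkx as nx
import pandas as pd

sets = pd.read_csv("sets.csv")                # set_num, name, year, theme_id, ...
theme_names = pd.read_csv("themes.csv").set_index("id")["name"]
inventories = pd.read_csv("inventories.csv")  # id, version, set_num
inv_parts = pd.read_csv("inventory_parts.csv")  # inventory_id, part_num, ...

# Join inventory parts back to sets, then collect the set of parts per theme
parts_in_sets = (inv_parts
                 .merge(inventories, left_on="inventory_id", right_on="id")
                 .merge(sets, on="set_num"))
theme_parts = parts_in_sets.groupby("theme_id")["part_num"].apply(set)

# Weight an edge between two themes by how many distinct parts they share
G = nx.Graph()
for (t1, p1), (t2, p2) in combinations(theme_parts.items(), 2):
    shared = len(p1 & p2)
    if shared:
        G.add_edge(t1, t2, weight=shared)

print(G.number_of_nodes(), "themes,", G.number_of_edges(), "sharing edges")
t1, t2, data = max(G.edges(data=True), key=lambda e: e[2]["weight"])
print("Most sharing:", theme_names[t1], "<->", theme_names[t2], data["weight"])
```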

So here we add learning about sharing to variety and specialisation, but without the chronological view that would help us understand more about reuse – were sets with high degrees of part sharing developed concurrently or sequentially?

Reduction in Sharing and Reuse

LEGO products have become more complex

A comprehensive paper, with dataset and R scripts, analysing increasing complexity in LEGO products, with a range of other interesting-looking references to follow up on, though it acknowledges that scientific investigations of the development of LEGO products remain scarce.

This needs a thorough review in its own post, with further analysis and commentary on the implications for software reuse and management. That will be the third post of this trilogy in N parts.

Lessons for Software Reuse

If we are considering LEGO products as a metaphor and benchmark for software reuse, we should consider the following.

Varied market needs drive variety and specialisation of products, which in turn can be expected to drive variety and specialisation of software components. Reuse of components here may be counter-productive from a product-market fit perspective alone, before any further technical considerations. However, endless customisation is also problematic, and a well-designed product portfolio will allow efficient service of the market.

Premium products may also be more complex, with more specialised components. Simpler products, with lesser performance requirements, may share more components. The introduction of more premium products over time may be a major driver of increased variety and specialisation.

These market and product drivers provide context for reuse of software components.

LEGO® is a trademark of the LEGO Group of companies which does not sponsor, authorize or endorse this site.

LEGO as a Metaphor for Software Reuse – Does the Data Stack Up?

LEGO® products are often cited as a metaphor for software reuse; individual parts being composable in myriad ways.

I think this is a bit simplistic and may miss the point for software, but let’s assume we should aim to make software in components that are as reusable as LEGO parts. With that assumption, what level of reuse do we actually observe in LEGO sets over time?

I’d love to know if anyone else has done this analysis publicly, but a quick search didn’t reveal much. So, here’s a first pass from me. I discuss further analysis below.

The data comes from the awesome catalogue at Bricklink. I start with the basic catalogue view by year.

More New Parts Every Year

My first observation is that there are an ever-increasing number of new parts released every year.

Chart of Lego parts over time, showing 10x increase in parts from late 1980s to early 2020s

This trend in new parts has been exponential until just the last few years. The result is that there are currently 10 times as many parts as when I was a kid – that’s why the chart is on a log scale! These new parts are reusable too.

Therefore, the existence of reusable parts doesn’t preclude the creation of new reusable parts. Nor should we expect the creation of new parts to reduce over time, even if existing parts are reusable. If LEGO products are our benchmark for software reuse, we should expect to be introducing new components all the time.

More New Parts in New Products

My second observation is that the new parts (by count) in new products have generally increased year by year.

The increase has been most pronounced from about 2000 to 2017, rising from about two to over five new parts per new set on average. The increase is observed because – even though the number of new sets per year is increasing (below) – the number of new parts is increasing faster (as above).

Chart showing new Lego sets over time

Therefore, the existence of an ever-increasing number of reusable parts doesn’t preclude an increase in the count of new parts in new sets. If LEGO products are our benchmark for software reuse, we should expect to continue introducing new components in new products, possibly at an increasing rate.

This is only part of the picture, though. I have been careful to specify count of new parts in new sets only, as that data is easy to come by (from the basic Bricklink catalogue view by year). The average count of new parts in new sets is simply the number of new parts each year divided by the number of new sets each year.
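For transparency, that calculation is trivial; here's a minimal pandas sketch, where the CSV and its column names are my own stand-ins for the yearly counts taken from the Bricklink catalogue view:

```python
import pandas as pd

# Hypothetical CSV with one row per year, transcribed from the Bricklink
# catalogue-by-year view; file and column names are assumptions.
catalogue = pd.read_csv("bricklink_catalogue_by_year.csv")  # year, new_parts, new_sets

# Average count of new parts per new set, per year
catalogue["new_parts_per_new_set"] = catalogue["new_parts"] / catalogue["new_sets"]
print(catalogue.tail())
```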

We also need to understand how existing parts are reused in new sets. For instance, is the total part count increasing in new products, and are new parts therefore a smaller component of new products? Which existing parts see the most reuse? We might also want to consider the varied nature and function of new and existing parts in new products, and the interaction between products. These deeper analyses could be the focus of future posts.

Provisional Conclusions

As a LEGO product aficionado, I’ve observed an explosion of new parts and sets since I was a kid, but I was surprised by the size of the increase revealed by the data.

There is more work to do on this data, and LEGO products may be a flawed metaphor for software engineering. We may make stronger claims about software reuse with other lines of reasoning and evidence. However, if LEGO products are to be used as a benchmark, the data dispels some misconceptions, and suggests that for functional software product development:

  • Reusable components don’t eliminate the need for new components
  • The introduction of new components may increase over time, even in the presence of reusable components
  • New products may contain more new components than existing products, even in the presence of reusable components

LEGO® is a trademark of the LEGO Group of companies which does not sponsor, authorize or endorse this site.

Data Visualisation Podcast

It was fun to join the ThoughtWorks Tech Podcast again, with Zhamak Dehghani, Alexey Boas, and Ned Letcher, this time to talk about Getting to grips with data visualization.

A vast array of powerful data visualization tools are gaining traction in enterprises looking to make sense of their data sets, for instance D3, Bokeh, Shiny and Dash. In this episode, our team explores the concept of data visualization as part of a complete digital experience, with the workflows and journeys of a wide variety of users.

ThoughtWorks Technology Podcasts

The Lockdown Wheelie Project, Part 3

In Melbourne’s COVID-19 lockdown, I’ve wheelied over 17km. Not all at once, though.

Over three months, I’ve spent 90 minutes with my front wheel raised. I’d like to keep it up, but as lockdown has gradually relaxed and routines have changed, I’ve landed the wheelie project, for now.

Read the full article over on Medium at The Lockdown Wheelie Project, Part 3.

More Sankey for Less Confusion?

Confusion Matrixes are essential for evaluating classifiers, but for some who are new to them, they can cause, well, confusion.

Sankey Diagrams are an alternative way of representing matrix data, and I’ve found some people – who are new to matrix data, like business domain experts who are not experienced data scientists – find them easier to understand. Also, some machine learning researchers find Sankey diagrams useful for analysing data and classifiers.

So, I have posted simple code for visualising classifier evaluation or comparisons as Sankey diagrams. Maybe it will be useful for others, as well as fun for me.

The code combines large portions of Plotly Sankey Diagrams with the essence of the scikit-learn confusion matrix and lashings of list-comprehension code golf.
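As a flavour of the approach – a minimal sketch, not the repository code itself – here is the simplest case, a binary classifier evaluated against ground truth, rendered as a Plotly Sankey diagram with toy labels:

```python
import plotly.graph_objects as go
from sklearn.metrics import confusion_matrix

# Toy labels for illustration only
y_true = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0, 1, 0]
cm = confusion_matrix(y_true, y_pred)  # rows: actual, cols: predicted

labels = ["Actual: Negative", "Actual: Positive",
          "Predicted: Negative", "Predicted: Positive"]
# One link per confusion matrix cell, flowing from actual class to predicted class
sources = [0, 0, 1, 1]
targets = [2, 3, 2, 3]
values = cm.flatten().tolist()  # [TN, FP, FN, TP]

fig = go.Figure(go.Sankey(
    node=dict(label=labels, pad=20),
    link=dict(source=sources, target=targets, value=values),
))
fig.show()
```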

The scenarios supported are:

  1. Evaluating a binary classifier against ground truth or as champion-challenger,
  2. Evaluating a multi-class classifier against ground truth or as champion-challenger,
  3. Comparing multiple stages of a decision process, or multiple versions of a binary classifier, for instance over time, or hyper-parameter sweeps, and
  4. Comparing multiple versions of a multi-class classifier.
Example confusion matrixes as Sankey diagrams

See the code on Github.

Melbourne Data Visualisation Meetup – October 2020

I presented at the Melbourne Data Visualisation Meetup along with Ned Letcher, who gave an awesome overview of Python Libraries for Building Data Apps (an analytics superpower).

The topic was Data Visualisation – Good for Business.

Data Visualisation is key for gaining new knowledge, better engaging audiences, and driving meaningful action. We’ll share bespoke data visualisation case studies from our work at ThoughtWorks and examine their business impact. We’ll also discuss the ongoing role of the art and science of data visualisation in a world driven by machine learning.

Maths Whimsy

Time to make a home for those occasional mathematical coding curios. I’ve kicked off with an analysis, using various Numpy approaches, of the gravity field around a square (or cubic) planet, inspired by a project my children were working on.

If you’ve ever wondered, this is what gravity looks like on the surface of a square planet (20 length units long, arbitrary gravitational units) …

… even though the surface would appear visually flat, it would only feel level in the centre of the face. Near a corner, you would feel like you were standing on a 45 degree slope, and because the surface would be visually flat, it would look like you could slide off the far end of it – weird and cool.
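For the curious, the kind of calculation involved looks roughly like this – a minimal Numpy sketch, not the notebook code itself: approximate the square planet as a grid of point masses, sum their attractions at a point on the surface, and compare the field direction to the surface normal.

```python
import numpy as np

# Sketch: gravitational field at the surface of a square "planet",
# approximated as a uniform grid of point masses (G = density = 1).
L = 20.0                                  # side length, arbitrary units
cell = 0.1
centres = np.arange(-L / 2 + cell / 2, L / 2, cell)
X, Y = np.meshgrid(centres, centres)      # centres of the mass elements
dm = cell ** 2                            # mass of each element

def gravity(px, py, eps=1e-9):
    """Field vector at point (px, py), summing attraction from all elements."""
    dx, dy = X - px, Y - py
    r2 = dx ** 2 + dy ** 2 + eps
    inv_r3 = dm / r2 ** 1.5
    return np.array([np.sum(dx * inv_r3), np.sum(dy * inv_r3)])

# Sample the top surface: centre of the face, halfway out, and near a corner
for px in (0.0, 5.0, 9.99):
    gx, gy = gravity(px, L / 2)
    slope = np.degrees(np.arctan2(abs(gx), abs(gy)))
    print(f"x = {px:5.2f}: field angle from vertical = {slope:.1f} degrees")
```

At the centre of the face the field is vertical (it feels level), and near the corner the angle approaches 45 degrees, matching the description above.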

I imagine I’ll add to this over time. The bulk of learning to code for me through high school involved mathematical simulations of all kinds: motion of planets under gravity, double pendulums, Mandelbrot sets, L-systems, 3D projections, etc, etc. All that BASIC (and some C) code is lost now, but I’ll keep my eye out for more interesting problems and compile them here.

See also ThoughtWorks “Shokunin” coding problems of a mathematical nature that have piqued my interest over time:

Breadth first search and simulated annealing solvers for a task allocation problem

ML Interpretability with Ambient Visualisations

I produced some ambient visualisations as background to short talks on the topic of Interpreting the Opaque Box of ML from ThoughtWorks Technology Radar Volume 21. The talks were presented in breaks at the YOW Developer Conference.

Animation of linear to non-linear model selection

Here are my speaker notes.

Theme Intro

The theme I’m talking about is Interpreting the Opaque Box of ML.

It’s a theme because the radar has a lot of ML blips – those are the individual tools, techniques, languages and frameworks we track, and they all have an aspect of interpretability.

I’m going to talk first about Explainability as a First Class Model Concern.

Explainability as a First Class Model Concern

ML models make predictions. They take some inputs and predict an output, based on the data they’ve been trained on. Without careful thought, those predictions can be opaque boxes.

For example – predicting whether someone should be offered credit. A few people at the booth have mentioned this experience: “[the] black box algorithm thinks I deserve 20x the credit limit [my wife] does” – and the difficulty in getting an explanation from the provider [this was a relevant example at the time].

Elevated to a first class concern, however, ML predictions are interpretable and explainable to different degrees – it’s not actually a question of opaque box or transparent box, but many shades of translucency.

Spectrum

Interpretable means people can reason about a model’s decision-making process in general terms, while explainable means people can understand the factors that led to a specific decision. People are important in this definition – a data scientist may be satisfied with the explanation that the model minimises total loss, while a declined credit applicant probably requires and deserves a reason code.

And those two extremes can anchor our spectrum – at one end we can explain a result as a general consequence of ML, at the other end explaining the specific factors that contributed to an individual decision.

Dimensions – What

As dimensions of explainability, we should consider:

  • The choice of modelling technique as intrinsically explainable
  • Model agnostic explainability techniques
  • Whether global or just local interpretability is required

Considering model selection – a decision tree is intrinsically explainable – factors contribute sequentially to a decision. A generic deep neural network is not. However, in between, we can architect networks to use techniques such as embeddings, latent spaces or transfer learning, which create representations of inputs that are distinct and interpretable to a degree, but not always in human terms.

And so model specific explainability relies on the modelling technique, while model agnostic techniques are instead empirically applicable to any model. We can create surrogate explainable models for any given model, such as a wide network paired with a deep network, and we can use ablation to explore the effect of changing inputs on a model’s decisions.

For a given decision, we might only wish to understand how that decision would have been different had the inputs changed slightly. In this case, we are concerned only with local interpretability and explainability, not the model as a whole, and LIME is an effective technique.
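As a concrete flavour of a local explanation, here is a minimal sketch using the lime package with a scikit-learn classifier on a standard dataset; the model and parameters are illustrative, not a recommendation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

# Train an "opaque" model on a standard dataset
data = load_breast_cancer()
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(data.data, data.target)

# Explain a single prediction: which features pushed it towards each class?
explainer = LimeTabularExplainer(
    data.data,
    feature_names=data.feature_names,
    class_names=data.target_names,
    mode="classification",
)
explanation = explainer.explain_instance(
    data.data[0], model.predict_proba, num_features=5
)
print(explanation.as_list())  # local feature contributions for this instance
```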

Reasons – Why

As broader business concerns, we should care about explainability because:

  • Knowledge management is crucial for organisations – an interpretable model, such as the Glasgow Coma Scale, may be valued more for people’s ability to use it than its pure predictive performance
  • We must be compliant with local laws, and it is in all stakeholders’ interests that we act ethically
  • And finally, models can always make mistakes, so a challenge process must be considered, especially as vulnerable people are disproportionately subject to automated decision making

And explainability is closely linked to ethics, and hence the rise of ethical bias testing.

Ethical Bias Testing

Powerful, but Concerning

There is rising concern that powerful ML models could cause unintentional harm. For example, a model could be trained to make profitable credit decisions by simply excluding disadvantaged applicants. So we’re seeing a growing interest in ethical bias testing that will help to uncover potentially harmful decisions, and we expect this field to evolve over time.

Measures

There are many statistical measures we can use to detect unfairness in models. These measures compare outcomes for privileged and unprivileged groups under the model. If we find a model is discriminating against an unprivileged group, we can apply various mitigations to reduce the inequality.  

  • Equal Opportunity Difference is the difference in true positive rates between an unprivileged group and a privileged group. A value close to zero is good.
  • Disparate Impact is the ratio of the selection rate of the unprivileged group to that of the privileged group. The selection rate is the number of individuals selected for the positive outcome divided by the total number of individuals in the group. The ideal value for this metric is 1.
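As a rough illustration – not any particular toolkit's implementation – both measures can be computed directly from predictions and a binary protected attribute:

```python
import numpy as np

def equal_opportunity_difference(y_true, y_pred, protected):
    """TPR(unprivileged) - TPR(privileged); protected == 1 marks the unprivileged group."""
    def tpr(mask):
        positives = (y_true == 1) & mask
        return np.sum((y_pred == 1) & positives) / np.sum(positives)
    return tpr(protected == 1) - tpr(protected == 0)

def disparate_impact(y_pred, protected):
    """Selection rate of the unprivileged group divided by that of the privileged group."""
    def selection_rate(mask):
        return np.mean(y_pred[mask] == 1)
    return selection_rate(protected == 1) / selection_rate(protected == 0)

# Toy usage: ideal values are 0.0 and 1.0 respectively
y_true = np.array([1, 1, 0, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])
protected = np.array([1, 1, 1, 1, 0, 0, 0, 0])
print(equal_opportunity_difference(y_true, y_pred, protected))
print(disparate_impact(y_pred, protected))
```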

These are just two examples of more than 70 different metrics for measuring ethical bias. Choosing what measure or measures to use is an ethical decision itself, and is affected by your goals. For example, there is the choice between optimising for similarity of outcomes across groups or trying to optimise so that similar individuals are treated the same. If individuals from different groups differ in their non-protected attributes, these could be competing goals.

Correction

To correct for ethical bias or unfairness, mitigations can be applied to the data, to the process of generating the model, and to the output of the model.

  • Data can be reweighted to increase fairness, before running the model.
  • While the model is being generated, it can be penalised for ethical bias or unfairness.
  • Or, after the model is generated, its output can be post-processed to remove bias.
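For the first of these, reweighting, here is a minimal sketch of the idea, along the lines of the Kamiran & Calders reweighing technique found in toolkits like AIF360: weight each (group, label) combination so that group membership and outcome appear statistically independent in the training data.

```python
import numpy as np
import pandas as pd

def reweighing_weights(group, label):
    """Weight each example by P(group) * P(label) / P(group, label),
    so that group and label are independent under the weighted distribution."""
    df = pd.DataFrame({"group": group, "label": label})
    p_group = df["group"].value_counts(normalize=True)
    p_label = df["label"].value_counts(normalize=True)
    p_joint = df.value_counts(normalize=True)
    return df.apply(
        lambda r: p_group[r["group"]] * p_label[r["label"]]
                  / p_joint[(r["group"], r["label"])],
        axis=1,
    ).to_numpy()

# Toy usage: pass the weights to any estimator that accepts sample_weight
group = np.array([1, 1, 1, 1, 0, 0, 0, 0])
label = np.array([1, 0, 0, 0, 1, 1, 1, 0])
print(reweighing_weights(group, label))
```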

As with explainability, the process of removing ethical bias or improving fairness will likely reduce the predictive performance or accuracy of a model; however, there is a continuum of possible tradeoffs.

What-if Tool

What is What if

I mentioned tooling is being developed to help with explainability and ethical bias testing, and you should familiarise yourself with these tools and the techniques they use. One example is the What-If Tool – an interactive visual interface designed to help you dig into a model’s behaviour. It helps data scientists understand more about the predictions their model is making, and was launched by the Google PAIR lab.

Features

You can do things like:

  •  Compare models to each other
  •  Visualize feature importance
  •  Arrange datapoints by similarity
  •  Test algorithmic fairness constraints

Risk

But by themselves tools like this won’t give you explainability or fairness, and using them naively won’t remove the risk or minimize the damage done by a misapplied or poorly trained algorithm. They should be used by people who understand the theory and implications of the results. However, they can be powerful tools to help communicate, tell a story, make the specialised analysis more accessible, and hence motivate improved practice and outcomes.

CD4ML

The radar also mentions CD4ML for the second time – using Continuous Delivery practices for delivering ML solutions. CD in general encourages solutions to evolve in small steps, and the same is true for ML solutions. The benefit of this is that we can more accurately identify the reasons for any change in system behaviour if it is the result of small changes in design or data. So we also highlight CD4ML as a technique for addressing explainability and ethical bias.