Smarter Semantle Solvers

A little smarter, anyway. I didn’t expect to pick this up again, but when I occasionally run the first generation solvers online, I’m often equal parts amused and frustrated by rare words thrown up that delay the solution – from amethystine to zigging.

Animation of online semantle solution
An example solution with fewer than typical rare words guessed

The solvers used the first idea that worked; can we make some tweaks to make them smarter? The code is now migrated to its own new repo after outgrowing its old home.

Measuring smarts

I measure solver performance by running multiple trials of a solver configuration against the simulator for a variety of target words. This gives a picture of how often the solver typically succeeds within a certain number of guesses.

Chart showing cumulative distribution function curves for two solver configurations
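
As a rough sketch of that measurement loop (the solver and simulator interfaces here are illustrative stand-ins, not the repo’s actual API):

import numpy as np

def guess_counts(make_solver, make_simulator, targets, trials=10, cap=500):
  """Run repeated trials and record how many guesses each solution took."""
  counts = []
  for target in targets:
    for _ in range(trials):
      solver = make_solver()
      simulator = make_simulator(target)
      counts.append(solver.solve(simulator, max_guesses=cap))  # cap if unsolved
  return np.array(counts)

def empirical_cdf(counts):
  """Fraction of trials solved within each number of guesses."""
  xs = np.sort(counts)
  ys = np.arange(1, len(xs) + 1) / len(xs)
  return xs, ys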

Vocabulary

It turns out that the vocabulary to date based on english_words_set is a poor match for the most frequently used English words, according to unigram frequency data.

So we might expect that simply replacing the solver vocabulary would improve performance, and we also get a word ranking from unigram_freq.
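
As a minimal sketch of that vocabulary swap, assuming a CSV of unigram frequencies with word and count columns (the file name and column names below are assumptions about the unigram_freq data):

import pandas as pd

def load_vocabulary(path="unigram_freq.csv", size=20000):
  freq = pd.read_csv(path).sort_values("count", ascending=False).head(size)
  words = freq["word"].str.lower().tolist()    # most common words first
  ranks = {w: i for i, w in enumerate(words)}  # rank doubles as a commonness score
  return words, ranks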

Semantic models

We’ll continue with Universal Sentence Encoder (USE) to ensure search strategies are robust to different semantic models.

Search

To improve the gradient solver I tried making another random guess every so often to avoid long stretches exploring local minima. But it didn’t make things better, and probably made them worse!

In response, I made each guess the most common word near the extrapolated semantic location, rather than simply the nearest word. Still no better, and trying both “improvements” together was significantly worse!

Ah well, experiments only fail if we fail to learn from them!

Vocabulary again

I think the noise inherent in a different semantic model, plus the existing random extrapolation distance, overwhelms the changes I tried. In better news, we see a major improvement from using unigram freq vocabulary, reducing the mean from 280 (with many searches capped at 500) to 198, approximately a 30% improvement.

Smarter still?

Here we see that the data-centric (vocabulary) improvement had a far bigger impact than any model-centric (search algorithm) improvement that I had the patience to try (though I left a bunch of further todos). Maybe just guessing randomly from the top n words will be better again!

At least I’ve made a substantial dent in reducing those all-too-common guesses at rare words.

End-to-end simulation hello world!

I’ve talked to many people about how to maximise the utility of a simulator for business decision-making, rather than focussing on the fidelity of reproducing real phenomena. This generally means delivering a custom simulator project lean, in thin, vertical, end-to-end slices. This approach maximises putting learning into action and minimises risk carried forward.

For practitioners, and to sharpen my own thinking, I have provided a concrete example in code; a Hello, World! for custom simulator development. The simulated phenomena will be simple, but we’ll drive all the way through a very thin slice to supporting a business decision.

Introduction to simulation

To understand possible futures, we can simulate many things: physical processes, digital systems, management processes, crowd behaviours, combinations of these, and more.

When people intend to make changes to a system under simulation, they want to understand how its performance depends on the design – the actions they might execute or the features that might exist in the real world. The simulator’s effectiveness is determined by how much it accelerates positive change in the real world.

Even as we come to understand systems through simulation, people frequently continue to interact with those systems in novel and unpredictable ways – sometimes influenced by the system design and typically impacting the measured performance – so it remains desirable to perform simulated and real world tests in some combination, sequentially or in parallel.

For both of these reasons – because simulation is used to shorten the lead time to interventions, and because those interventions may evolve based on both simulated and exogenous factors – it is desirable to build simulation capability in an iterative manner, in thin, vertical slices, in order to respond rapidly to feedback.

This post describes a thin vertical slice for measuring the performance of a design in a simple case, and hence supporting a business decision that can be represented by the choice of design, and the outcomes of which can be represented by performance measures.

This post is a walk-through of this simulation_hello_world notebook and, as such, is aimed at a technical audience, but one that cares about aligning responsive technology delivery and simulation-augmented decision making for a wider stakeholder group.

If you’re just here to have fun observing how simulated systems behave, this post is not for you, because we choose a very simple, very boring system as our example. There are many frameworks for the algorithmic simulation of real phenomena in various domains, but our example here is so simple, we don’t use a framework.

The business decision

How much should we deposit fortnightly into a bank account to best maintain a target balance?

Custom simulator structure

To preserve agility over multiple thin slices through separation of concerns, we’ll consider the following structure for a custom simulator:

  1. Study of the real world
  2. Core algorithms
  3. Translation of performance and design
  4. Experimentation
  5. Translation of an environment

We’ll look at each stage and then put them all together and consider next steps. As a software engineering exercise, the ongoing iterative development of a simulator requires consideration of Continuous Delivery, and in particular shares techniques with Continuous Delivery for Machine Learning (CD4ML). I’ll expand on this in a future post.

Study of the real world

We study how a bank account works and aim to keep the core algorithms really simple – as befits a Hello, World! example. In our study, we recognise the essential features of: a balance, deposits and withdrawals. The balance is increased by deposits and reduced by withdrawals, as shown in the following Python code.

class Account:

  def __init__(self):
    self.balance = 0

  def deposit(self, amount):
    self.balance = self.balance + amount

  def withdraw(self, amount):
    self.balance = self.balance - amount

Core algorithms

Core algorithms reproduce real world phenomena, in some domain, be it physical processes, behaviour of agents, integrated systems, etc. We can simulate how our simple model of a bank account will behave when we perform some set of transactions on it. In the realm of simulation, these are possible rather than actual transactions.

def simulate_transaction(account, kind, amount):
  if kind == 'd':
    account.deposit(amount)
  elif kind == 'w':
    account.withdraw(amount)
  else:
    raise ValueError("kind must be 'd' or 'w'")

def simulate_balance(transactions):
  account = Account()
  balances = [account.balance]
  for t in transactions:
    simulate_transaction(account, t[0], t[1])
    balances.append(account.balance)
  return balances

When we simulate the bank account, note that we are interested in capturing a fine-grained view of its behaviour and how it evolves (the sequence of balances), rather than just the final state. The final state can be extracted in the translation layer if required.

tx = [('d', 10), ('d', 20), ('w', 5)]
simulate_balance(tx)
[0, 10, 30, 25]

Visualisation is critical in simulator development – it helps to communicate function, understand results, validate and tune implementation and diagnose errors, at every stage of development and operation of a simulator.

This could be considered a type of discrete event simulation, but as above, we’re more concerned with a thin slice through the whole structure than the nature of the core algorithms.

Translation of performance and design

The core algorithms are intended to provide a fine-grained, objective reproduction of real observable phenomena. The translation layer allows people to interact with a simulated system at a human scale. This allows people to specify high-level performance measures used to evaluate the system designs, which are also specified at a high level, while machines simulate all the rich detail of a virtual world using the core algorithms.

Performance

We decide for our Hello, World! example that this is a transactional account, and as such we care about keeping the balance close to a target balance, so that sufficient funds will be available, but we don’t leave too much money in a low interest account. We therefore measure performance as the average (mean) absolute difference between the balance at each transaction, and the target balance. Every transaction that leaves us over or under the target amount is penalised by the difference and this penalty is averaged over transactions. A smaller measure means better performance.

This may be imperfect, but there are often trade-offs in how we measure performance. Discussing these trade-offs with stakeholders can be a fruitful source of further insight.

def translate_performance_TargetBalance(balances, target):
  return sum([abs(b - target) for b in balances]) / len(balances)

Design

While we could make any arbitrary number of transactions and decide about each one individually, for simplicity we consider setting up a fortnightly deposit schedule, and making one decision about the amount of that fortnightly deposit.

As the performance translation distils complexity into a single figure, so the design translation does the inverse, with complexity blooming from a single parameter. We translate the single design parameter into a list of every individual transaction, suitable for simulation by the core algorithms.

def translate_design_FortnightlyDeposit(design_parameter):
  return [('d', design_parameter)] * ANNUAL_FORTNIGHTS

translate_design_FortnightlyDeposit(10)[:5]
[('d', 10), ('d', 10), ('d', 10), ('d', 10), ('d', 10)]

Performance = f(Design)

Now we can connect human-relevant design to human-relevant performance measure via the sequence:

design -> design translation -> core algorithm simulation -> performance translation -> performance measure

If we set our target balance at 100, we see how performance becomes a function of design by chaining translations and simulation.

def performance_of_design(design_translator, design_parameters):
  return translate_performance_Target100(
      simulate_balance(
          design_translator(design_parameters)))

This example above (and from the notebook) specialises the target balance performance translator to Target100 and makes the design translator configurable.

Remembering that we’re interested in the mean absolute delta between the actual balance and the target, here’s what it might look like to evaluate and visualise a design:

evaluating account balance target 100
with FortnightlyDeposit [9]
the mean abs delta is 61.89

Experimentation

Experimentation involves exploring and optimising simulated results through the simple interface of performance = f(design). This means choosing a fortnightly deposit amount and seeing how close we get to our target balance over the period. Note that while performance and design are both single values (scalars) in our Hello, World! example, in general they would both consist of multiple, even hundreds of, parameters.

This performance function shows visually the optimum (minimum) design is the point at the bottom of the curve. We can find the optimal design automatically using (in this instance) scipy.optimize.minimize on the function performance = f(design).
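
A minimal sketch of that optimisation step, reusing performance_of_design from above (the starting point and optimisation method are arbitrary choices, not necessarily what the notebook uses):

from scipy.optimize import minimize

result = minimize(
    lambda d: performance_of_design(translate_design_FortnightlyDeposit, d[0]),
    x0=[10],               # arbitrary starting design
    method="Nelder-Mead")  # derivative-free; fine for a one-dimensional objective

optimal_fortnightly_deposit = result.x[0]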

We can explore designs in design space, as above, and we can also visualise how an optimal design impacts the simulated behaviour of the system, as below.

Note that in the optimal design, the fortnightly deposit amount is ~5 for a mean abs delta of ~42 (the optimal performance) rather than a deposit of 9 (an arbitrary design) for a mean abs delta of ~62 (the associated performance) in the example above.

Translation of an environment

The example so far assumes we have complete control of the inputs to the system. This is rarely the case in the real world. This section introduces environmental factors that we incorporate into the simulation, but consider out of our control.

Just like design parameters, environmental factors identified by humans need to be translated for simulation by core algorithms. In this case, we have a random fortnightly expense that is represented as a withdrawal from the account.

import numpy as np

def translate_environment_FortnightlyRandomWithdrawal(seed=42, high=5):
  rng = np.random.RandomState(seed)
  random_withdrawals = rng.randint(0, high=high, size=ANNUAL_FORTNIGHTS)
  return list(zip(['w'] * ANNUAL_FORTNIGHTS, random_withdrawals))

translate_environment_FortnightlyRandomWithdrawal()[:5]
[('w', 3), ('w', 4), ('w', 2), ('w', 4), ('w', 4)]

If we were to simulate this environmental factor without any deposits being made, it would look like below.

Now we translate design decisions and environmental factors together into the inputs for the simulation core algorithms. In this case we interleave or “zip” the events together:

def translate_FortnightlyDepositAndRandomWithdrawal(design_parameters):
  interleaved = zip(translate_design_FortnightlyDeposit(design_parameters),
                    translate_environment_FortnightlyRandomWithdrawal())
  return [val for pair in interleaved for val in pair]

translate_FortnightlyDepositAndRandomWithdrawal(design_1)[:6]
[('d', 9), ('w', 3), ('d', 9), ('w', 4), ('d', 9), ('w', 2)]

Putting it all together

We’ll now introduce an alternative design and optimise that design by considering performance when the simulation includes environmental factors.

Alternative design

After visualising the result of our first design against our performance measurement, an alternative design suggests itself. Instead of having only a fixed fortnightly deposit, we could make an initial large deposit, followed by smaller fortnightly deposits. Our design now has two parameters.

def translate_design_InitialAndFortnightlyDeposit(design_pars):
  return [('d', design_pars[0])] + [('d', design_pars[1])] * ANNUAL_FORTNIGHTS

design_2 = [90, 1]

In the absence of environmental factors, our system would evolve as below.

However, we can also incorporate the environmental factors by interleaving the environmental transactions as we did above.

Experimentation

Our performance function now incorporates two design dimensions. We can run experiments on any combination of the two design parameters to see how they perform. Now incorporating environmental factors as well, we can visualise performance and optimise the design in two dimensions.

Instead of the low point on a curve, the optimal design is now the low point in a bowl shape, the contours of which are shown on the plot above.

This represents our optimal business decision based on what we’ve captured about the system so far: an initial deposit of about $105 and a fortnightly deposit of $2.40. If we simulate the evolution of the system under the optimal design, we see a result like below, and can see visually why it matches our intent to minimise the deviation from the target balance at each time.

The next thin slice

We could increase fidelity by modelling transactions to the second, but it may not substantially change our business decision. We could go on adding design parameters, environmental factors, and alternative performance measures. Some would only require changes at the translation layer, some would require well-understood changes to the core algorithms, and others would require us to return to our study of the world to understand how to proceed.

We only add these things when required to support the next business decision. We deliver another thin slice through whatever layers are required to support this decision by capturing relevant design parameters and performance measures at the required level of fidelity. We make it robust and preserve future flexibility with continuous delivery. And in this way we can most responsively support successive business decisions with a thinly sliced custom simulator.

Weighting objectives and constraints

We used a single, simple measure of performance in this example, but in general there will be many measures of performance, which need to be weighted against each other, and often come into conflict. We may use more sophisticated solvers as the complexity of design parameters and performance measures increases.

Instead of looking for a single right answer, we can also raise these questions with stakeholders to understand where the simulation is most useful in guiding decision-making, and where we need to rely on other or hybrid methods, or seek to improve the simulation with another development iteration.

Experimentation in noisy environments

The environment we created is noisy; it has non-deterministic values in general, though we use a fixed random seed above. To properly evaluate a design in a noisy environment, we would need to run multiple trials.
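
A minimal sketch of such an evaluation, reusing the translators and simulator above and varying only the environment seed (the number of trials is arbitrary):

import numpy as np

def evaluate_over_seeds(design_parameter, target=100, seeds=range(30)):
  """Average the performance measure over several environment random seeds."""
  results = []
  for seed in seeds:
    interleaved = zip(translate_design_FortnightlyDeposit(design_parameter),
                      translate_environment_FortnightlyRandomWithdrawal(seed=seed))
    transactions = [event for pair in interleaved for event in pair]
    results.append(translate_performance_TargetBalance(
        simulate_balance(transactions), target))
  return np.mean(results), np.std(results)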

Final thoughts

Simulation scenarios can become arbitrarily complex. We first validate the behaviour in the simplest point scenarios, which ideally have analytic solutions, as above. We can then calibrate against more complex real world scenarios, and once we are sufficiently confident the core algorithms are correct, we accept that the simulation produces the most definitive result for the most complex or novel scenarios.

Most of the work is then in translating human judgements about design choices and relevant performance measures into a form suitable for exploration and optimisation. This is where rich conversations with stakeholders are key to draw out their understanding and expectations of the scenarios you are simulating, and how the simulation is helping drive their decision-making. Focussing on this engagement ensures you’ll continue to build the right thing, and not just build it right.

Summertime, and the puzzling is made easier

In between beach trips and bike rides, I whiled away more than a few summer hours on puzzles I found in the various AirBNBs we rented. Returning home, I rediscovered a sliding tile puzzle with a twist called Asteroid Escape, where embedded asteroids prevent certain tiles sliding past each other in certain configurations.

A photo of the Asteroid Escape sliding tile puzzle on a desk

With 60 challenges from Starter to Wizard level, I was musing about generating the catalogue of challenges automatically. I’ve enjoyed automating puzzle solutions since brute-forcing Instant Insanity with BASIC in my teens (the code is long lost). While Starter puzzles may only require 6 moves, Wizard may require as many as 109 moves. I decided I would particularly enjoy avoiding the gentle frustration of manually solving so many variants of Asteroid Escape if I were to use an automated solver.

I built a solver in Python with numpy for working with grids of numbers. If automating puzzles is your thing too, check out the source code and read on for a summary of the approach. To execute the proposed solution, I printed out the moves and crossed them off with a pencil as I made them. There’s a lot that could be more elegant or efficient, but it gets the job done!

Solving the hardest Wizard level Asteroid Escape puzzle in the minimum 109 steps

Solver evolution

Define a simple sliding tile board. Elements of a 2D array hold the indexes of the tile pieces at each location (numbered 1..8 for a 3×3 board). The tiles can be reordered through a sequence of moves of the “hole”, that is the space with no tile (numbered 0). The available moves depend on the hole location – four options from the centre but only two from the corners.

Illustration of sliding tile puzzle with tile pieces in a 3x3 grid labelled with numbers 1 to 8, and the hole or gap labelled with 0

Solve the board. Find sequences of moves of the hole that reconfigure the board into a target arrangement of tiles. This I treated as a queue-based breadth-first graph search, with arrangements of pieces being the nodes of a graph, and moves being the edges. However the specific constraints of Asteroid Escape aren’t captured at this point; all tiles are interchangeable in their movements.

A sequence of sliding tile puzzle grids showing how the hole is moved around to reach a target arrangement
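
A minimal sketch of that breadth-first search, before any Asteroid Escape constraints are added (boards here are tuples of nine tile indexes with 0 as the hole; this is a simplification of the solver in the repo):

from collections import deque

# hole position -> positions it can swap with on a 3x3 board
MOVES = {0: (1, 3), 1: (0, 2, 4), 2: (1, 5),
         3: (0, 4, 6), 4: (1, 3, 5, 7), 5: (2, 4, 8),
         6: (3, 7), 7: (4, 6, 8), 8: (5, 7)}

def solve(start, target):
  """Return the sequence of board arrangements from start to target, or None."""
  queue = deque([(start, [start])])
  seen = {start}
  while queue:
    board, path = queue.popleft()
    if board == target:
      return path
    hole = board.index(0)
    for pos in MOVES[hole]:
      nxt = list(board)
      nxt[hole], nxt[pos] = nxt[pos], nxt[hole]
      nxt = tuple(nxt)
      if nxt not in seen:
        seen.add(nxt)
        queue.append((nxt, path + [nxt]))
  return None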

Model the blockers that prevent certain tile arrangements. I quickly realised I would miss edge cases if I tried to list permitted arrangements of tiles, and instead defined the “blockers” on each piece with reference to a 4×4 grid. The grid had one cell margin around the outside of the piece to model projecting outer blockers, and 4 positions inside the piece that could carry blockers. A grid cell would be assigned 1 if there was a blocker or 0 otherwise. For a proposed board arrangement, the piece grids could be offset to their location and superposed onto a larger 8×8 board grid by adding piece cell values. A resultant value greater than 1 would mean that two or more blockers were interfering, so the board arrangement was not actually possible.

Grids modelling blocking protrusions of individual pieces, illustrating how interference occurs when blockers from separate tiles (values of 1) occupy the same space (value of 2)
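
A sketch of the interference check under those assumptions (a 3×3 board, 4×4 piece grids with a 2×2 interior and one-cell margin, superposed on an 8×8 board grid):

import numpy as np

PIECE_GRID = 4  # 2x2 interior cells plus a one-cell margin all round
BOARD_GRID = 8

def arrangement_possible(board, piece_blockers):
  """board: 3x3 array of piece ids; piece_blockers: id -> 4x4 array of 0s and 1s."""
  grid = np.zeros((BOARD_GRID, BOARD_GRID), dtype=int)
  for row in range(3):
    for col in range(3):
      r, c = row * 2, col * 2  # top-left of this piece's footprint (margins overlap)
      grid[r:r + PIECE_GRID, c:c + PIECE_GRID] += piece_blockers[board[row][col]]
  return (grid <= 1).all()    # any cell greater than 1 means two blockers interfere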

Solve the board while respecting blocked arrangements. The breadth-first search identifies the neighbours – reachable in one move – from each arrangement of pieces. Any neighbours that would result in interfering blockers are now excluded from the search. The target state also only requires that the spacecraft piece should be in the exit position. Both of these constraints were captured as functions passed to the solver. This worked well for a Starter puzzle I tested, but failed on a Wizard puzzle, because it didn’t capture piece interference that only occurs while pieces are moving.

Model the blockers that prevent certain tile moves. In particular, the nose of the spacecraft doesn’t interfere with the protruding corner blocker on one asteroid tile in static arrangements, but there are scenarios where the tiles can’t be moved past one another. To model this, I increased the piece grid to 5×5 (now 9 interior positions) to capture the previously ignored spacecraft nose, and when assessing a move, I shifted the tile by one grid cell at a time, so that each move assessed interference at the start and end positions and at two intermediate positions.

A sequence of 4 grids illustrating interference between blockers on separate pieces as one piece slides past the other, though there is no interference at start or end position

Solve the board while respecting blocked arrangements and blocked moves. Update the neighbour search to exclude moves to neighbours that would result in blocker interference during the move. Update the target state to avoid any interference in sliding the spacecraft off the board.

Of course, the solver didn’t go off without a hitch when I tried it in the real world. I failed to specify that a clear exit was required at first, and failed to specify it properly the second time, but the third time was a charm (as above).

Asteroid Escape solver bloopers

Conclusion

From a cursory inspection, the Wizard level 60 solution in the notebook and animated above looks the same as the solution provided with the puzzle (so I didn’t really need a solver at all!) However, this solver can also find multiple other (longer) solutions to each challenge. It could also be used to generate and solve more variants at each level of difficulty, extending the catalogue of puzzles. Improvements can wait, however, as it’s time to enjoy the bike or the beach once more.

Throwback Thursday

The metaverse is a topic currently, though the concept has a long history. Twenty years ago, in the dotcom era, I was exploring this space, as I was recently reminded. Feeling nostalgic, I dug these projects out of the NAS archives. Tech has moved on, but there’s enduring relevance in what I learned.

VO2max (1999)

Virtual Online Orienteering, to the max! Conceived as something similar to Catching Features, but online by default: you could play in the browser, in a virtual world authored in VRML97.

A screenshot of a virtual orienteering game showing a first person view of terrain, a start banner and a compass, with a list of checkpoints underneath.

The UI consisted of two browser windows: a first person view and motion controls (using the now defunct Cosmo Player), and a map in a second window. 

A screenshot of a virtual orienteering game showing a first person view of terrain, a boulder with a checkpoint next to it, and a compass, with a list of checkpoints underneath.

The draggable compass needle, the checkpoints and the course logic (must visit checkpoints in order), and a widget that visualised your completed route as an electric blue string hovering 3 feet above the ground were all modular VRML Protos.

A screenshot of a virtual orienteering game showing a first person view of terrain, the time to complete the course, and a blue line that shows the route taken to complete the course.

The map and terrain for the only level ever created were generated together with a custom C++ application. I was pretty pleased this all worked, and it demonstrated some concepts for…

4DUniverse (2000)

4DUniverse was a broad concept for virtual online worlds for socialising, shopping, gaming, etc, similar at the time to ActiveWorlds (which still exists today!), but again accessible through the browser (assuming you had a VRML plugin).

I thought I’d have great screengrabs to illustrate this part of the story, but I was surprised how few I’d captured, that they were very low resolution, and that they were in archaic formats. The source artefacts from this post – WRL, HTML, JS, JAVA, etc – have lost no fidelity, but would only meet modern standards and interpreters to various degrees. Maybe I will modernise them someday and generate new images to do justice to the splendour I held in my mind!

A variety of screengrabs of virtual worlds from the 4DUniverse. Three different worlds with different themes are shown. Vaguely Greek open hotel lobby, Gothic cathedral in orbit, etc

We authored a number of worlds, connected by teleports, with the tools we had to hand, being text editors, spreadsheets, and custom scripts. While a lot of fun, we came to the conclusion that doing the things we envisaged in the 4DUniverse wasn’t any more compelling than doing them in the 2D interfaces of the time. VRML eventually went away, probably because no one else was able to make a compelling case for its use either. At least I crafted a neat animated GIF logo rendered with POV-Ray.

4 capital D's with a metallic finish joined in a flower shape that is rotating

Less multiverse, and more quantum realm, I also generated VRML content at nanoscale with…

NanoCAD (2001)

NanoCAD was a neat little (pun intended) CAD application for designing molecules, which I extended with a richer editing UI, supporting the design of much more complex hypothesised molecular mechanisms.

NanoCAD UI showing 3D view of construction of a large buckytube from graphene sheets. Application controls and help text are shown in the lower panel.

The Java app allowed users to place atoms in 3D and connect them with covalent bonds. Then an energy solver would attempt to find a stable configuration for the molecules (using classical rather than quantum methods). With expressive selection, duplication and transformation mechanics, it was possible to create benzene rings, stitch them into graphene sheets, and roll them up into “enormous” buckytubes, or other complex carbon creations.

NanoCAD UI showing detailed 3D view of construction of a large buckytube from graphene sheets. Application controls and help text are shown in the lower panel. Further desktop background including MS Word, Windows Explorer and WinAmp evoke the 2001 era

I also created cables housed inside sheaths, gears – built with benzene ring teeth attached to buckytubes – and other micro devices. If 4DUniverse was inspired by Snow Crash, NanoCAD was inspired by The Diamond Age. NanoCAD could run in a browser as an applet, and the molecules could also be exported as WRL files for display in other viewers.

3D visualisation in the space-filling style of a molecular gear made from benzene ring teeth attached to the outside of a buckytube

Comparing contemporary professional projects

It’s nice to contrast the impermanence of these personal projects with the durability of my contemporary professional work with ANCA Machines. At the time, I was documenting the maths and code of Cimulator3D and also developing the maths and bidirectional UI design for iFlute, both used in the design and manufacturing of machine tools via grinding processes. Both products are still on the market in similar form today, more than two decades later.

I wonder how I’ll view this post in another twenty years?

Second Semantle Solver

In the post Sketching Semantle Solvers, I introduced two methods for solving Semantle word puzzles, but I only wrote up one. The second solver here is based on the idea that the target word should appear in the intersection between the cohorts of possible targets generated by each guess.

Finding the semantle target through overlapping cohorts. Shows two intersecting rings of candidate words based on cosine similarity.

To recap, the first post:

  • introduced the sibling strategies side-by-side,
  • discussed designing for sympathetic sequences, so the solver can play along with humans, with somewhat explainable guesses, and
  • shared the source code and visualisations for the gradient descent solver.

Solution source

This post shares the source for the intersecting cohorts solver, including notebook, similarity model and solver class.

The solver is tested against the simple simulator for semantle scores from last time. Note that the word2vec model data for the simulator (and agent) is available at this word2vec download location.

Stylised visualisation of the search for a target word with intersecting cohorts. Shows distributions of belief strength at each guess and strength and rank of target word

The solver has the following major features:

  1. A vocabulary, containing all the words that can be guessed,
  2. A semantic model, from which the agent can calculate the similarity of word pairs,
  3. The ability to generate cohorts of words from the vocabulary that are similar (in Semantle score) to a provided word (a guess), and
  4. An evolving strength of belief that each word in the vocabulary is the target.

In each step towards guessing the target, the solver does the following (a sketch in code follows this list):

  1. Choose a word for the guess. The current choice is the word with the strongest likelihood of being the target, but it could equally be any other word from the solver’s vocabulary (which might help triangulate better), or it could be provided by a human player with their own suspicions.
  2. Score the guess. The Semantle simulator scores the guess.
  3. Generate a cohort. The guess and the score are used to generate a new cohort of words that would share the same score with the guess.
  4. Merge the cohort into the agent’s belief model. The score is added to the current belief strength for each word in the cohort, providing a proxy for likelihood for each word. The guess is also masked from further consideration.
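
A minimal sketch of those steps (the class and method names here are illustrative rather than the published solver’s API, and the cohort band is a stand-in for the precision/recall parameters discussed below):

import numpy as np

class CohortSolver:

  def __init__(self, vocabulary, similarity, band=0.01):
    self.vocabulary = vocabulary
    self.similarity = similarity  # similarity(word_a, word_b) -> score
    self.band = band              # tightness of the cohort similarity band
    self.belief = {w: 0.0 for w in vocabulary}

  def next_guess(self):
    # word with the strongest belief of being the target
    return max(self.belief, key=self.belief.get)

  def incorporate(self, guess, score):
    # cohort: words whose similarity to the guess is close to the reported score
    cohort = [w for w in self.vocabulary
              if abs(self.similarity(guess, w) - score) < self.band]
    for w in cohort:
      self.belief[w] += score     # score as a proxy for likelihood
    self.belief[guess] = -np.inf  # mask the guess from further consideration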

Show of strength

The chart below shows how the belief strength (estimated likelihood) of the target word gradually approaches the maximum belief strength of any word, as the target (which remains unknown until the end) appears in more and more cohorts.

Intersecting cohorts solver. Line chart showing the belief strength of the target word at each guess in relation to the maximum belief strength of remaining words.

We can also visualise the belief strength across the whole vocabulary at each guess, and the path the target word takes in relation to these distributions, in terms of its absolute score and its rank relative to other words.

Chart showing the cohort solver belief strength across the whole vocabulary at each guess, and the path the target word takes in relation to these distributions, in terms of its absolute score and its rank relative to other words

Superior solution?

The cohort solver can be (de)tuned to almost any level of performance by adjusting the parameters precision and recall, which determine the tightness of the similarity band and completeness of results from the generated cohorts. The gradient descent solver has potential for tuning parameters, but I didn’t explore this much. To compare the two, we’d therefore need to consider configurations of each solver. For now, I’m pleased that the two distinct sketches solve to my satisfaction!

Sketching Semantle Solvers

Semantle is an online puzzle game in which you make a series of guesses to discover a secret word. Each guess is scored by how “near” it is to the secret target, providing guidance for subsequent guesses, but that’s all the help you get. Fewer guesses is a better result, but hard to achieve, as the majority of words are not “near” and there are many different ways to get nearer to the target.

You could spend many enjoyable hours trying to solve a puzzle like this, or you could devote that time to puzzling over how a machine might solve it for you…

Scoring system

Awareness of how the nearness score is calculated can inspire potential solutions. The score is based on a machine learning model of language; how frequently words appear in similar contexts. These models convert each word into a unique point in space (also known as an embedding) in such a way that similar words are literally near to one another in this space, and therefore the similarity score is higher for points that are nearer one another.

Diagram of a basic semantic embedding example. The words "dog" and "cat" are shown close together, while the word "antidisestablishmentariansim" is shown distant from both.

We can reproduce this similarity score ourselves with a list of English words and a trained machine learning model, even though these models use 100s of dimensions rather than two, as above. Semantle uses the word2vec model but there are also alternatives like USE. Comparing these results to the scores from a Semantle session could guide a machine’s guesses. We might consider this roughly equivalent to our own mental model of the nearness of any pair of words, which we could estimate if we were asked.
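
As a sketch, assuming you have a pre-trained word2vec model file to hand (the path below is an assumption), gensim can reproduce the kind of similarity score Semantle reports:

from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

print(vectors.similarity("dog", "cat"))    # high: near in embedding space
print(vectors.similarity("dog", "antidisestablishmentarianism"))  # low: far apart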

Sibling strategies

Two general solution strategies occurred to me to find the target word guided by similarity scores: intersecting cohorts and gradient descent.

Intersecting cohorts: the score for each guess defines a group of candidate words that could be the target (because they have the same similarity with the guessed word as the score calculated from the target). By making different guesses, we get different target cohorts with some common words. These cohort intersections allow us to narrow in on the words most likely to be the target, and eventually guess it correctly.

Diagram showing two similarity cohorts. These form halos around the axis of guess direction, based on dot product similarity, and intersect in the direction of the target word.

Gradient descent: each guess gives a score, and we look at the difference between scores and where the guesses are located relative to each other to try to identify the “semantic direction” in which the score is improving most quickly. We make our next guess in that direction. This doesn’t always get us closer but eventually leads us to the target.

Diagram showing a number of nodes and gradient directions between nodes. One is highlighted showing the maximum gradient and direction of the next guess, which is a node close to the extension of the direction vector.

I think human players tend more towards gradient descent, especially when close to the target, but also use some form of intersecting cohorts to hypothesise potential directions when uncertain. For a machine, gradient descent requires locations in embedding space to be known, while intersecting cohorts only cares about similarity of pairs.

Sympathetic sequences

Semantle is open source and one could create a superhuman solver that takes unfair advantage of knowledge about the scoring system. For instance, 4 significant figures of similarity (as per semantle scores) allows for pretty tight cohorts. Additionally, perfectly recalling large cohorts of 10k similar words at each guess seems unrealistic for people.

I was aiming for something that produced results in roughly the same range as a human and that could also play alongside a human should they want a helpful suggestion. Based on limited experience, the human range seems to be – from exceptional to exasperated – about 20 to 200+ guesses.

This led to some design intents:

  • that the solving agent capabilities were clearly separated from the Semantle scoring system (I would like to use a different semantic model for the agent in future)
  • that proposing the next guess and incorporating the results from a guess would be decoupled to allow the agent to play with others
  • that the agent capabilities could be [de]tuned if required to adjust performance relative to humans or make its behaviour more interpretable

Solution source

This post shares the source for the gradient descent solver and a simple simulator for semantle scores. Note that the word2vec model data for the simulator (and agent) is available at this word2vec download location.

I have also made a few iterations on the intersecting cohorts approach, which works but isn’t ready for publication.

Seeking the secret summit

The gradient descent (or ascent to a summit) approach works pretty well by just going to the most similar word and moving a random distance in the direction of the steepest known gradient. The nearest not previously guessed word to the resultant point is proposed as the next guess. You can see a gradual but irregular improvement in similarity as it searches.

Line chart of similarity score to target for each word in a sequence of guesses. The line moves upwards gradually but irregularly for most of the chart and shoots up at the end. The 46 guesses progress from thaw to gather.
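
A rough sketch of a single step of that search, assuming a dict of word embeddings and the guesses and scores so far (the names and the step size are illustrative, not the published solver’s API):

import numpy as np

def next_guess(embeddings, guesses, scores, step_scale=1.0, rng=np.random):
  """Extrapolate from the best guess along the steepest known gradient."""
  best = max(range(len(guesses)), key=lambda i: scores[i])
  best_vec = embeddings[guesses[best]]

  def gradient(i):  # similarity gradient from guess i towards the best guess
    dist = np.linalg.norm(best_vec - embeddings[guesses[i]]) + 1e-9
    return (scores[best] - scores[i]) / dist

  ref = max((i for i in range(len(guesses)) if i != best), key=gradient)
  direction = best_vec - embeddings[guesses[ref]]
  direction = direction / (np.linalg.norm(direction) + 1e-9)
  point = best_vec + rng.uniform(0.0, step_scale) * direction  # random distance
  # nearest not-yet-guessed word to the extrapolated point becomes the next guess
  candidates = (w for w in embeddings if w not in set(guesses))
  return min(candidates, key=lambda w: np.linalg.norm(embeddings[w] - point))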

In the embedding space, I overlaid a network (or graph) of “nodes” representing words and their similarity to the target, and “spokes” representing the direction between nodes and the gradient of similarity in that direction. This network is initialised with a handful of random guesses before the gradient descent begins in earnest. Below I’ve visualised the search in this space with respect to the basis – the top node and spoke with best gradient – of each guess.

Chart showing progession of basis of guessing the target word. The horizontal axis is current best guess. The vertical axis is current reference word. A line progresses in fewer hops horizontally and more hops vertically from bottom left to top right.

The best results are about 40 guesses and typically under 200, though may blow out on occasion. I haven’t really tried to optimise the search; as above, the first simple idea worked pretty well. To [de]tune or test the robustness of this solution, I’ve considered adding more noise to the search for the nearest word from the extrapolated point, or compromising the recall of nearby words, or substituting a different semantic model. These things might come in future. At this stage I just wanted to share a sketch of the solver rather than a settled solution.

Postscript: after publishing, I played with the search visualisation in an attempt to tell a more intuitive story (from literally to nobody).

Line chart showing the similarity of each of a sequence of 44 guesses to a semantle target. The line is quite irregular but trends up from first guess “literally” at bottom left to target “nobody” at top right. The chart is annotated with best guess at each stage and reference words for future guesses.

Stop the sibilants, s’il vous plaît

C’est suffit! I’m semantically sated. After that sublime string of subheadings, the seed of a supplementary Wordle spin-off sprouts: Alliteratle anyone?

Project Slackpose

Another lockdown, another project for body and mind. Slackpose allows me to track my slackline walking and review my technique. Spending 5 minutes on the slackline between meetings is a great way to get away from my desk!

I had considered pose estimation for wheelies last year, but decided slackline walking was an easier start, and something the whole family could enjoy.

Setup

I mount my phone on a tripod at one end of the slackline and start recording a video. This gives a good view of side-to-side balance, and is most pixel-efficient in vertical orientation.

Alternatively, duct tape the phone to something handy, or even hand-hold or try other angles. The 2D pose estimation (location of body joints in the video image) will be somewhat invariant to the shooting location, but looking down the slackline (or shooting from another known point) may help reconstruct more 3D pose data using only a single camera view. Luckily, it’s also invariant to dogs wandering into frame!

I use OpenPose from the CMU Perceptual Computing Lab to capture pose data from the videos. See below for details of the keypoint pose data returned and some notes on setting up and running OpenPose.

I then analyse the keypoint pose data in a Jupyter notebook that you can run on Colab. This allows varied post-processing analyses such as the balance analysis below.

Keypoint Pose Data

OpenPose processes the video to return data on 25 body keypoints for each frame, representing the position of head, shoulders, knees, and toes, plus eyes, ears, and nose and other major joints (but mouth only if you explicitly request facial features).

These body keypoints are defined as (x, y) pixel locations in the video image, for one frame of video. We can trace the keypoints over multiple frames to understand the motion of parts of the body.

Keypoints also include a confidence measure [0, 1], which is pretty good for the majority of keypoints from the video above.
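
A minimal sketch of loading that per-frame JSON output into a single array of (x, y, confidence) values, assuming one person per frame:

import glob
import json
import numpy as np

def load_keypoints(output_dir):
  frames = []
  for path in sorted(glob.glob(f"{output_dir}/*_keypoints.json")):
    with open(path) as f:
      people = json.load(f)["people"]
    if people:
      keypoints = np.array(people[0]["pose_keypoints_2d"]).reshape(-1, 3)
    else:
      keypoints = np.zeros((25, 3))  # no person detected in this frame
    frames.append(keypoints)
  return np.stack(frames)            # shape (frames, 25, 3)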

Balance Analysis

I wanted to look at balance first, using an estimate of my body’s centre of mass. I calculated this from the proportional mass of body segments (sourced here) with estimates of the location centre of mass for each segment relative to the pose keypoints (I just made these up, and you can see what I made up in the notebook).

This looks pretty good from a quick eyeball, although it’s apparent that it is sensitive to the quality of pose estimation of relatively massive body segments, such as my noggin, estimated at 8.26% of my body mass. When walking away, OpenPose returns very low confidence and a default position of (0, 0) for my nose for many frames, so I exclude it from the centre of mass calculation in those instances. You can see the effect of excluding my head in the video below.
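
A sketch of that centre of mass estimate – a mass-weighted average of segment centres, each placed between two keypoints. The segment list below is illustrative (only the head mass matches the figure quoted above; the rest, and the keypoint pairings, are stand-ins for the values in the notebook):

import numpy as np

# (proportional mass, keypoint index a, keypoint index b, fraction along a->b)
SEGMENTS = [
    (0.0826, 0, 1, 0.0),    # head, centred at the nose keypoint
    (0.4700, 1, 8, 0.5),    # trunk, midway between neck and mid-hip
    (0.1000, 9, 10, 0.5),   # right thigh, between hip and knee
    (0.1000, 12, 13, 0.5),  # left thigh, between hip and knee
    # ... remaining segments
]

def centre_of_mass(keypoints, confidence_threshold=0.1):
  """keypoints: (25, 3) array of (x, y, confidence) for one frame."""
  total_mass, weighted = 0.0, np.zeros(2)
  for mass, a, b, frac in SEGMENTS:
    if min(keypoints[a, 2], keypoints[b, 2]) < confidence_threshold:
      continue  # skip low-confidence segments (e.g. the nose when walking away)
    centre = (1 - frac) * keypoints[a, :2] + frac * keypoints[b, :2]
    weighted += mass * centre
    total_mass += mass
  return weighted / total_mass if total_mass else None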

Not much more to report at this point; I’ll have a better look at this soon, now that I’m up and walking with analysis.

OpenPose Notes

Setup

I ran OpenPose on my Mac, following the setup guide at https://github.com/CMU-Perceptual-Computing-Lab/openpose, and also referred to two tutorials. These instructions are collectively pretty good, but note:

  • 3rdparty/osx/install_deps.sh doesn’t exist, you will instead find it at scripts/osx/install_deps.sh
  • I had to manually pip install numpy and opencv for OpenPose with pip install 'numpy<1.17' and pip install 'opencv-python<4.3', but this is probably due to my neglected Python2 setup.
  • Homebrew cask installation syntax has changed from the docs; now invoked as brew install --cask cmake

Running

Shooting iPhone video, I need to convert the format for input to OpenPose. Use ffmpeg as below (replace slackline.mov/mp4 with the name of your video).

ffmpeg -i slackline.mov -vcodec h264 -acodec mp2 slackline.mp4

I then typically invoke OpenPose to process video and output marked up frames and JSON files with the supplied example executable, as below (again replace input video and output directory with your own):

./build/examples/openpose/openpose.bin --video slack_video/slackline.mp4 --write_images slack_video/output --write_json slack_video/output

LEGO and Software – Part Roles

This is the fifth post in a series exploring LEGO® as a Metaphor for Software Reuse. A key consideration for reuse is the various roles that components can play when combined or re-combined in sets. Below we’ll explore how we can use data about LEGO parts and sets to understand the roles parts play in sets.

I open a number of lines of investigation, but this is just the start, rather than any conclusion, of understanding the roles parts play and how that influences outcomes including reuse. The data comes from the Rebrickable data sets, image content & API and the code is available at https://github.com/safetydave/reuse-metaphor.

Hero Parts

Which parts play the most important roles in sets? Which parts could we least easily substitute from other sets?

We could answer this question in the same way as we determine relevant search results from documents, for instance with a technique called TFIDF (term frequency-inverse document frequency). We can find hero parts in sets with set frequency-inverse part frequency, which in the standard formulation requires a corpus where each part is a “document” whose “terms” are the sets that include that part, as below.

part 10190: "10403-1 10403-1 10404-1 10404- ... "
part  3039: "003-1 003-1 003-1 003-1 021-1  ... "
part  3023: "021-1 021-1 021-1 021-1 021-1  ... "
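
A sketch of scoring this with scikit-learn, assuming a dict part_docs mapping each part number to its space-separated list of sets (as above); the hero parts for a set are then the parts with the highest weight in that set’s column:

from sklearn.feature_extraction.text import TfidfVectorizer

part_ids = list(part_docs.keys())
vectorizer = TfidfVectorizer(token_pattern=r"[^ ]+")  # keep set numbers like "60012-1" whole
tfidf = vectorizer.fit_transform(part_docs[p] for p in part_ids)

column = vectorizer.vocabulary_["60012-1"]
weights = tfidf[:, column].toarray().ravel()
hero_parts = [part_ids[i] for i in weights.argsort()[::-1][:10]]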

Inverse part frequency is closely related to the inverse of the reuse metric from part 4, hence we can expect it will find the least reused parts. Considering again our sample set 60012-1, LEGO City Coast Guard 4×4 (including 4WD, trailer, dinghy, and masked and flippered diver), we find the following “hero” parts.

Gallery of hero parts from LEGO Coast Guard set (60012-1) including stickers, 4WD tyres, a dinghy, flippers and mask

This makes intuitive sense. These “hero parts” are about delivering on the specific nature of the set. It’s much harder to substitute (or reuse other parts) for these hero parts – you would lose something essential to the set as it is designed. On the other hand, as you might imagine, the least differentiating parts (easiest to substitute or reuse alternatives for) overlap significantly with the top parts from part 4. Note that while mechanically – in the sense of connecting parts together – it may not be possible to replace these parts, they don’t do much to differentiate the set from other sets.

Gallery of least differentiated parts from LEGO Coast Guard set (60012-1) including common parts like plates, tiles, blocks and slopes.

Above, we consider sets as terms (words) in a document for each part. We can also reverse this by considering a set as a document, and included parts as terms in that document. Computing this part frequency-inverse set frequency measure across all parts and sets gives us a sparse matrix.

Visualisation of TFIDF or part-frequency-invers-set-frequency as a sparse 2D matrix for building a search engine for sets based on parts

This can be used as a search engine to find the sets most relevant to particular parts. For instance, if we query the parts "2431 2412b 3023" (you can see these in Recommended Parts below), the top hit is the Marina Bay Sands set, which again makes intuitive sense – all those tiles, plates, and grilles are the essence of the set.

Recommended Parts

Given a group of parts, how might we add to the group for various outcomes including reuse? For instance, if a new set design is missing one part that is commonly included with other parts in that design, could we consider redesigning the set to include that part to promote greater reuse?

A common recommendation technique for data in the shape of our set-part data is Association Rule Learning (aka “Basket Analysis”), which will recommend parts that are usually found together in sets (like items in baskets).

An association rule in this case is an implication of the form {parts} -> {parts}. Multiple of these rules form a directed graph, which we can visualise. I used the Efficient Apriori package to learn rules. In the first pass, this gives us some reasonable-looking recommendations for many of the top parts we saw in part 4.

Visualisation of discovered association rules as a directed graph showing common parts
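
A minimal sketch of learning those rules, assuming transactions is a list of tuples of part numbers, one per set (the support and confidence thresholds are illustrative):

from efficient_apriori import apriori

itemsets, rules = apriori(transactions, min_support=0.05, min_confidence=0.7)

for rule in sorted(rules, key=lambda r: r.lift, reverse=True)[:10]:
  print(rule)  # e.g. {2431} -> {3023}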

You can read this as the presence of 2431 in a set implies (recommends) the presence of 3023, as does 2412b, which also implies 6141. We already know these top parts occur in many sets, so it’s likely they occur together, but we do see some finer resolution in this view. The association rules for less common parts might also be more insightful; this too may come in a future post.

Relationships Between Parts

How can we discover more relationships between parts that might support better outcomes including reuse?

We can generalise the part reuse analysis from part 4 and the techniques above by capturing the connections between sets and parts as a bipartite graph. The resultant graph contains about 63,000 nodes – representing both parts and sets – and about 633,000 edges – representing instances of parts included in sets. A small fragment of the entire graph, based on the flipper part 10190, the sets that include this part, and all other parts included in these sets, is shown below.

Visualisation of set neighbours (count 79) of flipper 10190 and their part neighbours (count 1313) as two parallel rows of nodes with many connections between them
Visualisation of selected set neighbours (count 3) of flipper 10190 and selected of their part neighbours (count 14) as two parallel rows of nodes with some connections between them

This bipartite representation allows us to find parts related by their inclusion in LEGO sets using a projection, which is a derived graph that only includes parts nodes, linked by edges if they share a set. In this projection, our flipper is directly linked to the 1312 other parts with which it shares any set.
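
A sketch of building that graph and its part projection with networkx, assuming set_parts maps each set number to its list of part numbers (node names are prefixed to keep the two sides distinct):

import networkx as nx
from networkx.algorithms import bipartite

B = nx.Graph()
for set_num, parts in set_parts.items():
  for part in parts:
    B.add_edge(f"set:{set_num}", f"part:{part}")  # edge = part included in set

part_nodes = {n for n in B if n.startswith("part:")}
part_projection = bipartite.projected_graph(B, part_nodes)

print(part_projection.degree("part:10190"))  # parts that share a set with the flipper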

Visualisation of 1312 immediate neighbours of flipper 10190 in the part projection of the set-part graph, shows only 1% of connections but this is very dense nonetheless

You can see this is a very densely connected set of parts, and more so on the right side, from 12 o’clock around to 6 o’clock. We could create a similar picture for each part, but we can also see the overall picture by plotting degree (number of connections to parts with shared sets) for all parts, with a few familiar examples.

Degree of nodes in part projection, with plate 1x2, slope 45 2x2 and flipper highlighted. Steep drop-off from maximum and long flat tail

This is the overall picture of immediate neighbours, and it shows the familiar traits of a small number of highly connected parts, and a very long tail of sparsely connected parts. We can also look beyond immediate neighbours to the path(s) through the projection graph between parts that don’t directly share a set, but are connected by common parts that themselves share a set. Below is one of the longest paths, from the flipper 10190 to multiple Duplo parts.

Visualisation of a path through the part connection graph spanning 7 nodes, with some neighbouring nodes also shown and parts drawn on

With a projection graph like this, we could infer that parts that are designed to be used together are closer together. We could use this information to compile groups of parts to specific ends. Given some group of parts, we could (1) add “nearby” missing parts to that group to create flexible foundational groups that could be used for many builds, or we could (2) add “distant” parts that could allow us to specialise builds in particular directions that we might not have considered previously. In these cases, “nearby” and “distant” are measured in terms of the path length between parts. There are many other ways we could use this data to understand part roles.
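
A sketch of ranking candidates as “nearby” or “distant” by path length in that projection (part_projection as in the sketch above; averaging path lengths is one arbitrary choice of group distance):

import networkx as nx

def rank_by_distance(group, candidates, projection):
  """Mean shortest-path length from each candidate part to the given group."""
  distances = {}
  for candidate in candidates:
    lengths = [nx.shortest_path_length(projection, candidate, member)
               for member in group if nx.has_path(projection, candidate, member)]
    if lengths:
      distances[candidate] = sum(lengths) / len(lengths)
  return sorted(distances, key=distances.get)  # nearest first; reverse for "distant"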

(When I first plotted this, I thought I had made a mistake, but it turns out there are indeed sets including both regular and Duplo parts, in this case this starter kit.)

The analysis above establishes some foundational concepts, but doesn’t give us a lot of new insight into the roles played by parts. The next step I’d like to explore here is clustering and/or embedding the nodes of the part graph, to identify groups of similar parts, which may come in a future post.

Lessons

As I said above, there are no firm conclusions in this post regarding reuse in LEGO or how this might influence our view of and practices around reuse in software. However, if we have data about our software landscape in a similar form to the set-part data we’ve explored here, we might be able to conduct similar analyses to understand the roles that reusable software components play in different products, and, as a result, how to get better outcomes overall from software development.

Coming Next

I think the next post, and it might just be the last in this series, is going to be about extending these views of the relationships between parts and understanding how they might drive or be driven by the increase in variety and specialisation discussed in part 2.

LEGO® is a trademark of the LEGO Group of companies which does not sponsor, authorize or endorse this site.

LEGO and Software – Part Reuse

This is the fourth post in a series exploring LEGO® as a Metaphor for Software Reuse. The story is evolving as I go because I keep finding interesting things in the data. I’ll tie it all up with the key things I’ve found at some point.

In this post we’re looking from the part perspective – the reusable component – at how many sets or products it’s [re]used in, and how this has changed over time. All the analysis from this post is available at https://github.com/safetydave/reuse-metaphor.

Inventory Data

The parts inventory data from the Rebrickable data sets, image content & API gives us which parts appear in which sets, in which quantities and colours. For instance, if we look at the minifigure flipper from part 3, we can chart its inclusion in sets as below.

The parts inventory data includes minifigures. I may later account for the effect of including or excluding minifigures in these various analyses. We can also tell if a part in a set is a spare; according to the data, 2% of LEGO parts in sets are spares.

Parts Included in Sets

The most reused parts are included in thousands of sets. Here is a gallery of the top 25 parts over all time – from number 1 Plate 1 x 2 (which is included in over 6,200 sets), to our favourite from last time, the venerable Slope 45° 2 x 2 (over 2,700 sets).

These parts appear in a significant fraction of sets, but we can already see a 50% reduction in the number of sets in just the top 25 used parts. Beyond this small sample, we can plot the number of sets that include a given part (call it part set count), as below.

This time we have used log scales on both axes, due to the extremely uneven distribution of part set counts. In contrast to the thousands of sets including top parts, a massive 60% of parts are included in only one set and only 10% of parts are included in more than ten sets. The piecewise straight line fit indicates a power law applies, for instance approximately count = 10000 / sqrt(rank) for the top 100 ranked parts.
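
A sketch of checking that fit on a log-log scale, assuming part_set_counts holds the number of sets per part:

import numpy as np

counts = np.sort(np.asarray(part_set_counts))[::-1][:100]  # top 100 ranked parts
ranks = np.arange(1, len(counts) + 1)
slope, intercept = np.polyfit(np.log10(ranks), np.log10(counts), 1)

# a slope near -0.5 and intercept near 4 correspond to count ≈ 10000 / sqrt(rank)
print(slope, 10 ** intercept)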

Uneven distributions are often expressed in terms of the Pareto principle or 80/20 rule. If we define reuse instances as every time a part is included in a set, after the first set, then we can plot the contribution of each part to total reuse instances and see whether this is more or less uneven than the 80/20 rule.

This shows us that reuse of LEGO parts in sets is much more uneven than the 80/20 rule. While the 80/20 rule says 80% of reuse would be due to 20% of parts, in fact we find by this definition that 80% of reuse is due to only 3% of parts, and 20% of parts account for 98% of reuse!

We find a similar phenomenon if we consider the quantities of parts included in sets, rather than just the count of sets per part. We could repeat the whole analysis based on quantities (and the notebook has some options for doing this), but I was fairly satisfied the results would be similar given the degree of correlation we find, below.

I was intrigued, though, by the part that appeared many times in a single set, then never again (the uppermost point of the lower left column). It turns out it is a Window 1 x 2 x 1 (old type) with Extended Lip included in a windows set from 1966 that probably looked a bit like this one.

This brings us neatly to the “long tail” to round out this view of reuse as parts included in multiple sets. As per the distribution above, the tail (of parts that belong to only one set) is really long and full of curiosities. The tail is over 22,000 parts long, though these belong to only about 10,000 unique sets. The parts belong about 60/40 to sets proper vs minifigure packs. Here’s a tail selection – you’ll see they are fairly specialised items like stickers, highly custom parts, minifigure items with unique designs, and even trading cards!

There’s even, perhaps my nemesis, a minifigure set called “Chainsaw Dave”!

Parts Included in Sets Over Time

Part reuse might vary over time, and might also depend on when a part was released. In previous posts we’ve seen an exponential increase in new parts and sets over time and an exponential decay in part lifetimes. We can plot part reuse (set count) against lifespan, as below.

This shows some correlation – which we might expect, as longevity is due to reuse and vice versa – but I was also intrigued by the many relatively short-lived parts with high set counts (in the mid-top left region). Colouring points by year of release shows that these are relatively recent parts (at the yellow end of the spectrum). So, as well as long-lived parts, more recent parts also appear in many sets, which is good news for reuse.

However, it’s hard to determine the distribution of set count from the scatter plot, and hence how significant the reuse is. We can see the distribution better with a violin plot, which shows the overall range (‘T’ ends), the distribution of common values (shading), and the median (cross-bar), much like a box plot.

We see that although many parts released in the last few decades are reused in 100s or even 1000s of sets, the median or typical part appears in only a handful of sets. With 100s to 1000s of sets released each year recently, the sets are reusing both old and new parts, but the vast majority of parts are not significantly reused.
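A violin plot along these lines can be sketched as follows. The CSV downloads don’t include part release years, so here I approximate a part’s release year as the earliest year of any set that includes it; the shape is indicative rather than exact.

```python
# Approximate each part's release year as the earliest year of a set including it
first_year = part_sets.groupby("part_num")["year"].min()
decade = (first_year // 10) * 10

df = pd.DataFrame({"set_count": part_set_count, "decade": decade}).dropna()
decades = sorted(df["decade"].unique())
data = [np.log10(df.loc[df["decade"] == d, "set_count"]) for d in decades]

plt.violinplot(data, showmedians=True)
plt.xticks(range(1, len(decades) + 1), [int(d) for d in decades])
plt.xlabel("decade part first appeared")
plt.ylabel("log10(sets including part)")
```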

Top Parts for Reuse Over Time

Above, we introduced the top 25 parts by all-time set count, but how has this varied over time? We can chart – for each of the top 25 parts – how many sets they appeared in each year. However, as the number of sets released each year has increased exponentially, we see a clearer pattern if we chart the proportion of sets released each year that include each of the top 25 parts, as below.
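A sketch of that proportional view, reusing the part_sets, part_set_count and sets frames from the earlier snippets:

```python
# All-time top 25 parts by set count
top25 = part_set_count.head(25).index
sets_per_year = sets.groupby("year")["set_num"].nunique()

# Proportion of each year's sets that include each top part
for part in top25:
    included = (part_sets[part_sets["part_num"] == part]
                .groupby("year")["set_num"].nunique())
    (included / sets_per_year).plot(label=part)

plt.ylabel("proportion of sets released that year")
plt.legend(fontsize="x-small", ncol=2)
```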

This shows variation in proportional set representation for the top parts from about 10% to 50% over the last 40 years. However, there was a very noticeable drop around the year 2000, to a maximum of only 20% of sets including top reused parts. Interestingly, this corresponds to a documented period of financial difficulty for The LEGO Group, but further research would be required to demonstrate any relationship. Prior to 1980, the maximum proportional representation was even higher, suggesting that a major purpose of sets in that era was to provide reusable parts.

From 2000 onwards, the all time ranking becomes more apparent, as the lines generally spread to match the rankings. We can also see this resolution of ranks in a bump chart for the top 4 parts over the time period from just before 2000 to the present.

Lessons for Software

As in previous posts, here’s where I speculate on what this data-directed investigation of LEGO parts and sets over the years might mean for software development. This is only on the presumption that – as often cited informally in my experience – LEGO products are a good metaphor for software. I take this as a given for this series, and as an excuse to play with some LEGO data, but I don’t really test it.

Given the reuse of LEGO parts across sets and time, we might expect that for software:

  • Most reuse will likely come from a small number of components, and this may be far more extreme than the 80/20 heuristic
  • If this is the case, then teams should build for use before reuse, or design very simple foundational components (see the top 25) for widespread reuse
  • Building for use before reuse means optimising for fast development and retirement of customised components that won’t be reusable
  • Reuse may vary significantly over time depending on your product strategy
  • It’s possible to introduce new reusable components at any time, but their impact might not be very noticeable with a product strategy that drives many customised components

Next Time

I plan to look further into the relationship between parts and sets, and how this has evolved over time.

LEGO® is a trademark of the LEGO Group of companies which does not sponsor, authorize or endorse this site.

LEGO and Software – Lifespans

This is the third post in a series exploring LEGO as a Metaphor for Software Reuse through data (part 1 & part 2).

In this post, we’ll look at reuse through the lens of LEGO® part lifespans. Not how long before the bricks wear out, are chewed by your dog, or squashed painfully underfoot in the dark, but for what period each part is included in sets for sale.

This is a minor diversion from looking further into the reduction in sharing and reuse themes from part 2, but lifespan is a further distinct concept related to reuse, worthy I think of its own post. All the analysis from this post, which uses the Rebrickable API, is available at https://github.com/safetydave/reuse-metaphor.

Ages in a Sample Set

To understand sets and parts in the Rebrickable API, I first raided our collection of LEGO build instructions for a suitable set to examine, and I came up with 60012-1, LEGO City Coast Guard 4×4.

Picture of LEGO build instructions for Coast Guard set
The sample set

In the process I discovered the parts data contained year_from and year_to attributes, which would enable me to chart the age of each part in the set at the time the set was released, as a means of understanding reuse.
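A rough sketch of that calculation, assuming the year_from values have already been fetched (e.g. via the Rebrickable API) into a part_years DataFrame indexed by part_num, and reusing the sets and part_sets frames loaded as in the earlier snippets:

```python
import matplotlib.pyplot as plt

# Age of each part in set 60012-1 at the set's release year
set_num = "60012-1"
set_year = sets.loc[sets["set_num"] == set_num, "year"].iloc[0]
set_part_nums = part_sets.loc[part_sets["set_num"] == set_num, "part_num"].unique()

# part_years: assumed DataFrame indexed by part_num with year_from / year_to columns
ages = set_year - part_years.reindex(set_part_nums)["year_from"]

ages.plot(kind="hist", bins=range(0, 60, 5))
plt.xlabel("age of part when set released (years)")
```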

Histogram of ages of parts in set 60012-1, with 12 parts of <1 year age, gradually decreasing to 7 parts 50-55 years of age

In line with the exponential increase of new parts we’ve already seen, the most common age bracket is 0-5 years, but a number of parts in this set from 2013 were 50-55 years old when it was released! Let’s see some examples of new and old parts…

Image of a new flipper (0-1 yrs age) and sloping brick (55 years age)

The new flipper is pretty cool, but is it even worthy to be moulded from the same ABS plastic as Slope 45° 2 x 2? The sloping brick surely deserves a place in the LEGO Reuse Hall of Fame. It has depth and range – from computer screen in moon base, to nose cone of open wheel racer, to staid roof tile, to minifigure recliner in remote lair. Contemplating this part took me back to my childhood, all those marvellous myriad uses.

And yet I also recalled the slightly unsatisfactory stepped profile that resulted from trying to build a smooth inclined surface with a stack of these parts. As such, this part captures the essence and the paradox of reuse in LEGO products; a single part can do a lot, but it can’t do everything. Let’s look at lifespan of parts more broadly.

Lifespans Across All Parts

The distribution of lifespan across LEGO parts is very uneven.

Distribution of lifespans of LEGO parts - histogram

The vast majority of parts are in use less than 1 year, and only a small fraction are used for more than 10 years. Note I calculate lifespan = year_to + 1 - year_from, and this is using strict parts data, rather than also including minifigures.
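As a sketch, with the same assumed part_years data (year_from and year_to per part, fetched via the API):

```python
# Lifespan per part, per the definition above
lifespan = part_years["year_to"] + 1 - part_years["year_from"]

lifespan.plot(kind="hist", bins=range(1, 71), log=True)
print(f"lifespan of 1 year: {(lifespan == 1).mean():.0%} of parts")
print(f"lifespan over 10 years: {(lifespan > 10).mean():.0%} of parts")
```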

Distribution of lifespans of LEGO parts, pie chart with ranges

Exponential Decay

This distribution looks like exponential decay. To understand it more clearly, it’s back to the logarithmic domain, where we can fit approximations in two regimes: the first five years (>80% of parts), and then the remaining life.

LEGO part lifespans, counts plotted on vertical log scale

The half-life of parts over the first 5 years is about 1 year; in each of those years, only about half the parts survive to the next. However, the first year remains an outlier, as only 34% of parts make it to their second year and beyond. After 5 years, just under 1,000 parts survive compared to the 25,000 at 1 year, and from that point the half-life extends to about 7 years. So, while some parts like Slope 45° 2 x 2 live long lives, the vast majority are winnowed out much earlier.
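The half-life estimates can be sketched by counting how many parts survive to each lifespan and fitting straight lines in log space over the two regimes (reusing the lifespan series above):

```python
import numpy as np

# Count of parts surviving to each lifespan (in years)
years = np.arange(1, 51)
surviving = np.array([(lifespan >= y).sum() for y in years])

def half_life(lo, hi):
    # Half-life implied by the slope of log2(surviving) between year indices lo and hi
    y, s = years[lo:hi], surviving[lo:hi]
    mask = s > 0
    slope, _ = np.polyfit(y[mask], np.log2(s[mask]), 1)
    return -1 / slope

print(f"half-life, years 1-5: {half_life(0, 5):.1f} years")
print(f"half-life, years 5+:  {half_life(4, 50):.1f} years")
```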

Churn

We can also look at the count of parts released (year_from) and the count of parts retired (year_to + 1) each year.

LEGO parts released and retired each year - line chart

As expected, the count of parts released grows exponentially, but the count of parts retired grows too, almost in synchrony, so the net change is small compared to the totals flowing in and out. By summing the difference each year, we can chart the number of active parts over time.
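A sketch of the released/retired counts and the running total of active parts, again from the assumed part_years data:

```python
import pandas as pd

# Parts released and retired per year
released = part_years["year_from"].value_counts().sort_index()
retired = (part_years["year_to"] + 1).value_counts().sort_index()
# (currently active parts would need excluding from the retired counts)

flow = pd.DataFrame({"released": released, "retired": retired}).fillna(0)
# Running total of active parts = cumulative releases minus cumulative retirements
flow["active"] = (flow["released"] - flow["retired"]).cumsum()

flow[["released", "retired"]].plot(logy=True)
flow["active"].plot()
```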

Active LEGO parts by year and change by year - line and column plot

Active parts are a small proportion of all parts released to date – about one seventh, or approximately 5,500 of 36,500. Comparing the total turnover to the active part count each year also shows a high and increasing rate of churn.
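One way to sketch a churn measure is total turnover (parts released plus parts retired in a year) relative to the active part count that year; the exact definition behind the chart below is an assumption on my part.

```python
# Churn: yearly turnover relative to the active part count (reuses flow from above)
flow["churn"] = (flow["released"] + flow["retired"]) / flow["active"]
flow.loc[1965:, "churn"].plot()
```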

LEGO part churn by year - line chart

So even as venerable stalwarts such as Slope 45° 2 x 2 persist, in recent years about 80% of active LEGO parts have churned each year! Interestingly, the early 1980s to late 1990s was a period of much lower churn. Note also that churn percentages prior to 1970 were high and varied widely (not shown before 1965 for clarity), probably reflecting a much smaller base of parts and perhaps artefacts in the older data.

Lifespans vs Year Released and Retired

We’ve got a lot of insight from just year_from and year_to. One last way to look at lifespans is how they have changed over time, such as lifespan vs year released or retired.

LEGO part lifespan scatter plots

Obvious in these charts is that we only have partial data on the lifespan of active parts (we don’t know how much longer they will last before being retired), but as noted above, they are a small proportion of all parts. We can discern a little more by using a box plot.

LEGO part lifespan box plots

The plot shows, for each year, the median lifespan (orange), the middle range (box), the typical range (whiskers) and lifespan outliers (smoky grey). We see here again that, in relative terms, the 1980s and 1990s were a good period for releasing long-lived parts that have only just been retired. However, with the huge volume of short-lived parts being retired in recent years, those long-lived parts barely register on the retired plot in the late 2010s, except as outliers. In general, the retired (left) plot, like the later years of the released (right) plot, shows lower lifespan distributions because the long-lived parts are overwhelmed by ever-increasing numbers of contemporaneous short-lived parts.

Lessons for Software Reuse

If LEGO products are to be a metaphor and baseline for reuse in software products, this analysis of part lifespans is consistent with the observations from part 1, while further highlighting:

  • Components that are heavily reused may be a minority of all components, and in an environment of frequent and increasing product releases, many components may have very short lifetimes, driven by acute needs.
  • There may be a “great filter” for reuse, such as the one year or five year lifespan for LEGO parts. This may also be interpreted as “use before reuse”, or that components must demonstrate exceptional performance in core functions, or support stable market demands, before wider reuse is viable.
  • Our impressions and expectations for reuse of software components may be anchored to particular time periods. We see that the 1980s and 1990s (when there were only ~10% of the LEGO parts released to 2020) were a period of much lower churn and the release of relatively more parts with longer lifespans. The same may be true for periods of software development in an organisation’s history.
  • Retirement of old components can be synchronised with introduction of new components, and in fact, this is probably essential to effectively manage complexity and to derive benefits of reuse without the burden of legacy.

Further Analysis

We’ll come back to the reduction in sharing and reuse theme, and find a lot more interesting ways to look at the Rebrickable data in future posts.

LEGO® is a trademark of the LEGO Group of companies which does not sponsor, authorize or endorse this site.