Humour me – DRY vs WRY

Don’t Repeat Yourself (DRY) is a tenet of software engineering, but – humour me – let’s consider some reasons Why to Repeat Yourself (WRY).

LEGO reuse lessons

In 2021, I wrote a series of posts analysing LEGO® data about parts appearing in sets to understand what it might tell us about reuse of software components in digital products. I’ve finally summarised the key findings that show both DRY and WRY forces at play. We’re strictly talking about reuse vs specialisation (not repetition), but I think the lessons on the reuse dynamic are relevant.

Exponential growth in volume

The total number of parts and sets ever created has grown exponentially over the years. The result is that in 2021, there were 10 times as many parts as 30 years ago, and about 5 times as many sets. Thus, even though parts can be re-combined to create new models, new parts are constantly introduced at an increasing rate.

Bar chart of new lego parts each year, with a line showing total parts. The vertical scale is logarithmic, and both new and total parts follow a straight line on the chart, indicating exponential growth

Read more in LEGO as a Metaphor for Software Reuse – Does the Data Stack Up?

Exponential decay in lifespan

While the oldest parts are 70 years old, only about 1/7 of all parts ever created are in active use in 2021 and fully 1/3 of parts have a lifespan of only one year. Over time, survival decays exponentially. In each of the first 5 years, 50% of parts don’t survive to the next year. Beyond that, remaining parts halve in number every seven years.

Chart of lego part lifespans. The vertical axis is logarithmic. The scattered points can be approximated with two linear segments, one for the first five years, showing a half-life of 0.9 years, and another for the remaining ~70 years, showing a half-life of 7.2 years

Read more in LEGO and Software – Lifespans.

Power-law distribution of reuse

Some parts are heavily reused in sets offered for sale, but the vast majority of parts are never reused or only reused a little, which can be approximated with a power law. Reuse is far more uneven than a typical 80/20 distribution: 80% of reuse instances are due to only 3% of parts, and 20% of parts account for 98% of reuse instances. At the other end of the spectrum, 60% of parts are used in only one set, and only 10% of parts appear in more than 10 sets.

Log-log scatter plot of count of part inclusion in sets. Two linear segments fitted to plot show power law approximation for reuse

Read more in LEGO and Software – Part Reuse.

Churn driven by growth and specialisation

Given the growth and specialisation profiles, total churn of parts approached 100% in 2020, whereas in the decade centred on 1990, it was only about 20%. High churn is consistent with a small base of heavily reused parts, and ever-increasing numbers of specialised parts with short lifespans.

Read more in LEGO and Software – Variety and Specialisation and LEGO and Software – Lifespans.

Part roles emerge from the reuse graph

We can understand more about the roles played by specialised and reused parts through analysis of the graph of connections between parts and sets, and identify new opportunities for recombination.

Network visualisation showing association rules between common parts

Read more in LEGO and Software – Part Roles.

Lessons for software

What would I take away for software?

Reusability of components doesn’t necessarily lead to reuse. The majority of reuse will come from a few components that perform fundamental roles. Focus on getting this right.

More – and more specialised – products may drive specialisation of components. Digital product lines are never static and we may expect some components to have short lifespans and churn heavily. Good development practices and loosely-coupled architectures allow teams to work with ephemeral and idiosyncratic components. However, ongoing review can still identify opportunities to harvest patterns and consolidate specialised components.

Note that, even when we produce multiple similar code artefacts, we may see effective reuse of higher-level approaches and concepts.

These aren’t prescriptive rules, but a reflection of the patterns in the data. There are more comprehensive observations in the individual articles. We should remember that reuse is not the primary aim of producing software, but a principle that supports better organisation towards sustainably responsive delivery.

Discussion of data relevance

Why is LEGO data relevant? In many conversations I’ve had about software reuse, LEGO is presented as a desirable model. This may be peculiar to me, but I think it is a fairly common conversation.

The number of possible mechanical couplings of just a handful of bricks is indeed enormous, but I wanted to understand how these components had actually been assembled into products that were sold to customers over some period of time. The data is sourced from the Rebrickable API. I’ve just taken part data at face value in this analysis; if something is recorded as a distinct part, I treat it as a distinct part. There may be better ways to translate the LEGO metaphor to software components.

Maybe there’s a generational factor in LEGO as a metaphor too; in the 1980s and 1990s, you would play with a much smaller and more stable base of active parts than the 2000s and 2010s, and that could shape your thinking. I’d love to hear feedback.

LEGO® is a trademark of the LEGO Group of companies which does not sponsor, authorize or endorse this site.

Smarter Semantle Solvers

A little smarter, anyway. I didn’t expect to pick this up again, but when I occasionally run the first generation solvers online, I’m often equal parts amused and frustrated by rare words thrown up that delay the solution – from amethystine to zigging.

Animation of online semantle solution
An example solution with fewer than typical rare words guessed

The solvers used the first idea that worked; can we make some tweaks to make them smarter? The code is now migrated to its own new repo after outgrowing its old home.

Measuring smarts

I measure solver performance by running multiple trials of a solver configuration against the simulator for a variety of target words. This gives a picture of how often the solver typically succeeds within a certain number of guesses.

Chart showing cumulative distribution function curves for two solver configurations
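
Something like the minimal sketch below captures that measurement loop. It assumes a hypothetical simulate_game(solver, target) helper that plays a solver against the simulator and returns the number of guesses taken; the actual harness lives in the repo and may differ.

def measure_solver(make_solver, targets, trials_per_target=10, cap=500):
  """Collect guess counts over repeated trials of one solver configuration."""
  guess_counts = []
  for target in targets:
    for _ in range(trials_per_target):
      solver = make_solver()                   # fresh solver state per trial
      guesses = simulate_game(solver, target)  # hypothetical helper, see above
      guess_counts.append(min(guesses, cap))   # cap runaway searches
  return sorted(guess_counts)                  # sorted counts trace out the empirical CDF

Plotting the fraction of trials solved within n guesses for each configuration gives curves like those above.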

Vocabulary

It turns out that the vocabulary to date based on english_words_set is a poor match for the most frequently used English words, according to unigram frequency data.

So we might expect that simply replacing the solver vocabulary would improve performance, and we also get word ranking from unigram_freq.
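
A minimal sketch of what that vocabulary swap might look like, assuming unigram_freq is a CSV of word,count rows (the actual loading code in the repo may differ):

import csv

def load_unigram_vocabulary(path='unigram_freq.csv', top_n=50000):
  """Return the top_n most frequent words, most frequent first."""
  with open(path) as f:
    rows = list(csv.DictReader(f))             # expects 'word' and 'count' columns
  rows.sort(key=lambda r: int(r['count']), reverse=True)
  return [r['word'] for r in rows[:top_n]]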

Semantic models

We’ll continue with Universal Sentence Encoder (USE) to ensure search strategies are robust to different semantic models.

Search

To improve the gradient solver I tried making another random guess every so often to avoid long stretches exploring local minima. But it didn’t make things better, and probably made them worse!

In response, I made each guess the most common word near the extrapolated semantic location, rather than just the nearest word. Still no better, and trying both “improvements” together was significantly worse!

Ah well, experiments only fail if we fail to learn from them!

Vocabulary again

I think the noise inherent in a different semantic model, plus the existing random extrapolation distance, overwhelms the changes I tried. In better news, we see a major improvement from using unigram freq vocabulary, reducing the mean from 280 (with many searches capped at 500) to 198, approximately a 30% improvement.

Smarter still?

Here we see that the data-centric (vocabulary) improvement had a far bigger impact than any model-centric (search algorithm) improvement that I had the patience to try (though I left a bunch of further todos). Maybe just guessing randomly from the top n words will be better again!

At least I’ve made a substantial dent in reducing those all-too-common guesses at rare words.

End-to-end simulation hello world!

I’ve talked to many people about how to maximise the utility of a simulator for business decision-making, rather than focussing on the fidelity of reproducing real phenomena. This generally means delivering a custom simulator project lean, in thin, vertical, end-to-end slices. This approach maximises putting learning into action and minimises risk carried forward.

For practitioners, and to sharpen my own thinking, I have provided a concrete example in code: a Hello, World! for custom simulator development. The simulated phenomena will be simple, but we’ll drive all the way through a very thin slice to supporting a business decision.

Introduction to simulation

To understand possible futures, we can simulate many things: physical processes, digital systems, management processes, crowd behaviours, combinations of these things, and more.

When people intend to make changes in systems under simulation, they wish to understand the performance of the system in terms of the design of actions they may execute or features that may exist in the real world. The simulator’s effectiveness is determined by how much it accelerates positive change in the real world.

Even as we come to understand systems through simulation, it is also frequently the case that people continue to interact with these systems in novel and unpredictable ways – sometimes influenced by the system design and typically impacting the measured performance – so it remains desirable to perform simulated and real world tests in some combination, sequentially or in parallel.

For both of these reasons – because simulation is used to shorten the lead time to interventions, and because those interventions may evolve based on both simulated and exogenous factors – it is desirable to build simulation capability in an iterative manner, in thin, vertical slices, in order to respond rapidly to feedback.

This post describes a thin vertical slice for measuring the performance of a design in a simple case, and hence supporting a business decision that can be represented by the choice of design, and the outcomes of which can be represented by performance measures.

This post is a walk-through of this simulation_hello_world notebook and, as such, is aimed at a technical audience, but one that cares about aligning responsive technology delivery and simulation-augmented decision making for a wider stakeholder group.

If you’re just here to have fun observing how simulated systems behave, this post is not for you, because we choose a very simple, very boring system as our example. There are many frameworks for the algorithmic simulation of real phenomena in various domains, but our example here is so simple, we don’t use a framework.

The business decision

How much should we deposit fortnightly into a bank account to best maintain a target balance?

Custom simulator structure

To preserve agility over multiple thin slices through separation of concerns, we’ll consider the following structure for a custom simulator:

  1. Study of the real world
  2. Core algorithms
  3. Translation of performance and design
  4. Experimentation
  5. Translation of an environment

We’ll look at each stage and then put them all together and consider next steps. As a software engineering exercise, the ongoing iterative development of a simulator requires consideration of Continuous Delivery, and in particular shares techniques with Continuous Delivery for Machine Learning (CD4ML). I’ll expand on this in a future post.

Study of the real world

We study how a bank account works and aim to keep the core algorithms really simple – as befits a Hello, World! example. In our study, we recognise the essential features of: a balance, deposits and withdrawals. The balance is increased by deposits and reduced by withdrawals, as shown in the following Python code.

class Account:

  def __init__(self):
    self.balance = 0

  def deposit(self, amount):
    self.balance = self.balance + amount

  def withdraw(self, amount):
    self.balance = self.balance - amount

Core algorithms

Core algorithms reproduce real world phenomena, in some domain, be it physical processes, behaviour of agents, integrated systems, etc. We can simulate how our simple model of a bank account will behave when we perform some set of transactions on it. In the realm of simulation, these are possible rather than actual transactions.

def simulate_transaction(account, kind, amount):
  if kind == 'd':
    account.deposit(amount)
  elif kind == 'w':
    account.withdraw(amount)
  else:
    raise ValueError("kind must be 'd' or 'w'")

def simulate_balance(transactions):
  account = Account()
  balances = [account.balance]
  for t in transactions:
    simulate_transaction(account, t[0], t[1])
    balances.append(account.balance)
  return balances

When we simulate the bank account, note that we are interested in capturing a fine-grained view of its behaviour and how it evolves (the sequence of balances), rather than just the final state. The final state can be extracted in the translation layer if required.

tx = [('d', 10), ('d', 20), ('w', 5)]
simulate_balance(tx)
[0, 10, 30, 25]

Visualisation is critical in simulator development – it helps to communicate function, understand results, validate and tune implementation and diagnose errors, at every stage of development and operation of a simulator.

This could be considered a type of discrete event simulation, but as above, we’re more concerned with a thin slice through the whole structure than with the nature of the core algorithms.

Translation of performance and design

The core algorithms are intended to provide a fine-grained, objective reproduction of real observable phenomena. The translation layer allows people to interact with a simulated system at a human scale. This allows people to specify high-level performance measures used to evaluate the system designs, which are also specified at a high level, while machines simulate all the rich detail of a virtual world using the core algorithms.

Performance

We decide for our Hello, World! example that this is a transactional account, and as such we care about keeping the balance close to a target balance, so that sufficient funds will be available, but we don’t leave too much money in a low interest account. We therefore measure performance as the average (mean) absolute difference between the balance at each transaction, and the target balance. Every transaction that leaves us over or under the target amount is penalised by the difference and this penalty is averaged over transactions. A smaller measure means better performance.

This may be imperfect, but there are often trade-offs in how we measure performance. Discussing these trade-offs with stakeholders can be a fruitful source of further insight.

def translate_performance_TargetBalance(balances, target):
  return sum([abs(b - target) for b in balances]) / len(balances)

Design

While we can make any arbitrary number of transactions and decide about each individual transaction, we consider for simplicity that we set up a fortnightly deposit schedule, and make one decision about the amount of that fortnightly deposit.

As the performance translation distills complexity into a single figure, so the design translation does the inverse, with complexity blooming from a single parameter. We translate the single design parameter into a list of every individual transaction, suitable for simulation by core algorithms.

def translate_design_FortnightlyDeposit(design_parameter):
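  # ANNUAL_FORTNIGHTS (the number of fortnights simulated, e.g. 26 for one year) is assumed to be defined earlier in the notebook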
  return [('d', design_parameter)] * ANNUAL_FORTNIGHTS

translate_design_FortnightlyDeposit(10)[:5]
[('d', 10), ('d', 10), ('d', 10), ('d', 10), ('d', 10)]

Performance = f(Design)

Now we can connect human-relevant design to human-relevant performance measure via the sequence:

design -> design translation -> core algorithm simulation -> performance translation -> performance measure

If we set our target balance at 100, we see how performance becomes a function of design by chaining translations and simulation.

def performance_of_design(design_translator, design_parameters):
  return translate_performance_Target100(
      simulate_balance(
          design_translator(design_parameters)))

This example above (and from the notebook) specialises the target balance performance translator to Target100 and makes the design translator configurable.
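
For reference, that specialisation can be as simple as the following sketch (the notebook may define it slightly differently):

def translate_performance_Target100(balances):
  return translate_performance_TargetBalance(balances, 100)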

Remembering that we’re interested in the mean absolute delta between the actual balance and the target, here’s what it might look like to evaluate and visualise a design:

evaluating account balance target 100
with FortnightlyDeposit [9]
the mean abs delta is 61.89

Experimentation

Experimentation involves exploring and optimising simulated results through the simple interface of performance = f(design). This means choosing a fortnightly deposit amount and seeing how close we get to our target balance over the period. Note that while performance and design are both single values (scalars) in our Hello, World! example, in general they would both consist of multiple, even hundreds of, parameters.

This performance function shows visually the optimum (minimum) design is the point at the bottom of the curve. We can find the optimal design automatically using (in this instance) scipy.optimize.minimize on the function performance = f(design).
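
A minimal sketch of that optimisation call, reusing the functions defined above (the notebook’s version may differ in detail):

from scipy.optimize import minimize

# Search the single design dimension for the fortnightly deposit that minimises the performance measure
result = minimize(
  lambda design: performance_of_design(translate_design_FortnightlyDeposit, design[0]),
  x0=[9],                # start from the arbitrary design evaluated above
  method='Nelder-Mead')  # derivative-free method suits this piecewise-linear surface
print(result.x, result.fun)  # optimal deposit amount and its mean absolute delta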

We can explore designs in design space, as above, and we can also visualise how an optimal design impacts the simulated behaviour of the system, as below.

Note that in the optimal design, the fortnightly deposit amount is ~5 for a mean abs delta of ~42 (the optimal performance) rather than a deposit of 9 (an arbitrary design) for a mean abs delta of ~62 (the associated performance) in the example above.

Translation of an environment

The example so far assumes we have complete control of the inputs to the system. This is rarely the case in the real world. This section introduces environmental factors that we incorporate into the simulation, but consider out of our control.

Just like design parameters, environmental factors identified by humans need to be translated for simulation by core algorithms. In this case, we have a random fortnightly expense that is represented as a withdrawal from the account.

import numpy as np

def translate_environment_FortnightlyRandomWithdrawal(seed=42, high=5):
  rng = np.random.RandomState(seed)
  random_withdrawals = rng.randint(0, high=high, size=ANNUAL_FORTNIGHTS)
  return list(zip(['w'] * ANNUAL_FORTNIGHTS, random_withdrawals))

translate_environment_FortnightlyRandomWithdrawal()[:5]
[('w', 3), ('w', 4), ('w', 2), ('w', 4), ('w', 4)]

If we were to simulate this environmental factor without any deposits being made, it would look like below.

Now we translate design decisions and environmental factors together into the inputs for the simulation core algorithms. In this case we interleave or “zip” the events together.

def translate_FortnightlyDepositAndRandomWithdrawal(design_parameters):
  interleaved = zip(translate_design_FortnightlyDeposit(design_parameters),
                    translate_environment_FortnightlyRandomWithdrawal())
  return [val for pair in interleaved for val in pair]

translate_FortnightlyDepositAndRandomWithdrawal(design_1)[:6]
[('d', 9), ('w', 3), ('d', 9), ('w', 4), ('d', 9), ('w', 2)]

Putting it all together

We’ll now introduce an alternative design and optimise that design by considering performance when the simulation includes environmental factors.

Alternative design

After visualising the result of our first design against our performance measurement, an alternative design suggests itself. Instead of having only a fixed fortnightly deposit, we could make an initial large deposit, followed by smaller fortnightly deposits. Our design now has two parameters.

def translate_design_InitialAndFortnightlyDeposit(design_pars):
  return [('d', design_pars[0])] + [('d', design_pars[1])] * ANNUAL_FORTNIGHTS

design_2 = [90, 1]

In the absence of environmental factors, our system would evolve as below.

However, we can also incorporate the environmental factors by interleaving the environmental transactions as we did above.

Experimentation

Our performance function now incorporates two design dimensions. We can run experiments on any combination of the two design parameters to see how they perform. With consideration of environmental factors now, we can visualise performance and optimise the design in two dimensions.

Instead of the low point on a curve, the optimal design is now the low point in a bowl shape, the contours of which are shown on the plot above.

This represents our optimal business decision based on what we’ve captured about the system so far, an initial deposit of about $105 and a fortnightly deposit of $2.40. If we simulate the evolution of the system under the optimal design, we see a result like below, and can see visually why it matches our intent to minimise the deviation from the target balance at each time.
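
A minimal sketch of how this two-parameter optimisation might be wired up. The combined translator name here is hypothetical; it interleaves the two-parameter design with the random withdrawals in the same way as shown earlier, and the notebook’s implementation may differ.

from scipy.optimize import minimize

def translate_InitialAndFortnightlyDepositAndRandomWithdrawal(design_pars):
  # Hypothetical combined translator: interleave deposits from the two-parameter design
  # with the environmental withdrawals (zip truncates to the shorter sequence)
  interleaved = zip(translate_design_InitialAndFortnightlyDeposit(design_pars),
                    translate_environment_FortnightlyRandomWithdrawal())
  return [val for pair in interleaved for val in pair]

result = minimize(
  lambda d: translate_performance_Target100(
      simulate_balance(translate_InitialAndFortnightlyDepositAndRandomWithdrawal(d))),
  x0=design_2,            # start from the alternative design [90, 1]
  method='Nelder-Mead')
print(result.x, result.fun)  # optimal [initial, fortnightly] deposits and their performance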

The next thin slice

We could increase fidelity by modelling transactions to the second, but it may not substantially change our business decision. We could go on adding design parameters, environmental factors, and alternative performance measures. Some would only require changes at the translation layer, some would require well-understood changes to the core algorithms, and others would require us to return to our study of the world to understand how to proceed.

We only add these things when required to support the next business decision. We deliver another thin slice through whatever layers are required to support this decision by capturing relevant design parameters and performance measures at the required level of fidelity. We make it robust and preserve future flexibility with continuous delivery. And in this way we can most responsively support successive business decisions with a thinly sliced custom simulator.

Weighting objectives and constraints

We used a single, simple measure of performance in this example, but in general there will be many measures of performance, which need to be weighted against each other, and often come into conflict. We may use more sophisticated solvers as the complexity of design parameters and performance measures increases.

Instead of looking for a single right answer, we can also raise these questions with stakeholders to understand where the simulation is most useful in guiding decision-making, and where we need to rely on other or hybrid methods, or seek to improve the simulation with another development iteration.

Experimentation in noisy environments

The environment we created is noisy; it has non-deterministic values in general, though we use a fixed random seed above. To properly evaluate a design in a noisy environment, we would need to run multiple trials.
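
A minimal sketch of such a trial loop, reusing the translators above with varying seeds (the helper names are illustrative; a fuller evaluation might also report the spread, not just the mean):

def performance_under_environment(design_parameters, seed):
  # Evaluate one design against one sampled environment
  interleaved = zip(translate_design_FortnightlyDeposit(design_parameters),
                    translate_environment_FortnightlyRandomWithdrawal(seed=seed))
  transactions = [val for pair in interleaved for val in pair]
  return translate_performance_Target100(simulate_balance(transactions))

def mean_performance(design_parameters, n_trials=30):
  # Average the performance measure over independently seeded environments
  return sum(performance_under_environment(design_parameters, seed)
             for seed in range(n_trials)) / n_trials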

Final thoughts

Simulation scenarios can become arbitrarily complex. We first validate the behaviour in the simplest point scenarios that ideally have analytic solutions, as above. We can then calibrate against more complex real world scenarios, but when we are sufficiently confident the core algorithms are correct, we accept that the simulation produces the most definitive result for the most complex or novel scenarios.

Most of the work is then in translating human judgements about design choices and relevant performance measures into a form suitable for exploration and optimisation. This is where rich conversations with stakeholders are key to draw out their understanding and expectations of the scenarios you are simulating, and how the simulation is helping drive their decision-making. Focussing on this engagement ensures you’ll continue to build the right thing, and not just build it right.

22 rules of generative AI

Thinking about adopting, incorporating or building generative AI products? Here are some things to think about, depending on your role or roles.

I assume you’re bringing your own application based on an understanding of an opportunity or problem that involves creating, combining or transforming some kind of digital content. If you don’t have that understanding of a customer problem and why other solutions are not suitable, go and get it! Digital content may mean text, code, images, sound, video, 3D, etc, for digital consumption, or it may mean digitized designs for real world products or services such as code (again), recipes, blueprints, etc. Some of this may also be relevant for how you use other people’s generative AI tools in your own work.

Product strategy and management roles

1. Know what input you have to an AI product or feature that’s difficult to replicate. This is generally proprietary data, but it may be an algorithm tuned in-house, access to compute resources, or a particularly responsive deployment process, etc. This separates competitive differentiators from competitive parity features.

2. Interrogate the role of data. Do you need historical data to start, can you generate what you need through experimentation, or can you leverage your proprietary data with open source data, modelling techniques or SaaS products? Work with your technical leads to understand the multitude of mathematical and ML techniques available to ensure data adds the most value for the least effort.

3. Understand where to use open source or Commercial Off-The-Shelf (COTS) software for parity features, but also understand the risks of COTS including roadmaps, implementation, operations and data.

4. Recognise that functional performance of AI features is uncertain at the outset and variable in operation, which creates delivery risk. Address this by: creating a safe experimentation environment, supporting dual discovery (creating knowledge) and development (creating software) tracks with a continuous delivery approach, and – perhaps the hardest part – actually responding to change.

Design roles

5. Design for failure, and loss of vigilance in the face of rare failures. Failure can mean outputs that are nonsensical, fabricated, incorrect, or – depending on scope and training data – harmful.

6. Learn the affordances of AI technologies so you understand how to incorporate them into user experiences, and can effectively communicate their function to your users.

7. Study various emerging UX patterns. My quick take: generative AI may be used as a discrete tool with (considering #5) predictable results for the user, such as replacing the background in a photo; it may be used as a collaborator, reliant on a dialogue or back-and-forth iterative design or search process between the user and AI, such as ChatGPT; or it may be used as an author, producing a nearly finished work that the user then edits to their satisfaction (which comes with the risk of subtle undetected errors).

8. Consider what role the AI is playing in the collaborator pattern – is it designer, builder, tester, or will the user decide? There is value in generating novel options to explore as a designer, in expediting complex workflows as a builder, and in verifying or validating solutions to some level of fidelity as a tester. However, for testing, remember you can not inspect quality into a product, and consider building in quality from the start.

9. Design for explainability, to help users understand how their actions influence the output. (This overlaps heavily with #6)

10. More and more stakeholders will want to know what goes into their AI products. If you haven’t already, start on your labelling scheme for AI features, which may include: intended use, data ingredients and production process, warnings, reporting process, and so on, with reference to risk and governance below.

Data science and data engineering roles

11. Work in short cycles in multidisciplinary product teams to address end-to-end delivery risks.

12. Quantify the functional performance of systems, the satisfaction of guardrails, and confidence in these measures to support product decisions.

13. Make it technically easy and safe to work with and combine rich data.

14. Implement and automate a data governance model that enables delivery of data products and AI features to support the business strategy (i.e., a governance model that captures the concerns of other rules and stakeholders here).

Architecture and software engineering roles

15. Understand that each AI solution is narrow, but composable with other digital services. In this respect, treat each AI solution as a distinct service until a compelling case is made for consolidation. (Note that, as above, product management should be aware of how to make use of existing solutions.)

16. Consolidate AI platform services at the right level of abstraction. The implementation of AI services may be somewhat consistent, or it may be completely idiosyncratic depending on the solution requirements and available techniques. The right level of abstraction may be emergent and big up-front design may be risky.

17. Use continuous delivery for short feedback cycles and delivery that is both iterative – to reduce risk from knowledge gaps – and responsive – to reduce the risk of a changing world.

18. Continuous delivery necessitates a robust testing and monitoring strategy. Make use of test pyramids for both code and data for economical and timely quality assurance.

Risk and governance roles

19. Privacy and data security are the foundation on which everything else is built.

20. Generative AI solutions, like other AI solutions, may also perpetuate harmful content, biases or correlations in their historical training data.

21. Understand that current generative AI solutions may be subject to some or all of the following legal and ethical issues, depending on their source data, training and deployment as a service: privacy, copyright or other violation regarding collection of training data, outputs that plagiarise or create “digital forgeries” of training data, whether the aggregation and intermediation of individual creators at scale is monopoly behaviour and whether original creators should be compensated, that training data may include harmful content (which may be replicated into harmful outputs), that people may have been exposed to harmful content in a moderation process, and that storing data and the compute for training and inference may have substantial environmental costs.

22. Develop strategies to address the further structural failure modes of AI solutions, such as: misalignment with user goals, deployment into ethically unsound applications, the issue of illusory progress where small gains may look promising but never cross the required threshold, the magnification of rare failures at scale and the resolution of any liability for those failures.

Conclusion

These are the type of role-based considerations I alluded to in Reasoning About Machine Creativity. The list is far from complete, and the reader would doubtless benefit from sources and references! I intended to write this post in one shot, which I did in 90 minutes while hitting the target 22 rules without significant editing, so I will return after some reflection. Let me know if these considerations are helpful in your roles.

Reasoning About Machine Creativity

With the current interest in generative AI, I wanted to write a short post updating the framing I took in my older talk Reasoning About Machine Intuition (2017), which was intended for broad audiences to understand the impact and best application of AI solutions from multiple digital delivery perspectives.

Bicycles and automobiles share some features and are used for many of the same tasks, but have important differences that must be considered by transport planners. Recently, electric bikes have created another distinct mobility category that nonetheless shares some elements with existing categories. So it is with AI solutions. While AI may share some features of human intelligence and be suitable for some of the same tasks, understanding the differences is crucial for digital professionals to be able to reason about their capabilities and applicability.

Skip right ahead to machine creativity if you want! Also listen to my podcast on Creative AI for another perspective.

Machine intuition

Considering that products and features introduced in the ML boom of the late 2010s allowed sufficiently good decisions to be made on complex data without precisely specified rules (e.g., image classification), I chose to characterise these solutions as “machine intuition”, in order to highlight that their narrow artificial intelligences were most comparable to human intuition. However, important differences remain. And of course I used “reasoning” in the title to highlight the capability of human intelligence that wasn’t present in these solutions.

Diagram illustrating a large number of emojis feeding into a decision node

Similarities to human intuition

Opportunities, tasks or problems amenable to both approaches share these characteristics:

  • Good decisions will be made, based on ambiguous inputs, but mistakes will also be made,
  • The approach is useful if solutions make enough good decisions in aggregate for a given context, and the volume and nature of mistakes is tolerable,
  • The decisions may have limited explainability, even if explainability is important,
  • The decisions are based on past experience and therefore subject to bias.

(NB there are many examples of particularly egregious, discriminatory and harmful mistakes that were not detected or considered prior to the release of AI solutions. The understanding of what constitutes a mistake, in addition to whether the decision itself is structurally discriminatory, must consider many ethical dimensions.)

Differences from human intuition

If a machine intuition approach looks suitable based on the characteristics above, we must also consider the differences below:

  • The artificial intelligence remains narrow – it can only perform one specific task and only to the degree permitted by its training data. This is different to a human, who can easily generalise to a related task or accommodate new data. However, the same or similar data may be sliced multiple ways to support multiple related narrow tasks, and individual solutions are composable – maybe embarrassingly so – and composable with other digital services, all of which may substitute as a limited form of generality.
  • Machine intuition requires vastly more training instances (many more even than any human expert might see in a lifetime) and concomitantly more computing power than human intuition. NB. These training instances also must be presented in a specific format and are also typically labelled by humans! In contrast, human intuition may only need a handful of examples, and can fall back on reasoning or inference from related experience if direct intuition fails (generalisation again). However, machines may be trained on a volume of data that no human could consume, and any trained model can be reproduced and deployed almost infinitely, so at some scale, low variable cost may compensate for high fixed cost.
  • Machine intuition is possible at superhuman scales, in particular volume of data or requests, and speed of inference. For instance, translating all of Wikipedia in fractions of a second. Machine intuition may also exceed functional human performance at the relevant task, though effective measurement of this must carefully consider the task definition and potential for bias.
  • Machine intuition will fail in some proportion of predictions as a matter of course (though we assume this is manageable) and is also subject to weird/trivial (adversarial) failure modes, such as changing a single pixel, that humans are generally robust to. Mistakes at scale from a single centralised ML solution may also be less acceptable than the aggregate mistakes made by many independent humans.

Anyone involved in delivery of AI solutions should keep these basic factors in mind in order to reason about product and engineering concerns. There is more to consider, but this is a good starting point.

Machine creativity

Considering the current generative AI boom, I think of these solutions as “machine creativity” in order to highlight that their narrow artificial intelligences are most comparable to human creativity in a given medium. However, important differences remain.

Diagram illustrating a single creative spark generating a large number of emoji

Creativity for our purposes is taking some simple input and creating a complex output from the input, an output that also incorporates other ideas, knowledge and techniques beyond the input. That output may be almost any form of digital content, from natural language text, to code, to images, to music, to movies, to 3D scenes, to animated 3D movies. AI that is embodied or with access to manufacturing may also exhibit creativity in the real world, through the materialisation of digital designs.

Some applications of generative AI look more like search, databases, or even back-ends, but they match our definition of creativity above in that they produce complex outputs from simple inputs, and by similar mechanisms.

(NB legal and ethical issues remain to be resolved with respect to some current mechanisms available to machine creativity to incorporate external ideas, knowledge and techniques. These include: copyright and potential for plagiarism, safety of input and output content and safety of human moderators, attribution and compensation for original creators, and so on.)

Similarities to human creativity

Opportunities, tasks or problems amenable to both approaches share these characteristics:

  • There is not a single “right” answer, multiple answers will suffice and may even be desirable to generate valuable options to pursue,
  • Assessing the goodness of the outputs may include some degree of subjectivity,
  • There may be surprising or non-obvious elements in the output, and again this may be desirable, or risky, or both,
  • The process is likely iterative, with multiple rounds of review and editing.

Differences from human creativity

If a machine creativity approach looks suitable based on an application being sympathetic to the characteristics above, we must also consider the differences below:

  • The artificial intelligence has no agency or intent in its creativity, it simply processes inputs to generate outputs that are likely or typical based on its training data, described as “next token prediction” (where a token is an element of text, or patch of an image, etc). This may also appear as misalignment or the generation of unsafe content, which can be difficult to detect or control currently.
  • The artificial intelligence has no logically consistent model of the world. The outputs it generates have a high probability of following the prompt, but are not necessarily logically consistent with the prompt or even internally consistent, which can lead to articulate but nonsensical, incorrect or harmful answers. (i.e., It’s also missing the reasoning which is absent from intuition.)
  • The artificial intelligence remains narrow. It performs one generative task but it does not subject the output to a reasoned review or critique, as might be performed by a human to detect error. However, it is again composable, and tests could be applied after the generative step, though these too are fallible. There are many examples of creative AI tool-chains being shared by human creators to support complex creative workflows.
  • Machine creativity also requires more training instances, but is similarly almost infinitely reproducible for creating outputs. Leveraging current tools which include third party training data, it is important to understand the provenance of those training instances – whether they were used with permission, whether they were curated in an ethical manner, and so on.
  • There is by default no explicit attribution of influences on an output, although this is an area of focus and may be improved directly in creative systems or by hybrid means.
  • Machine creativity is also possible at superhuman scales of speed and volume.
  • Machine creativity is also subject to weird/trivial adversarial attacks, such as prompt injection.

Conclusion

As I’ve been guided by the set of machine intuition considerations above for a number of years, this is the initial set of considerations that I will take forward when considering applications for machine creativity, though I will continue to review their relevance in light of future developments.

In future, I’d like to address these considerations more specifically by the various roles in a digital delivery organisation, as per the original talk.

Summertime, and the puzzling is made easier

In between beach trips and bike rides, I whiled away more than a few summer hours on puzzles I found in the various AirBNBs we rented. Returning home, I rediscovered a sliding tile puzzle with a twist called Asteroid Escape, where embedded asteroids prevent certain tiles sliding past each other in certain configurations.

A photo of the Asteroid Escape sliding tile puzzle on a desk

With 60 challenges from Starter to Wizard level, I was musing about generating the catalogue of challenges automatically. I’ve enjoyed automating puzzle solutions since brute-forcing Instant Insanity with BASIC in my teens (the code is long lost). While Starter puzzles may only require 6 moves, Wizard may require as many as 109 moves. I decided I would particularly enjoy avoiding the gentle frustration of manually solving so many variants of Asteroid Escape if I were to use an automated solver.

I built a solver in Python with numpy for working with grids of numbers. If automating puzzles is your thing too, check out the source code and read on for a summary of the approach. To execute the proposed solution, I printed out the moves and crossed them off with a pencil as I made them. There’s a lot that could be more elegant or efficient, but it gets the job done!

Solving the hardest Wizard level Asteroid Escape puzzle in the minimum 109 steps

Solver evolution

Define a simple sliding tile board. Elements of a 2D array hold the indexes of the tile pieces at each location (numbered 1..8 for a 3×3 board). The tiles can be reordered through a sequence of moves of the “hole”, that is the space with no tile (numbered 0). The available moves depend on the hole location – four options from the centre but only two from the corners.

Illustration of sliding tile puzzle with tile pieces in a 3x3 grid labelled with numbers 1 to 8, and the hole or gap labelled with 0
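
A minimal sketch of that representation and the available hole moves (illustrative only; the actual solver code is in the repo and may differ):

import numpy as np

board = np.array([[1, 2, 3],
                  [4, 0, 5],   # 0 marks the hole
                  [6, 7, 8]])

def available_moves(board):
  """Return the positions the hole could move to, keyed by direction."""
  row, col = map(int, np.argwhere(board == 0)[0])
  steps = {'up': (-1, 0), 'down': (1, 0), 'left': (0, -1), 'right': (0, 1)}
  return {name: (row + dr, col + dc) for name, (dr, dc) in steps.items()
          if 0 <= row + dr < 3 and 0 <= col + dc < 3}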

Solve the board. Find sequences of moves of the hole that reconfigure the board into a target arrangement of tiles. This I treated as a queue-based breadth-first graph search, with arrangements of pieces being the nodes of a graph, and moves being the edges. However the specific constraints of Asteroid Escape aren’t captured at this point; all tiles are interchangeable in their movements.

A sequence of sliding tile puzzle grids showing how the hole is moved around to reach a target arrangement
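
The search itself can stay generic, with the specifics supplied as functions. A minimal sketch (not the repo’s exact code; states need to be hashable, e.g. tuples of tuples rather than arrays):

from collections import deque

def solve(start, is_target, neighbours):
  """Breadth-first search: arrangements are nodes, hole moves are edges."""
  queue = deque([(start, [])])
  seen = {start}
  while queue:
    state, moves = queue.popleft()
    if is_target(state):
      return moves                       # shortest sequence of moves to a target arrangement
    for move, next_state in neighbours(state):
      if next_state not in seen:
        seen.add(next_state)
        queue.append((next_state, moves + [move]))
  return None                            # no reachable target arrangement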

Model the blockers that prevent certain tile arrangements. I quickly realised I would miss edge cases if I tried to list permitted arrangements of tiles, and instead defined the “blockers” on each piece with reference to a 4×4 grid. The grid had one cell margin around the outside of the piece to model projecting outer blockers, and 4 positions inside the piece that could carry blockers. A grid cell would be assigned 1 if there was a blocker or 0 otherwise. For a proposed board arrangement, the piece grids could be offset to their location and superposed onto a larger 8×8 board grid by adding piece cell values. A resultant value greater than 1 would mean that two or more blockers were interfering, so the board arrangement was not actually possible.

Grids modelling blocking protrusions of individual pieces, illustrating how interference occurs when blockers from separate tiles (values of 1) occupy the same space (value of 2)
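
A minimal sketch of that superposition check, with 4×4 blocker grids offset onto an 8×8 board grid (constants and names are illustrative):

import numpy as np

def arrangement_possible(arrangement, piece_blockers):
  """True if no blockers interfere when the pieces are laid out as given.

  arrangement: 3x3 array of piece indexes; piece_blockers: piece index -> 4x4 0/1 array
  (2x2 interior plus a one-cell margin for projecting blockers)."""
  board_grid = np.zeros((8, 8), dtype=int)
  for row in range(3):
    for col in range(3):
      piece = arrangement[row, col]
      board_grid[row*2:row*2+4, col*2:col*2+4] += piece_blockers[piece]
  return (board_grid <= 1).all()         # any cell > 1 means two blockers overlap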

Solve the board while respecting blocked arrangements. The breadth-first search identifies the neighbours – reachable in one move – from each arrangement of pieces. Any neighbours that would result in interfering blockers are now excluded from the search. The target state also only requires that the spacecraft piece should be in the exit position. Both of these constraints were captured as functions passed to the solver. This worked well for a Starter puzzle I tested, but failed on a Wizard puzzle, because it failed to capture piece interference that only occurs when moving pieces.

Model the blockers that prevent certain tile moves. In particular, the nose of the spacecraft doesn’t interfere with the protruding corner blocker on one asteroid tile in static arrangements, but there are scenarios where the tiles can’t be moved past one another. To model this, I increased the piece grid to 5×5 (now 9 interior positions) to capture the previously ignored spacecraft nose, and when assessing a move, I shifted the tile by one grid cell at a time so that each move assessed interference at start and end, and two intermediate positions.

A sequence of 4 grids illustrating interference between blockers on separate pieces as one piece slides past the other, though there is no interference at start or end position

Solve the board while respecting blocked arrangements and blocked moves. Update the neighbour search to exclude moves to neighbours that would result in blocker interference during the move. Update the target state to avoid any interference in sliding the spacecraft off the board.

Of course, the solver didn’t go off without a hitch when I tried it in the real world. I failed to specify that a clear exit was required at first, and failed to specify it properly the second time, but third time was a charm (as above).

Asteroid Escape solver bloopers

Conclusion

From a cursory inspection, the Wizard level 60 solution in the notebook and animated above looks the same as the solution provided with the puzzle (so I didn’t really need a solver at all!) However, this solver can also find multiple other (longer) solutions to each challenge. It could also be used to generate and solve more variants at each level of difficulty, extending the catalogue of puzzles. Improvements can wait, however, as it’s time to enjoy the bike or the beach once more.

Nerfing along

NeRFs provide many benefits for 3D content: the rendering looks natural while the implementation is flexible. So I wanted to get hands on, and build myself a NeRF. I wanted to understand what’s possible to reproduce in 3D from just a spontaneous video capture. I chose a handheld holiday video from an old iPhone X while cycling on beautiful Maria Island.

Video taken while cycling on Maria Island

The camera moves along a fairly straight path, pointing a little right of the direction of travel. This contrasts with NeRFs or scans of objects, where the camera may do one or more full orbits of the object to get every perspective and thus produce seamless renders and clean models. I expect 3D generated from the video above will be missing some detail.

My aim was to build a NeRF from the video, render alternative camera paths, explore the generated geometry, and understand the application and limitation of the results. Here’s the view from one alternative camera path, which follows the original path at first, and then swings out to the side.

Alternative camera path rendered from NeRF

Workflow overview

I used nerfstudio via their Colab notebook running on Colab Pro with GPU to render the final and intermediate products. The table below lists the major stages, tools and products.

Stage | Tool | Product
Process video data | ns-process-data video via COLMAP | Images of each frame (png) with inferred camera poses (json)
Train NeRF | ns-train nerfacto | NeRF configuration data including final model checkpoint (ckpt)
Define camera paths | nerfstudio viewer | Camera path definition based on keyframes (json)
Render videos | ns-render | Novel video of the NeRF scene (mp4)
Export geometry | ns-export pointcloud | Point cloud with surface colour and estimated normals (ply)
Consume geometry | Meshlab | Visualised point cloud
nerfstudio workflow overview

For reference, I consumed about 3 Colab Pro “compute units” with one end-to-end train and render (6s 480p 60fps video), but including the install steps (for transient runtimes) and multiple renders on different paths, I have consumed about 6 “compute units” per NeRF.

Workflow details

Here’s a more detailed walkthrough. There are lots of opportunities to improve.

Process video data

This stage produces a set of images from the video, corresponding to each requested frame, and uses COLMAP to infer the pose of each image. The video was 480p and 6s at 60fps. This processed data is suitable for training a NeRF. The result is visualised below in the nerfstudio viewer.

Posed video frames

I used the sequential option for video but haven’t evaluated any speedup. I’m not having much luck with specifying the number of frames via the command line parameter either. The resultant files could be zipped and stored outside the Colab instance (locally or on Drive) for direct input to the training stage.

Train NeRF

The magic happens here. The nerfstudio viewer provides live exploration of the radiance field as it is progressively refined through training. The landscape was recognisable very early on in the training process and it was hard to discern improvements in the later stages (at least when using the viewer interactively).

The trained model can also be zipped and stored outside the Colab instance for direct input into later stages.

Define camera paths

I defined one camera path to initially follow the camera’s original trajectory and then deviate significantly to show alternative perspectives and test the limits of scene reconstruction. This path is shown below.

Deviating camera path

I also defined a second path that reversed the original camera trajectory. I downloaded these camera paths for reuse.

Render videos

Rendering the deviating path (video above), the originally visible details are recreated quite convincingly. Noise is visible when originally hidden details are exposed, and also generally around the edges of the frame. I would like to try videos from cameras with a wider field of view to see how much more of the scene they capture.

The second, reversed, path (below) also faithfully reconstructs visible objects, but with some loss of fidelity due to the reversed camera position, and displays more noise outside the known scene.

Reversed camera path rendered from NeRF

Export geometry

I ran ns-export pointcloud and chose to add estimated normals to the export. I downloaded the ply file to work with it locally.

Consume geometry

Meshlab provides a nice visualisation of the point cloud out of the box, including the colour of each point and shading by estimated normal, as below.

Meshlab visualisation of exported point cloud

Meshlab provides a wide range of further processing tools, such as surface reconstruction. I also tried FreeCAD and Blender. Both imported and displayed the point cloud but I couldn’t easily tune the visualisation to look as good as above.

Next steps

I’d like to try some more videos, and explore how to better avoid noise artefacts in renders.

Throwback Thursday

The metaverse is a topic currently, though the concept has a long history. Twenty years ago, in the dotcom era, I was exploring this space, as I was recently reminded. Feeling nostalgic, I dug these projects out of the NAS archives. Tech has moved on, but there’s enduring relevance in what I learned.

VO2max (1999)

Virtual Online Orienteering, to the max! Conceived as similar to Catching Features, but online by default: you could play in the browser in a virtual world authored in VRML97.

A screenshot of a virtual orienteering game showing a first person view of terrain, a start banner and a compass, with a list of checkpoints underneath.

The UI consisted of two browser windows: a first person view and motion controls (using the now defunct Cosmo Player), and a map in a second window. 

A screenshot of a virtual orienteering game showing a first person view of terrain, a boulder with a checkpoint next to it, and a compass, with a list of checkpoints underneath.

The draggable compass needle, the checkpoints and the course logic (must visit checkpoints in order), and a widget that visualised your completed route as an electric blue string hovering 3 feet above the ground were all modular VRML Protos.

A screenshot of a virtual orienteering game showing a first person view of terrain, the time to complete the course, and a blue line that shows the route taken to complete the course.

The map and terrain for the only level ever created were generated together with a custom C++ application. I was pretty pleased this all worked, and it demonstrated some concepts for…

4DUniverse (2000)

4DUniverse was a broad concept for virtual online worlds for socialising, shopping, gaming, etc, similar at the time to ActiveWorlds (which still exists today!), but again accessible through the browser (assuming you had a VRML plugin).

I thought I’d have great screengrabs to illustrate this part of the story, but I was surprised how few I’d captured, that they were very low resolution, and that they were in archaic formats. The source artefacts from this post – WRL, HTML, JS, JAVA, etc – have lost no fidelity, but would only meet modern standards and interpreters to various degrees. Maybe I will modernise them someday and generate new images to do justice to the splendour I held in my mind!

A variety of screengrabs of virtual worlds from the 4DUniverse. Three different worlds with different themes are shown. Vaguely Greek open hotel lobby, Gothic cathedral in orbit, etc

We authored a number of worlds, connected by teleports, with the tools we had to hand: text editors, spreadsheets, and custom scripts. While a lot of fun, we came to the conclusion that doing the things we envisaged in the 4DUniverse wasn’t any more compelling than doing them in the 2D interfaces of the time. VRML eventually went away, probably because no one else was able to make a compelling case for its use. At least I crafted a neat animated GIF logo rendered with POV-Ray.

4 capital D's with a metallic finish joined in a flower shape that is rotating

Less multiverse, and more quantum realm, I also generated VRML content at nanoscale with…

NanoCAD (2001)

NanoCAD was a neat little (pun intended) CAD application for designing molecules, which I extended with a richer editing UI, supporting the design of much more complex hypothesised molecular mechanisms.

NanoCAD UI showing 3D view of construction of a large buckytube from graphene sheets. Application controls and help text are shown in the lower panel.

The Java app allowed users to place atoms in 3D and connect them with covalent bonds. Then an energy solver would attempt to find a stable configuration for the molecules (using classical rather than quantum methods). With expressive selection, duplication and transformation mechanics, it was possible to create benzene rings, stitch them into graphene sheets, and roll them up into “enormous” buckytubes, or other complex carbon creations.

NanoCAD UI showing detailed 3D view of construction of a large buckytube from graphene sheets. Application controls and help text are shown in the lower panel. Further desktop background including MS Word, Windows Explorer and WinAmp evoke the 2001 era

I also created cables housed inside sheaths, gears – built with benzene ring teeth attached to buckytubes – and other micro devices. If 4DUniverse was inspired by Snow Crash, NanoCAD was inspired by The Diamond Age. NanoCAD could run in a browser as an applet and the molecules could also be exported as WRL files for display in other viewers.

3D visualisation in the space-filling style of a molecular gear made from benzene ring teeth attached to the outside of a buckytube

Comparing contemporary professional projects

It’s nice to contrast the impermanence of these personal projects with the durability of my contemporary professional work with ANCA Machines. At the time, I was documenting the maths and code of Cimulator3D and also developing the maths and bidirectional UI design for iFlute, both used in the design and manufacturing of machine tools via grinding processes. Both products are still on the market in similar form today, more than two decades later.

I wonder how I’ll view this post in another twenty years?

Synthesising Semantle Solvers

Picking up threads from previous posts on solving Semantle word puzzles with machine learning, we’re ready to explore how different solvers might play along with people while playing the game online. Maybe you’d like to play speed Semantle against an artificially intelligent opponent, maybe you’d like a left-of-field hint on a tricky puzzle, or maybe it’s just fun to spectate at a cerebral robot battle.

Animation of a Semantle game from initial guess to completion

Substitute semantics

The solvers have a view of how words relate via a similarity model that is encapsulated for ease of change. To date, we’ve used the same model as live Semantle, which is word2vec. But as this might be considered cheating, we can now also use a model based on the Universal Sentence Encoder (USE), to explore how the solvers perform when their semantics are separated from the game’s.
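As a hedged sketch of what that encapsulation might look like – the class names, the word2vec file path and the cosine-similarity convention below are my assumptions, not the project’s actual code:

```python
# Illustrative similarity-model abstraction with two interchangeable backends.
import numpy as np

class Word2VecSimilarity:
    def __init__(self, path="GoogleNews-vectors-negative300.bin"):  # assumed path
        from gensim.models import KeyedVectors
        self.kv = KeyedVectors.load_word2vec_format(path, binary=True)

    def similarity(self, a, b):
        return float(self.kv.similarity(a, b))  # cosine similarity in [-1, 1]

class USESimilarity:
    def __init__(self):
        import tensorflow_hub as hub
        self.encode = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

    def similarity(self, a, b):
        # cosine similarity between the two sentence embeddings
        va, vb = (v / np.linalg.norm(v) for v in self.encode([a, b]).numpy())
        return float(va @ vb)
```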

Solver spec

To recap, the key elements of the solver ecosystem are now:

  • SimilarityModel – choice of word2vec or USE as above,
  • Solver methods (common to both gradient and cohort variants):
    • make_guess() – return a guess that is based on the solver’s current state, but don’t change the solver’s state,
    • merge_guess(guess, score) – update the solver’s state with information about a guess and a score,
  • Scoring of guesses by either the simulator or a Semantle game, where a game could also include guesses from other players.
Diagram illustrating elements of the solver ecosystem. Similarity model initialises solver state used to make guesses, which are scored by game and update solver state with scores. Other players can make guesses which also get scored

It’s a simplified reinforcement learning setup. Different combinations of these elements allow us to explore different scenarios.
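A minimal sketch of that interface and loop might look like the following – the method names mirror the post, but the game object and its scoring convention are assumptions:

```python
# Illustrative solver interface and driving loop; not the repo's exact code.
class Solver:
    def __init__(self, similarity_model, vocabulary):
        self.model = similarity_model
        self.vocab = vocabulary
        # gradient and cohort variants each keep their own internal state

    def make_guess(self) -> str:
        """Return a guess based on current state, without changing state."""
        raise NotImplementedError

    def merge_guess(self, guess: str, score: float) -> None:
        """Update solver state with the score the game gave a guess."""
        raise NotImplementedError

def play(solver, game, max_guesses=200):
    """Drive a solver against a game (simulator or live Semantle)."""
    for _ in range(max_guesses):
        guess = solver.make_guess()
        score = game.score(guess)       # other players' guesses could be merged too
        solver.merge_guess(guess, score)
        if score >= 100:                # assumes a perfect guess scores 100
            return guess
```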

Solver suggestions

Let’s look at how solvers might play with people. The base scenario friends is the actual history of a game played with people, completed in 109 guesses.

Word2Vec similarity

Solvers could complete a puzzle from an initial sequence of guesses from friends. When primed with the first 10 friend guesses, both solvers in this configuration generally beat the friends’ result comfortably.

Line chart comparing three irregular but increasing lines that represent the sequence of scores for guesses in a semantle game. The three lines are labelled friends, cohort, and gradient. Cohort finishes with fewest guesses, then gradient, then friends, with clear separation.

Solvers could instead suggest only the next guess, based on the game history up to that point. In this mode, both solvers may enable a finish in slightly fewer guesses. The conclusion is that these solvers are good for hints – especially if the hints are followed!

Line chart comparing three irregular but increasing lines that represent the sequence of scores for guesses in a semantle game. The three lines are labelled friends, cohort, and gradient. Cohort finishes with fewest guesses, then gradient, then friends, with marginal differences.
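The hint scenario is simple to express with the solver interface above; this is a hedged sketch, and the history format is my assumption:

```python
# Illustrative "hint" scenario: replay the human game so far into a solver,
# then ask it for the next guess only.
def next_guess_hint(solver, history):
    """history: list of (guess, score) pairs from the friends' game so far."""
    for guess, score in history:
        solver.merge_guess(guess, score)   # prime solver state with past guesses
    return solver.make_guess()             # suggest, but don't commit, a next guess
```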

Maybe these solvers using word2vec similarity do have an unfair advantage though – how do they perform with a different similarity model? Using USE instead, I expected the cohort solver to be more robust than the gradient solver…

USE similarity

… but it seems that the gradient descent solver is more robust to a disparate similarity model, as one example of the completion scenario shows.

Line chart comparing three irregular but increasing lines that represent the sequence of scores for guesses in a semantle game. The three lines are labelled friends, cohort, and gradient. Gradient finishes with fewest guesses, then friends, then cohort, and the separation is clear.

The gradient solver also generally offers some benefit when suggesting just the next guess, but the cohort solver’s contribution is marginal at best.

Line chart comparing three irregular but increasing lines that represent the sequence of scores for guesses in a semantle game. The three lines are labelled friends, cohort, and gradient. Gradient finishes with fewest guesses, then friends, and cohort doesn't finish, but the differences are very minor.

These are of course only single instances of each scenario, and there is significant variation between runs. It’s been interesting to see this play out interactively, but a more comprehensive performance characterisation – with plenty of scope for understanding the influence of hyperparameters – may be in order.

Solver solo

The solvers can also play part or whole games solo (or with other players) in a live environment, using Selenium WebDriver to submit guesses and collect scores. The leading animation above is gradient-USE, and the one below is a faster game using cohort-word2vec.

Animation of a Semantle game from initial guess to completion
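For the mechanics of live play, here’s a hedged sketch of the driving side: the Selenium calls themselves are standard, but the element selectors for the Semantle page are assumptions rather than verified identifiers.

```python
# Illustrative live-play plumbing; element IDs and table structure are assumed.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://semantle.com/")

def submit_and_score(word):
    box = driver.find_element(By.ID, "guess")        # assumed input field id
    box.clear()
    box.send_keys(word)
    driver.find_element(By.ID, "guess-btn").click()  # assumed submit button id
    # read the latest score back from the guesses table (assumed structure)
    cells = driver.find_elements(By.CSS_SELECTOR, "#guesses tr td")
    return float(cells[2].text)
```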

So long

And that’s it for now! We have multiple solver configurations that can play online by themselves or with other people. They demonstrate how people and machines can collaborate to each bring their own strengths to solving problems; people with creative strategies and machines with a relentless ability to crunch through possibilities. They don’t spoil the fun of solving Semantle yourself or with friends, but they do provide new ways to play and to gain insight into how to improve your own game.

Postscript: seeing in space

Through all this I’ve considered various 3D visualisations of search through a semantic space with hundreds of dimensions. I’ve settled on the version below, illustrating a search for target “habitat” from first guess “megawatt”.

An animated rotating 3D view of a semi-regular collection of points joined by lines into a sequence. Some points are labelled with words. Represents high-dimensional semantic search in 3D.

This visualisation format uses cylindrical coordinates, broken out in the figure below. The cylinder (x) axis is the projection of each guess onto the line that connects the first guess to the target word. The cylindrical radius is the distance of each guess in embedding space from its projection on this line (cosine similarity seemed smoother than Euclidean distance here). The angle of rotation in cylindrical coordinates (theta) is the cumulative angle between the directions connecting guess n-1 to n and n to n+1. The result is an irregular helix that expands then contracts, all while twisting around the axis from first guess to last.

Three line charts on a row, with common x-axis of guess number, showing semi-regular lines, representing the cylindrical coordinates of the 3D visualisation. The left chart is x-axis, increasing from 0 to 1, middle is radius, from 0 to ~1 and back to 0, and right is angle theta, increasing from 0 to ~11 radians.
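In code, computing those three coordinates from the guess embeddings might look something like this hedged sketch – plain Euclidean geometry throughout, whereas the actual radius above used a cosine-based distance:

```python
# Illustrative cylindrical-coordinate projection of a guess sequence.
import numpy as np

def cylindrical_path(embeddings):
    """embeddings: (n_guesses, d) array; first row = first guess, last row = target."""
    start, target = embeddings[0], embeddings[-1]
    axis = target - start
    axis = axis / np.linalg.norm(axis)

    xs, radii, thetas = [], [], []
    theta = 0.0
    for i, e in enumerate(embeddings):
        rel = e - start
        x = rel @ axis                      # projection onto the start->target line
        perp = rel - x * axis
        xs.append(x)                        # could be normalised by |target - start|
        radii.append(np.linalg.norm(perp))  # distance of the guess from that line
        if 0 < i < len(embeddings) - 1:
            d1 = embeddings[i] - embeddings[i - 1]
            d2 = embeddings[i + 1] - embeddings[i]
            cos = d1 @ d2 / (np.linalg.norm(d1) * np.linalg.norm(d2))
            theta += np.arccos(np.clip(cos, -1.0, 1.0))  # cumulative turning angle
        thetas.append(theta)
    return np.array(xs), np.array(radii), np.array(thetas)
```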

Second Semantle Solver

In the post Sketching Semantle Solvers, I introduced two methods for solving Semantle word puzzles, but I only wrote up one. The second solver here is based on the idea that the target word should appear in the intersection between the cohorts of possible targets generated by each guess.

Finding the semantle target through overlapping cohorts. Shows two intersecting rings of candidate words based on cosine similarity.

To recap, the first post:

  • introduced the sibling strategies side-by-side,
  • discussed designing for sympathetic sequences, so the solver can play along with humans, with somewhat explainable guesses, and
  • shared the source code and visualisations for the gradient descent solver.

Solution source

This post shares the source for the intersecting cohorts solver, including notebook, similarity model and solver class.

The solver is tested against the simple simulator for Semantle scores from last time. Note that the word2vec model data for the simulator (and agent) is available at this word2vec download location.

Stylised visualisation of the search for a target word with intersecting  cohorts. Shows distributions of belief strength at each guess and strength and rank of target word

The solver has the following major features:

  1. A vocabulary, containing all the words that can be guessed,
  2. A semantic model, from which the agent can calculate the similarity of word pairs,
  3. The ability to generate cohorts of words from the vocabulary that are similar (in Semantle score) to a provided word (a guess), and
  4. An evolving strength of belief that each word in the vocabulary is the target.

In each step towards guessing the target, the solver does the following:

  1. Choose a word for the guess. The current choice is the word with the strongest likelihood of being the target, but it could equally be any other word from the solver’s vocabulary (which might help triangulate better), or it could be provided by a human player with their own suspicions.
  2. Score the guess. The Semantle simulator scores the guess.
  3. Generate a cohort. The guess and the score are used to generate a new cohort of words that would share the same score with the guess.
  4. Merge the cohort into the agent’s belief model. The score is added to the current belief strength for each word in the cohort, providing a proxy for likelihood for each word. The guess is also masked from further consideration.
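Putting those four steps together, a hedged sketch of the cohort solver might look like the following; the names, the width of the similarity band, and the 0–100 Semantle-style scoring are illustrative assumptions rather than the repo’s exact code.

```python
# Illustrative intersecting-cohorts solver: belief strengths accumulate where
# cohorts overlap, so the (unknown) target rises towards the top.
import numpy as np

class CohortSolver:
    def __init__(self, model, vocabulary, band=2.0):
        self.model = model                        # similarity model (word2vec or USE)
        self.vocab = list(vocabulary)
        self.belief = np.zeros(len(self.vocab))   # strength of belief per word
        self.masked = set()                       # words already guessed
        self.band = band                          # similarity band width for cohorts

    def make_guess(self):
        # choose the unguessed word with the strongest belief of being the target
        for idx in np.argsort(-self.belief):
            if self.vocab[idx] not in self.masked:
                return self.vocab[idx]

    def merge_guess(self, guess, score):
        # words whose similarity to the guess would give (roughly) the same score
        for i, w in enumerate(self.vocab):
            if abs(self.model.similarity(guess, w) * 100 - score) < self.band:
                self.belief[i] += score           # score as a proxy for likelihood
        self.masked.add(guess)                    # mask the guess from further play
```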

Show of strength

The chart below shows how the belief strength (estimated likelihood) of the target word gradually approaches the maximum belief strength of any word, as the target (which remains unknown until the end) appears in more and more cohorts.

Intersecting cohorts solver. Line chart showing the belief strength of the target word at each guess in relation to the maximum belief strength of remaining words.

We can also visualise the belief strength across the whole vocabulary at each guess, and the path the target word takes in relation to these distributions, in terms of its absolute score and its rank relative to other words.

Chart showing the cohort solver belief strength across the whole vocabulary at each guess, and the path the target word takes in relation to these distributions, in terms of its absolute score and its rank relative to other words

Superior solution?

The cohort solver can be (de)tuned to almost any level of performance by adjusting the parameters precision and recall, which determine the tightness of the similarity band and completeness of results from the generated cohorts. The gradient descent solver has potential for tuning parameters, but I didn’t explore this much. To compare the two, we’d therefore need to consider configurations of each solver. For now, I’m pleased that the two distinct sketches solve to my satisfaction!
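As a final illustration – and this is my guess at how such knobs could work, not the repo’s implementation – precision and recall might shape cohort generation roughly like this:

```python
# Illustrative tuning knobs: higher precision narrows the similarity band,
# lower recall subsamples the matching words.
import random

def cohort(model, vocab, guess, score, precision=1.0, recall=1.0, base_band=5.0):
    band = base_band / precision          # higher precision -> narrower band
    matches = [w for w in vocab
               if abs(model.similarity(guess, w) * 100 - score) < band]
    if not matches:
        return []
    k = max(1, int(len(matches) * recall))  # lower recall -> keep fewer matches
    return random.sample(matches, k)
```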