Smarter Semantle Solvers

A little smarter, anyway. I didn’t expect to pick this up again, but when I occasionally run the first generation solvers online, I’m often equal parts amused and frustrated by rare words thrown up that delay the solution – from amethystine to zigging.

Animation of online semantle solution
An example solution with fewer than typical rare words guessed

The solvers used the first idea that worked; can we make some tweaks to make them smarter? The code is now migrated to its own new repo after outgrowing its old home.

Measuring smarts

I measure solver performance by running multiple trials of a solver configuration against the simulator for a variety of target words. This gives a picture of how often the solver typically succeeds within a certain number of guesses.
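As a rough illustration of this kind of measurement (a sketch only, not the code from the repo), the fraction of trials solved within n guesses can be computed from a list of per-trial guess counts. The trial numbers below are made up, and treating capped trials as unsolved is an assumption.

```python
from bisect import bisect_right


def solved_within(guess_counts, cap):
    """Return a function n -> fraction of trials solved within n guesses.

    Assumption: a trial that reaches `cap` guesses counts as unsolved.
    """
    solved = sorted(c for c in guess_counts if c < cap)
    total = len(guess_counts)

    def cdf(n):
        # Number of solved trials that needed at most n guesses, over all trials.
        return bisect_right(solved, n) / total

    return cdf


# Hypothetical guess counts from eight trials, with searches capped at 500.
trials = [42, 87, 130, 130, 210, 365, 500, 500]
cdf = solved_within(trials, cap=500)
print(cdf(100), cdf(250), cdf(499))  # 0.25 0.625 0.75
```

Plotting `cdf(n)` against n for two configurations gives curves like those in the chart, and the curve for a configuration with many capped searches never reaches 1.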

Chart showing cumulative distribution function curves for two solver configurations

Vocabulary

It turns out that the vocabulary to date based on english_words_set is a poor match for the most frequently used English words, according to unigram frequency data.

So we might expect that simply replacing the solver vocabulary would improve performance, and we also get word ranking from unigram_freq.
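A minimal sketch of building a ranked vocabulary from frequency data follows. It assumes a CSV with `word` and `count` columns; the counts in `SAMPLE` are invented for illustration, and `top_words` is a hypothetical helper rather than anything from the solver repo.

```python
import csv
import io


def top_words(csv_text, n):
    """Return the n most frequent words and a word -> rank map (0 = most common)."""
    rows = csv.DictReader(io.StringIO(csv_text))
    counted = sorted(((int(r["count"]), r["word"]) for r in rows), reverse=True)
    words = [word for _, word in counted[:n]]
    rank = {word: i for i, word in enumerate(words)}
    return words, rank


# Invented counts, only to show the shape of the data.
SAMPLE = """word,count
habitat,50
the,1000
zigging,1
of,800
"""

words, rank = top_words(SAMPLE, 3)
print(words)            # ['the', 'of', 'habitat']
print(rank["habitat"])  # 2
```

Cutting the list at the top n words is what keeps rare words like "zigging" out of the solver's guesses.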

Semantic models

We’ll continue with Universal Sentence Encoder (USE) to ensure search strategies are robust to different semantic models.

Search

To improve the gradient solver I tried making another random guess every so often to avoid long stretches exploring local minima. But it didn’t make things better, and probably made them worse!

In response, I made each guess the most common local word to the extrapolated semantic location, rather than just the nearest word. Still no better, and trying both “improvements” together was significantly worse!
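To make the "most common local word" idea concrete, here is a toy sketch: find the k words nearest to the extrapolated location, then pick the one with the best frequency rank. The 2D vectors, the ranks and the function name are all made up for illustration; the real solver works in a much higher-dimensional embedding space and may choose its neighbourhood differently.

```python
import numpy as np


def most_common_nearby(point, vectors, words, rank, k=3):
    """Among the k words nearest to `point` by cosine similarity, return the most common."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ (point / np.linalg.norm(point))
    nearest = np.argsort(-sims)[:k]          # indices of the k most similar words
    return min((words[i] for i in nearest), key=lambda w: rank[w])


# Toy 2D "embeddings" and frequency ranks (0 = most common), invented for this example.
words = ["amethystine", "purple", "violet", "engine"]
vectors = np.array([[1.0, 0.0], [0.9, 0.3], [0.95, 0.2], [0.0, 1.0]])
rank = {"purple": 0, "violet": 1, "engine": 2, "amethystine": 3}
point = np.array([1.0, 0.05])                # the extrapolated semantic location

print(most_common_nearby(point, vectors, words, rank, k=1))  # amethystine
print(most_common_nearby(point, vectors, words, rank, k=2))  # violet
```

With k=1 this reduces to the original nearest-word behaviour; larger k trades semantic precision for more common guesses.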

Ah well, experiments only fail if we fail to learn from them!

Vocabulary again

I think the noise inherent in a different semantic model, plus the existing random extrapolation distance, overwhelms the changes I tried. In better news, we see a major improvement from using unigram freq vocabulary, reducing the mean from 280 (with many searches capped at 500) to 198, approximately a 30% improvement.

Smarter still?

Here we see that the data-centric (vocabulary) improvement had a far bigger impact than any model-centric (search algorithm) improvement that I had the patience to try (though I left a bunch of further todos). Maybe just guessing randomly from the top n words will be better again!

At least I’ve made a substantial dent in reducing those all-too-common guesses at rare words.

22 rules of generative AI

Thinking about adopting, incorporating or building generative AI products? Here are some things to think about, depending on your role or roles.

I assume you’re bringing your own application based on an understanding of an opportunity or problem that involves creating, combining or transforming some kind of digital content. If you don’t have that understanding of a customer problem and why other solutions are not suitable, go and get it! Digital content may mean text, code, images, sound, video, 3D, etc, for digital consumption, or it may mean digitized designs for real world products or services such as code (again), recipes, blueprints, etc. Some of this may also be relevant for how you use other people’s generative AI tools in your own work.

Product strategy and management roles

1. Know what input you have to an AI product or feature that’s difficult to replicate. This is generally proprietary data, but it may be an algorithm tuned in-house, access to compute resources, or a particularly responsive deployment process, etc. This separates competitive differentiators from competitive parity features.

2. Interrogate the role of data. Do you need historical data to start, can you generate what you need through experimentation, or can you leverage your proprietary data with open source data, modelling techniques or SaaS products? Work with your technical leads to understand the multitude of mathematical and ML techniques available to ensure data adds the most value for the least effort.

3. Understand where to use open source or Commercial Off-The-Shelf (COTS) software for parity features, but also understand the risks of COTS including roadmaps, implementation, operations and data.

4. Recognise that functional performance of AI features is uncertain at the outset and variable in operation, which creates delivery risk. Address this by: creating a safe experimentation environment, supporting dual discovery (creating knowledge) and development (creating software) tracks with a continuous delivery approach, and – perhaps the hardest part – actually responding to change.

Design roles

5. Design for failure, and loss of vigilance in the face of rare failures. Failure can mean outputs that are nonsensical, fabricated, incorrect, or – depending on scope and training data – harmful.

6. Learn the affordances of AI technologies so you understand how to incorporate them into user experiences, and can effectively communicate their function to your users.

7. Study various emerging UX patterns. My quick take: generative AI may be used as a discrete tool with (considering #5) predictable results for the user, such as replacing the background in a photo, it may be used as a collaborator, reliant on a dialogue or back-and-forth iterative design or search process between the user and AI, such as ChatGPT, or it may be used as an author, producing a nearly finished work that the user then edits to their satisfaction (which comes with risk of subtle undetected errors).

8. Consider what role the AI is playing in the collaborator pattern – is it designer, builder, tester, or will the user decide? There is value in generating novel options to explore as a designer, in expediting complex workflows as a builder, and in verifying or validating solutions to some level of fidelity as a tester. However, for testing, remember you cannot inspect quality into a product, and consider building in quality from the start.

9. Design for explainability, to help users understand how their actions influence the output. (This overlaps heavily with #6)

10. More and more stakeholders will want to know what goes into their AI products. If you haven’t already, start on your labelling scheme for AI features, which may include: intended use, data ingredients and production process, warnings, reporting process, and so on, with reference to risk and governance below.

Data science and data engineering roles

11. Work in short cycles in multidisciplinary product teams to address end-to-end delivery risks.

12. Quantify the functional performance of systems, the satisfaction of guardrails, and confidence in these measures to support product decisions.

13. Make it technically easy and safe to work with and combine rich data.

14. Implement and automate a data governance model that enables delivery of data products and AI features to support the business strategy (i.e., a governance model that captures the concerns of other roles and stakeholders here).

Architecture and software engineering roles

15. Understand that each AI solution is narrow, but composable with other digital services. In this respect, treat each AI solution as a distinct service until a compelling case is made for consolidation. (Note that, as above, product management should be aware of how to make use of existing solutions.)

16. Consolidate AI platform services at the right level of abstraction. The implementation of AI services may be somewhat consistent, or it may be completely idiosyncratic depending on the solution requirements and available techniques. The right level of abstraction may be emergent and big up-front design may be risky.

17. Use continuous delivery for short feedback cycles and delivery that is both iterative – to reduce risk from knowledge gaps – and responsive – to reduce the risk of a changing world.

18. Continuous delivery necessitates a robust testing and monitoring strategy. Make use of test pyramids for both code and data for economical and timely quality assurance.

Risk and governance roles

19. Privacy and data security are the foundation on which everything else is built.

20. Generative AI solutions, like other AI solutions, may also perpetuate harmful content, biases or correlations in their historical training data.

21. Understand that current generative AI solutions may be subject to some or all of the following legal and ethical issues, depending on their source data, training and deployment as a service: privacy, copyright or other violation regarding collection of training data, outputs that plagiarise or create “digital forgeries” of training data, whether the aggregation and intermediation of individual creators at scale is monopoly behaviour and whether original creators should be compensated, that training data may include harmful content (which may be replicated into harmful outputs), that people may have been exposed to harmful content in a moderation process, and that storing data and the compute for training and inference may have substantial environmental costs.

22. Develop strategies to address the further structural failure modes of AI solutions, such as: misalignment with user goals, deployment into ethically unsound applications, the issue of illusory progress where small gains may look promising but never cross the required threshold, the magnification of rare failures at scale and the resolution of any liability for those failures.

Conclusion

These are the type of role-based considerations I alluded to in Reasoning About Machine Creativity. The list is far from complete, and the reader would doubtless benefit from sources and references! I intended to write this post in one shot, which I did in 90 minutes while hitting the target 22 rules without significant editing, so I will return after some reflection. Let me know if these considerations are helpful in your roles.

Reasoning About Machine Creativity

With the current interest in generative AI, I wanted to write a short post updating the framing I took in my older talk Reasoning About Machine Intuition (2017), which was intended for broad audiences to understand the impact and best application of AI solutions from multiple digital delivery perspectives.

Bicycles and automobiles share some features and are used for many of the same tasks, but have important differences that must be considered by transport planners. Recently, electric bikes have created another distinct mobility category that nonetheless shares some elements with existing categories. So it is with AI solutions. While AI may share some features of human intelligence and be suitable for some of the same tasks, understanding the differences is crucial for digital professionals to be able to reason about their capabilities and applicability.

Skip right ahead to machine creativity if you want! Also listen to my podcast on Creative AI for another perspective.

Machine intuition

Considering that products and features introduced in the ML boom of the late 2010s allowed sufficiently good decisions to be made on complex data without precisely specified rules (e.g., image classification), I chose to characterise these solutions as “machine intuition”, in order to highlight that their narrow artificial intelligences were most comparable to human intuition. However, important differences remain. And of course I used “reasoning” in the title to highlight the capability of human intelligence that wasn’t present in these solutions.

Diagram illustrating a large number of emojis feeding into a decision node

Similarities to human intuition

Opportunities, tasks or problems amenable to both approaches share these characteristics:

  • Good decisions will be made, based on ambiguous inputs, but mistakes will also be made,
  • The approach is useful if solutions make enough good decisions in aggregate for a given context, and the volume and nature of mistakes is tolerable,
  • The decisions may have limited explainability, even if explainability is important,
  • The decisions are based on past experience and therefore subject to bias.

(NB there are many examples of particularly egregious, discriminatory and harmful mistakes that were not detected or considered prior to release of AI solutions. The understanding of what constitutes a mistake, and of whether the decision itself is structurally discriminatory, must consider many ethical dimensions.)

Differences from human intuition

If a machine intuition approach looks suitable based on the characteristics above, we must also consider the differences below:

  • The artificial intelligence remains narrow – it can only perform one specific task and only to the degree permitted by its training data. This is different to a human, who can easily generalise to a related task or accommodate new data. However, the same or similar data may be sliced multiple ways to support multiple related narrow tasks, and individual solutions are composable – maybe embarrassingly so – and composable with other digital services, all of which may substitute as a limited form of generality.
  • Machine intuition requires vastly more training instances (many more even than any human expert might see in a lifetime) and concomitantly more computing power than human intuition. NB. These training instances also must be presented in a specific format and are also typically labelled by humans! In contrast, human intuition may only need a handful of examples, and can fall back on reasoning or inference from related experience if direct intuition fails (generalisation again). However, machines may be trained on a volume of data that no human could consume, and any trained model can be reproduced and deployed almost infinitely, so at some scale, low variable cost may compensate for high fixed cost.
  • Machine intuition is possible at superhuman scales, in particular volume of data or requests, and speed of inference. For instance, translating all of Wikipedia in fractions of a second. Machine intuition may also exceed functional human performance at the relevant task, though effective measurement of this must carefully consider the task definition and potential for bias.
  • Machine intuition will fail in some proportion of predictions as a matter of course (though we assume this is manageable) and is also subject to weird/trivial (adversarial) failure modes, such as changing a single pixel, that humans are generally robust to. Mistakes at scale from a single centralised ML solution may also be less acceptable than the aggregate mistakes made by many independent humans.

Anyone involved in delivery of AI solutions should keep these basic factors in mind in order to reason about product and engineering concerns. There is more to consider, but this is a good starting point.

Machine creativity

Considering the current generative AI boom, I think of these solutions as “machine creativity” in order to highlight that their narrow artificial intelligences are most comparable to human creativity in a given medium. However, important differences remain.

Diagram illustrating a single creative spark generating a large number of emoji

Creativity for our purposes is taking some simple input and creating a complex output from the input, an output that also incorporates other ideas, knowledge and techniques beyond the input. That output may be almost any form of digital content, from natural language text, to code, to images, to music, to movies, to 3D scenes, to animated 3D movies. AI that is embodied or with access to manufacturing may also exhibit creativity in the real world, through the materialisation of digital designs.

Some applications of generative AI look more like search, databases, or even back-ends, but they are like our creative reference in that they produce complex outputs from simple inputs, and by similar mechanisms.

(NB legal and ethical issues remain to be resolved with respect to some current mechanisms available to machine creativity to incorporate external ideas, knowledge and techniques. These include: copyright and potential for plagiarism, safety of input and output content and safety of human moderators, attribution and compensation for original creators, and so on.)

Similarities to human creativity

Opportunities, tasks or problems amenable to both approaches share these characteristics:

  • There is not a single “right” answer, multiple answers will suffice and may even be desirable to generate valuable options to pursue,
  • Assessing the goodness of the outputs may include some degree of subjectivity,
  • There may be surprising or non-obvious elements in the output, and again this may be desirable, or risky, or both,
  • The process is likely iterative, with multiple rounds of review and editing.

Differences from human creativity

If a machine creativity approach looks suitable based on an application being sympathetic to the characteristics above, we must also consider the differences below:

  • The artificial intelligence has no agency or intent in its creativity, it simply processes inputs to generate outputs that are likely or typical based on its training data, described as “next token prediction” (where a token is an element of text, or patch of an image, etc). This may also appear as misalignment or the generation of unsafe content, which can be difficult to detect or control currently.
  • The artificial intelligence has no logically consistent model of the world. The outputs it generates have a high probability of following the prompt, but are not necessarily logically consistent with the prompt or even internally consistent, which can lead to articulate but nonsensical, incorrect or harmful answers. (I.e., it’s also missing the reasoning that is absent from machine intuition.)
  • The artificial intelligence remains narrow. It performs one generative task but it does not subject the output to a reasoned review or critique, as might be performed by a human to detect error. However, it is again composable, and tests could be applied after the generative step, though these too are fallible. There are many examples of creative AI tool-chains being shared by human creators to support complex creative workflows.
  • Machine creativity also requires more training instances, but is similarly almost infinitely reproducible for creating outputs. Leveraging current tools which include third party training data, it is important to understand the provenance of those training instances – whether they were used with permission, whether they were curated in an ethical manner, and so on.
  • There is by default no explicit attribution of influences on an output, although this is an area of focus and may be improved directly in creative systems or by hybrid means.
  • Machine creativity is also possible at superhuman scales of speed and volume.
  • Machine creativity is also subject to weird/trivial adversarial attacks, such as prompt injection.

Conclusion

As I’ve been guided by the set of machine intuition considerations above for a number of years, this is the initial set of considerations that I will take forward when considering applications for machine creativity, though I will continue to review their relevance in light of future developments.

In future, I’d like to address these considerations more specifically by the various roles in a digital delivery organisation, as per the original talk.

Nerfing along

NeRFs provide many benefits for 3D content: the rendering looks natural while the implementation is flexible. So I wanted to get hands on, and build myself a NeRF. I wanted to understand what’s possible to reproduce in 3D from just a spontaneous video capture. I chose a handheld holiday video from an old iPhone X while cycling on beautiful Maria Island.

Video taken while cycling on Maria Island

The camera moves along a fairly straight path, pointing a little right of the direction of travel. This contrasts with NeRFs or scans of objects, where the camera may do one or more full orbits of the object to get every perspective and thus produce seamless renders and clean models. I expect 3D generated from the video above will be missing some detail.

My aim was to build a NeRF from the video, render alternative camera paths, explore the generated geometry, and understand the application and limitation of the results. Here’s the view from one alternative camera path, which follows the original path at first, and then swings out to the side.

Alternative camera path rendered from NeRF

Workflow overview

I used nerfstudio via their Colab notebook running on Colab Pro with GPU to render the final and intermediate products. The table below lists the major stages, tools and products.

| Stage | Tool | Product |
| --- | --- | --- |
| Process video data | ns-process-data video via COLMAP | Images of each frame (png) with inferred camera poses (json) |
| Train NeRF | ns-train nerfacto | NeRF configuration data including final model checkpoint (ckpt) |
| Define camera paths | nerfstudio viewer | Camera path definition based on keyframes (json) |
| Render videos | ns-render | Novel video of the NeRF scene (mp4) |
| Export geometry | ns-export pointcloud | Point cloud with surface colour and estimated normals (ply) |
| Consume geometry | Meshlab | Visualised pointcloud |
nerfstudio workflow overview

For reference, one end-to-end train and render (6s 480p 60fps video) consumed about 3 Colab Pro “compute units”. Including the install steps (needed for transient runtimes) and multiple renders on different paths, the total was about 6 “compute units” per NeRF.

Workflow details

Here’s a more detailed walkthrough. There are lots of opportunities to improve.

Process video data

This stage produces a set of images from the video, corresponding to each requested frame, and uses COLMAP to infer the pose of each image. The video was 480p and 6s at 60fps. This processed data is suitable for training a NeRF. The result is visualised below in the nerfstudio viewer.

Posed video frames

I used the sequential option for video but haven’t evaluated any speedup. I’m not having much luck with specifying the number of frames via the command line parameter either. The resultant files could be zipped and stored outside the Colab instance (locally or on Drive) for direct input to the training stage.

Train NeRF

The magic happens here. The nerfstudio viewer provides live exploration of the radiance field as it is progressively refined through training. The landscape was recognisable very early on in the training process and it was hard to discern improvements in the later stages (at least when using the viewer interactively).

The trained model can also be zipped and stored outside the Colab instance for direct input into later stages.

Define camera paths

I defined one camera path to initially follow the camera’s original trajectory and then deviate significantly to show alternative perspectives and test the limits of scene reconstruction. This path is shown below.

Deviating camera path

I also defined a second path that reversed the original camera trajectory. I downloaded these camera paths for reuse.

Render videos

Rendering the deviating path (video above), the originally visible details are recreated quite convincingly. Noise is visible when originally hidden details are exposed, and also generally around the edges of the frame. I would like to try videos from cameras with a wider field of view to see how much more of the scene they capture.

The second, reversed, path (below) also faithfully reconstructs visible objects, but with some loss of fidelity due to the reversed camera position, and displays more noise outside the known scene.

Reversed camera path rendered from NeRF

Export geometry

I ran ns-export pointcloud and chose to add estimated normals to the export. I downloaded the ply file to work with it locally.

Consume geometry

Meshlab provides a nice visualisation of the point cloud out of the box, including the colour of each point and shading by estimated normal, as below.

Meshlab visualisation of exported point cloud

Meshlab provides a wide range of further processing tools, such as surface reconstruction. I also tried FreeCAD and Blender. Both imported and displayed the point cloud but I couldn’t easily tune the visualisation to look as good as above.

Next steps

I’d like to try some more videos, and explore how to better avoid noise artefacts in renders.

Synthesising Semantle Solvers

Picking up threads from previous posts on solving Semantle word puzzles with machine learning, we’re ready to explore how different solvers might play along with people while playing the game online. Maybe you’d like to play speed Semantle against an artificially intelligent opponent, maybe you’d like a left-of-field hint on a tricky puzzle, or maybe it’s just fun to spectate at a cerebral robot battle.

Animation of a Semantle game from initial guess to completion

Substitute semantics

The solvers have a view of how words relate due to a similarity model that is encapsulated for ease of change. To date, we’ve used the same model as live Semantle, which is word2vec. But as this might be considered cheating, we can now also use a model based on the Universal Sentence Encoder (USE), to explore how the solvers perform with separated semantics.
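One way to picture this encapsulation is a small class that hides where the vectors come from and exposes only a similarity score. The sketch below is a toy stand-in, not the project's SimilarityModel: the class name, the hand-written vectors and the scaling of cosine similarity to a score out of 100 are all assumptions for illustration.

```python
import math


class VectorSimilarityModel:
    """Toy stand-in for a word2vec- or USE-backed model: words map to vectors."""

    def __init__(self, vectors):
        self.vectors = vectors  # dict: word -> list of floats

    def similarity(self, word_a, word_b):
        """Cosine similarity of two words, scaled to a Semantle-like score (assumed x100)."""
        a, b = self.vectors[word_a], self.vectors[word_b]
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return 100.0 * dot / norm


# Two hypothetical models that disagree about how related the same words are.
model_a = VectorSimilarityModel({"habitat": [1.0, 0.0], "forest": [0.8, 0.6]})
model_b = VectorSimilarityModel({"habitat": [1.0, 0.0], "forest": [0.6, 0.8]})

print(round(model_a.similarity("habitat", "forest"), 6))  # 80.0
print(round(model_b.similarity("habitat", "forest"), 6))  # 60.0
```

A solver that only ever calls `similarity` can be handed either model, which is what makes it possible to score the game with one model and search with another.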

Solver spec

To recap, the key elements of the solver ecosystem are now:

  • SimilarityModel – choice of word2vec or USE as above,
  • Solver methods (common to both gradient and cohort variants):
    • make_guess() – return a guess that is based on the solver’s current state, but don’t change the solver’s state,
    • merge_guess(guess, score) – update the solver’s state with information about a guess and a score,
  • Scoring of guesses by either the simulator or a Semantle game, where a game could also include guesses from other players.
Diagram illustrating elements of the solver ecosystem. Similarity model initialises solver state used to make guesses, which are scored by game and update solver state with scores. Other players can make guesses which also get scored

It’s a simplified reinforcement learning setup. Different combinations of these elements allow us to explore different scenarios.
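The interaction between these elements can be sketched as a small loop. Everything here is a toy under stated assumptions: `InOrderSolver` is a deliberately naive placeholder rather than the gradient or cohort solver, `letter_score` stands in for the simulator or a live game, and only the method names `make_guess` and `merge_guess` come from the spec above.

```python
class InOrderSolver:
    """Placeholder solver: guesses unseen vocabulary words in order."""

    def __init__(self, vocabulary):
        self.vocabulary = list(vocabulary)
        self.scores = {}

    def make_guess(self):
        # Read-only: repeated calls without a merge return the same word.
        return next(w for w in self.vocabulary if w not in self.scores)

    def merge_guess(self, guess, score):
        # State changes only here, whoever made the guess.
        self.scores[guess] = score


def letter_score(target):
    """Toy scorer: letter overlap out of 100 (assumes no anagrams in the vocabulary)."""
    def score(guess):
        a, b = set(guess), set(target)
        return 100.0 * len(a & b) / len(a | b)
    return score


def play(solver, score, primed_guesses=(), max_guesses=20):
    """Merge guesses from other players first, then let the solver finish."""
    history = []
    for guess in primed_guesses:
        history.append((guess, score(guess)))
        solver.merge_guess(*history[-1])
    while len(history) < max_guesses and (not history or history[-1][1] < 100.0):
        guess = solver.make_guess()
        history.append((guess, score(guess)))
        solver.merge_guess(*history[-1])
    return history


vocabulary = ["megawatt", "forest", "habitat", "engine"]
history = play(InOrderSolver(vocabulary), letter_score("habitat"), primed_guesses=["engine"])
print([guess for guess, _ in history])  # ['engine', 'megawatt', 'forest', 'habitat']
```

Because `make_guess` leaves state untouched, the same solver can either drive the whole game or just offer a hint that a human is free to ignore.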

Solver suggestions

Let’s look at how solvers might play with people. The base scenario friends is the actual history of a game played with people, completed in 109 guesses.

Word2Vec similarity

Solvers could complete a puzzle from an initial sequence of guesses from friends. Both solvers in this particular configuration generally easily better the friends result when primed with the first 10 friend guesses.

Line chart comparing three irregular but increasing lines that represent the sequence of scores for guesses in a semantle game. The three lines are labelled friends, cohort, and gradient. Cohort finishes with fewest guesses, then gradient, then friends, with clear separation.

Solvers could instead make the next guess only, but based on the game history up to that point. Both solvers may permit a finish in slightly fewer guesses. The conclusion is that these solvers are good for hints, especially if they are followed!

Line chart comparing three irregular but increasing lines that represent the sequence of scores for guesses in a semantle game. The three lines are labelled friends, cohort, and gradient. Cohort finishes with fewest guesses, then gradient, then friends, with marginal differences.

Maybe these solvers using word2vec similarity do have an unfair advantage though – how do they perform with a different similarity model? Using USE instead, I expected the cohort solver to be more robust than the gradient solver…

USE similarity

… but it seems that the gradient descent solver is more robust to a disparate similarity model, as one example of the completion scenario shows.

Line chart comparing three irregular but increasing lines that represent the sequence of scores for guesses in a semantle game. The three lines are labelled friends, cohort, and gradient. Gradient finishes with fewest guesses, then friends, then cohort, and the separation is clear.

The gradient solver also generally offers some benefit making a suggestion for just the next guess, but the cohort solver’s contribution is marginal at best.

Line chart comparing three irregular but increasing lines that represent the sequence of scores for guesses in a semantle game. The three lines are labelled friends, cohort, and gradient. Gradient finishes with fewest guesses, then friends, and cohort doesn't finish, but the differences are very minor.

These are of course only single instances of each scenario, and there is significant variation between runs. It’s been interesting to see this play out interactively, but a more comprehensive performance characterisation – with plenty of scope for understanding the influence of hyperparameters – may be in order.

Solver solo

The solvers can also play part or whole games solo (or with other players) in a live environment, using Selenium WebDriver to submit guesses and collect scores. The leading animation above is gradient-USE and below is a faster game using cohort-word2vec.

Animation of a Semantle game from initial guess to completion

So long

And that’s it for now! We have multiple solver configurations that can play online by themselves or with other people. They demonstrate how people and machines can collaborate to each bring their own strengths to solving problems; people with creative strategies and machines with a relentless ability to crunch through possibilities. They don’t spoil the fun of solving Semantle yourself or with friends, but they do provide new ways to play and to gain insight into how to improve your own game.

Postscript: seeing in space

Through all this I’ve considered various 3D visualisations of search through a semantic space with hundreds of dimensions. I’ve settled on the version below, illustrating a search for target “habitat” from first guess “megawatt”.

An animated rotating 3D view of a semi-regular collection of points joined by lines into a sequence. Some points are labelled with words. Represents high-dimensional semantic search in 3D.

This visualisation format uses cylindrical coordinates, broken out in the figure below. The cylinder (x) axis is the projection of each guess to the line that connects the first guess to the target word. The cylindrical radius is the distance of each guess in embedding space from its projection on this line (cosine similarity seemed smoother than Euclidean distance here). The angle of rotation in cylindrical coordinates (theta) is the cumulative angle between the directions connecting guess n-1 to n and n to n+1. The result is an irregular helix expanding then contracting, all while twisting around the axis from first to last guess.
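A minimal sketch of this coordinate mapping is below. It is an illustration rather than the plotting code behind the figures: it uses plain Euclidean distance for the radius (the text above notes cosine similarity looked smoother), it assumes the last vector in the sequence is the target and that consecutive guesses differ, and the three-point path is a toy in 3 dimensions rather than hundreds.

```python
import numpy as np


def cylindrical_path(guesses):
    """Map embedding vectors (first guess ... target) to cylindrical (x, r, theta)."""
    g = np.asarray(guesses, dtype=float)
    axis = g[-1] - g[0]                       # line from first guess to target
    length = np.linalg.norm(axis)
    unit = axis / length
    offsets = g - g[0]
    along = offsets @ unit                    # signed distance along the axis
    x = along / length                        # 0 at the first guess, 1 at the target
    r = np.linalg.norm(offsets - np.outer(along, unit), axis=1)
    steps = np.diff(g, axis=0)                # direction from guess n to guess n+1
    theta = np.zeros(len(g))
    for n in range(1, len(g) - 1):
        a, b = steps[n - 1], steps[n]
        cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        theta[n] = theta[n - 1] + np.arccos(np.clip(cos, -1.0, 1.0))
    theta[-1] = theta[-2]                     # no turn is defined after the last guess
    return x, r, theta


# Toy path: out to the side and back, with one right-angle turn.
x, r, theta = cylindrical_path([[0, 0, 0], [1, 1, 0], [2, 0, 0]])
print(x, r, theta)
```

For the toy path, x runs 0, 0.5, 1, the radius rises to 1 and returns to 0, and theta accumulates a single quarter turn, which is the expand-then-contract helix in miniature.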

Three line charts on a row, with common x-axis of guess number, showing semi-regular lines, representing the cylindrical coordinates of the 3D visualisation. The left chart is x-axis, increasing from 0 to 1, middle is radius, from 0 to ~1 and back to 0, and right is angle theta, increasing from 0 to ~11 radians.

Second Semantle Solver

In the post Sketching Semantle Solvers, I introduced two methods for solving Semantle word puzzles, but I only wrote up one. The second solver here is based on the idea that the target word should appear in the intersection between the cohorts of possible targets generated by each guess.

Finding the semantle target through overlapping cohorts. Shows two intersecting rings of candidate words based on cosine similarity.

To recap, the first post:

  • introduced the sibling strategies side-by-side,
  • discussed designing for sympathetic sequences, so the solver can play along with humans, with somewhat explainable guesses, and
  • shared the source code and visualisations for the gradient descent solver.

Solution source

This post shares the source for the intersecting cohorts solver, including notebook, similarity model and solver class.

The solver is tested against the simple simulator for semantle scores from last time. Note that the word2vec model data for the simulator (and agent) is available at this word2vec download location.

Stylised visualisation of the search for a target word with intersecting cohorts. Shows distributions of belief strength at each guess and strength and rank of target word.

The solver has the following major features:

  1. A vocabulary, containing all the words that can be guessed,
  2. A semantic model, from which the agent can calculate the similarity of word pairs,
  3. The ability to generate cohorts of words from the vocabulary that are similar (in Semantle score) to a provided word (a guess), and
  4. An evolving strength of belief that each word in the vocabulary is the target.

In each step towards guessing the target, the solver does the following:

  1. Choose a word for the guess. The current choice is the word with the strongest likelihood of being the target, but it could equally be any other word from the solver’s vocabulary (which might help triangulate better), or it could be provided by a human player with their own suspicions.
  2. Score the guess. The Semantle simulator scores the guess.
  3. Generate a cohort. The guess and the score are used to generate a new cohort of words that would share the same score with the guess.
  4. Merge the cohort into the agent’s belief model. The score is added to the current belief strength for each word in the cohort, providing a proxy for likelihood for each word. The guess is also masked from further consideration.
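
The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the published solver class: the class name, the `tolerance` parameter (a stand-in for the width of the similarity band) and the toy 2D vocabulary are all assumptions, and scores are plain cosine similarities rather than Semantle's displayed values.

```python
import numpy as np


class CohortSolverSketch:
    """Hypothetical minimal solver: belief strengths updated from score cohorts."""

    def __init__(self, vocabulary, vectors, tolerance=0.05):
        self.vocabulary = list(vocabulary)
        vectors = np.asarray(vectors, dtype=float)
        # Normalise so that a dot product is a cosine similarity.
        self.vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
        self.tolerance = tolerance  # assumed half-width of the similarity band
        self.belief = np.zeros(len(self.vocabulary))
        self.guessed = np.zeros(len(self.vocabulary), dtype=bool)

    def propose(self):
        # Step 1: strongest belief among words not yet guessed.
        masked = np.where(self.guessed, -np.inf, self.belief)
        return self.vocabulary[int(np.argmax(masked))]

    def update(self, guess, score):
        index = self.vocabulary.index(guess)
        # Step 3: cohort = words whose similarity to the guess matches the score.
        similarities = self.vectors @ self.vectors[index]
        cohort = np.abs(similarities - score) <= self.tolerance
        # Step 4: add the score to each cohort word's belief and mask the guess.
        self.belief[cohort] += score
        self.guessed[index] = True


def play(solver, target, max_guesses=10):
    """Step 2 stand-in: score each guess against a known target."""
    target_vector = solver.vectors[solver.vocabulary.index(target)]
    guesses = []
    for _ in range(max_guesses):
        guess = solver.propose()
        guesses.append(guess)
        if guess == target:
            break
        guess_vector = solver.vectors[solver.vocabulary.index(guess)]
        solver.update(guess, float(guess_vector @ target_vector))
    return guesses


# Toy vocabulary: four "words" at 0, 30, 60 and 90 degrees on the unit circle.
angles = np.deg2rad([0, 30, 60, 90])
toy_vectors = np.column_stack([np.cos(angles), np.sin(angles)])
solver = CohortSolverSketch(["a", "b", "c", "d"], toy_vectors)
guesses = play(solver, target="c")
```

In this toy run the first guess "a" scores 0.5 against the target "c", only "c" falls in the resulting cohort, and it becomes the second guess. Keeping `propose` and `update` separate also leaves room for a human to supply the guess instead.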

Show of strength

The chart below shows how the belief strength (estimated likelihood) of the target word gradually approaches the maximum belief strength of any word, as the target (which remains unknown until the end) appears in more and more cohorts.

Intersecting cohorts solver. Line chart showing the belief strength of the target word at each guess in relation to the maximum belief strength of remaining words.

We can also visualise the belief strength across the whole vocabulary at each guess, and the path the target word takes in relation to these distributions, in terms of its absolute score and its rank relative to other words.

Chart showing the cohort solver belief strength across the whole vocabulary at each guess, and the path the target word takes in relation to these distributions, in terms of its absolute score and its rank relative to other words

Superior solution?

The cohort solver can be (de)tuned to almost any level of performance by adjusting the parameters precision and recall, which determine the tightness of the similarity band and completeness of results from the generated cohorts. The gradient descent solver has potential for tuning parameters, but I didn’t explore this much. To compare the two, we’d therefore need to consider configurations of each solver. For now, I’m pleased that the two distinct sketches solve to my satisfaction!

Creative AI

I recently talked with Leon Gettler on an episode of the Talking Business podcast about Creative AI – pairing people with AI to augment product and strategy development.

This connects with some themes I’ve blogged about here before, such as No Smooth Path to Good Design and Leave Product Development to the Dummies. Also, Sketching Semantle Solvers explores how machines might generate and test new ideas in a game scenario in a way that’s sympathetic to human players.

Sketching Semantle Solvers

Semantle is an online puzzle game in which you make a series of guesses to discover a secret word. Each guess is scored by how “near” it is to the secret target, providing guidance for subsequent guesses, but that’s all the help you get. Fewer guesses is a better result, but hard to achieve, as the majority of words are not “near” and there are many different ways to get nearer to the target.

You could spend many enjoyable hours trying to solve a puzzle like this, or you could devote that time to puzzling over how a machine might solve it for you…

Scoring system

Awareness of how the nearness score is calculated can inspire potential solutions. The score is based on a machine learning model of language; how frequently words appear in similar contexts. These models convert each word into a unique point in space (also known as an embedding) in such a way that similar words are literally near to one another in this space, and therefore the similarity score is higher for points that are nearer one another.

Diagram of a basic semantic embedding example. The words "dog" and "cat" are shown close together, while the word "antidisestablishmentarianism" is shown distant from both.

We can reproduce this similarity score ourselves with a list of English words and a trained machine learning model, even though these models use 100s of dimensions rather than two, as above. Semantle uses the word2vec model but there are also alternatives like USE. Comparing these results to the scores from a Semantle session could guide a machine’s guesses. We might consider this roughly equivalent to our own mental model of the nearness of any pair of words, which we could estimate if we were asked.
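
To make the similarity score concrete, here is a small sketch using cosine similarity, the usual measure for word embeddings. The 2D vectors are invented for illustration; a real model such as word2vec would supply vectors with hundreds of dimensions. Semantle's displayed score appears to be this value scaled by 100, but treat that scaling as an assumption.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors, in [-1, 1]."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Made-up 2D "embeddings" echoing the diagram above.
toy_embeddings = {
    "dog": [0.9, 0.8],
    "cat": [0.8, 0.9],
    "antidisestablishmentarianism": [-0.7, 0.2],
}

near = cosine_similarity(toy_embeddings["dog"], toy_embeddings["cat"])
far = cosine_similarity(
    toy_embeddings["dog"], toy_embeddings["antidisestablishmentarianism"]
)
```

With these toy vectors, `near` comes out close to 1 while `far` is negative, mirroring the picture of "dog" and "cat" sitting together and the long word sitting apart.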

Sibling strategies

Two general solution strategies occurred to me to find the target word guided by similarity scores: intersecting cohorts and gradient descent.

Intersecting cohorts: the score for each guess defines a group of candidate words that could be the target (because they have the same similarity with the guessed word as the score calculated from the target). By making different guesses, we get different target cohorts with some common words. These cohort intersections allow us to narrow in on the words most likely to be the target, and eventually guess it correctly.

Diagram showing two similarity cohorts. These form halos around the axis of guess direction, based on dot product similarity, and intersect in the direction of the target word.

Gradient descent: each guess gives a score, and we look at the difference between scores and where the guesses are located relative to each other to try to identify the “semantic direction” in which the score is improving most quickly. We make our next guess in that direction. This doesn’t always get us closer but eventually leads us to the target.

Diagram showing a number of nodes and gradient directions between nodes. One is highlighted showing the maximum gradient and direction of the next guess, which is a node close to the extension of the direction vector.

I think human players tend more towards gradient descent, especially when close to the target, but also use some form of intersecting cohorts to hypothesise potential directions when uncertain. For a machine, gradient descent requires locations in embedding space to be known, while intersecting cohorts only cares about similarity of pairs.

Sympathetic sequences

Semantle is open source and one could create a superhuman solver that takes unfair advantage of knowledge about the scoring system. For instance, 4 significant figures of similarity (as per semantle scores) allows for pretty tight cohorts. Additionally, perfectly recalling large cohorts of 10k similar words at each guess seems unrealistic for people.

I was aiming for something that produced results in roughly the same range as a human and that could also play alongside a human should they want a helpful suggestion. Based on limited experience, the human range seems to be – from exceptional to exasperated – about 20 to 200+ guesses.

This led to some design intents:

  • that the solving agent capabilities were clearly separated from the Semantle scoring system (I would like to use a different semantic model for the agent in future)
  • that proposing the next guess and incorporating the results from a guess would be decoupled to allow the agent to play with others
  • that the agent capabilities could be [de]tuned if required to adjust performance relative to humans or make its behaviour more interpretable

Solution source

This post shares the source for the gradient descent solver and a simple simulator for semantle scores. Note that the word2vec model data for the simulator (and agent) is available at this word2vec download location.

I have also made a few iterations on the intersecting cohorts approach, which works but isn’t ready for publication.

Seeking the secret summit

The gradient descent (or ascent to a summit) approach works pretty well by just going to the most similar word and moving a random distance in the direction of the steepest known gradient. The nearest not previously guessed word to the resultant point is proposed as the next guess. You can see a gradual but irregular improvement in similarity as it searches.
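
One way to sketch a single step of this search is shown below. It is an illustration under assumptions rather than the published solver: the function name and `max_step` parameter are hypothetical, and the "steepest known gradient" is taken to be the largest score gain per unit distance from any other guess towards the current best guess.

```python
import numpy as np


def propose_next(vectors, scores, rng, max_step=1.0):
    """Return the vocabulary index of the next guess.

    vectors: (n_vocab, d) embeddings; scores: {index: score} for guessed words.
    Needs at least two scored guesses to estimate a gradient.
    """
    vectors = np.asarray(vectors, dtype=float)
    guessed = list(scores)
    best = max(guessed, key=scores.get)

    # Find the direction into the best guess with the steepest score gain.
    steepest, direction = -np.inf, None
    for other in guessed:
        if other == best:
            continue
        delta = vectors[best] - vectors[other]
        distance = np.linalg.norm(delta)
        gradient = (scores[best] - scores[other]) / distance
        if gradient > steepest:
            steepest, direction = gradient, delta / distance

    # Move a random distance beyond the best guess in that direction.
    point = vectors[best] + rng.uniform(0.0, max_step) * direction

    # Propose the nearest word that has not been guessed yet.
    distances = np.linalg.norm(vectors - point, axis=1)
    distances[guessed] = np.inf
    return int(np.argmin(distances))


# Toy vocabulary of five points; words 0 and 1 have been guessed so far.
toy_vectors = [[0, 0], [1, 0], [2, 0], [3, 0], [0, 5]]
next_index = propose_next(toy_vectors, {0: 0.1, 1: 0.3}, np.random.default_rng(0))
```

In the toy example the score improves from word 0 to word 1, so the step extrapolates further along that line and lands nearest to word 2, whatever random distance is drawn.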

Line chart of similarity score to target for each word in a sequence of guesses. The line moves upwards gradually but irregularly for most of the chart and shoots up at the end. The 46 guesses progress from thaw to gather.

In the embedding space, I overlaid a network (or graph) of “nodes” representing words and their similarity to the target, and “spokes” representing the direction between nodes and the gradient of similarity in that direction. This network is initialised with a handful of random guesses before the gradient descent begins in earnest. Below I’ve visualised the search in this space with respect to the basis – the top node and spoke with best gradient – of each guess.

Chart showing progression of basis of guessing the target word. The horizontal axis is current best guess. The vertical axis is current reference word. A line progresses in fewer hops horizontally and more hops vertically from bottom left to top right.

The best results are about 40 guesses and typically under 200, though may blow out on occasion. I haven’t really tried to optimise the search; as above, the first simple idea worked pretty well. To [de]tune or test the robustness of this solution, I’ve considered adding more noise to the search for the nearest word from the extrapolated point, or compromising the recall of nearby words, or substituting a different semantic model. These things might come in future. At this stage I just wanted to share a sketch of the solver rather than a settled solution.

Postscript: after publishing, I played with the search visualisation in an attempt to tell a more intuitive story (from literally to nobody).

Line chart showing the similarity of each of a sequence of 44 guesses to a semantle target. The line is quite irregular but trends up from first guess “literally” at bottom left to target “nobody” at top right. The chart is annotated with best guess at each stage and reference words for future guesses.

Stop the sibilants, s’il vous plaît

C’est suffit! I’m semantically sated. After that sublime string of subheadings, the seed of a supplementary Wordle spin-off sprouts: Alliteratle anyone?

Bridging the linguistic inclusion gap with AI

It was great to be able to reflect with colleagues on common themes running through Thoughtworks’ work in languages and technology. In various scenarios, with different technology approaches, we worked to improve the inclusiveness of solutions, pointing to a more linguistically inclusive future.

https://www.thoughtworks.com/insights/blog/machine-learning-and-ai/how-ai-could-bridge-the-linguistic-inclusion-gap

Project Slackpose

Another lockdown, another project for body and mind. Slackpose allows me to track my slackline walking and review my technique. Spending 5 minutes on the slackline between meetings is a great way to get away from my desk!

I had considered pose estimation for wheelies last year, but decided slackline walking was an easier start, and something the whole family could enjoy.

Setup

I mount my phone on a tripod at one end of the slackline and start recording a video. This gives a good view of side-to-side balance, and is most pixel-efficient in vertical orientation.

Alternatively, duct tape the phone to something handy, or even hand-hold or try other angles. The 2D pose estimation (location of body joints in the video image) will be somewhat invariant to the shooting location, but looking down the slackline (or shooting from another known point) may help reconstruct more 3D pose data using only a single camera view. Luckily, it’s also invariant to dogs wandering into frame!

I use OpenPose from the CMU Perceptual Computing Lab to capture pose data from the videos. See below for details of the keypoint pose data returned and some notes on setting up and running OpenPose.

I then analyse the keypoint pose data in a Jupyter notebook that you can run on Colab. This allows varied post-processing analyses such as the balance analysis below.

Keypoint Pose Data

OpenPose processes the video to return data on 25 body keypoints for each frame, representing the position of head, shoulders, knees, and toes, plus eyes, ears, and nose and other major joints (but mouth only if you explicitly request facial features).

These body keypoints are defined as (x, y) pixel locations in the video image, for one frame of video. We can trace the keypoints over multiple frames to understand the motion of parts of the body.

Keypoints also include a confidence measure [0, 1], which is pretty good for the majority of keypoints from the video above.
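
As a sketch of how this data can be read, the snippet below parses one frame of OpenPose JSON output into an array of (x, y, confidence) rows. It assumes the layout written by the `--write_json` option, where each person has a flat `pose_keypoints_2d` list; the function names, the confidence threshold and the sample frame are made up for illustration.

```python
import json

import numpy as np


def load_keypoints(json_text, person=0):
    """Return an (n_keypoints, 3) array of x, y, confidence for one person."""
    frame = json.loads(json_text)
    flat = frame["people"][person]["pose_keypoints_2d"]
    return np.asarray(flat, dtype=float).reshape(-1, 3)


def confident_positions(keypoints, min_confidence=0.1):
    """Replace (x, y) of low-confidence keypoints with NaN."""
    positions = keypoints[:, :2].copy()
    positions[keypoints[:, 2] < min_confidence] = np.nan
    return positions


# Shortened, made-up frame with two keypoints instead of the 25 described above.
sample = json.dumps(
    {"people": [{"pose_keypoints_2d": [0.0, 0.0, 0.0, 310.5, 220.0, 0.92]}]}
)
keypoints = load_keypoints(sample)
positions = confident_positions(keypoints)
```

Stacking the `positions` arrays from successive frame files gives the per-keypoint traces over time that the motion analysis needs.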

Balance Analysis

I wanted to look at balance first, using an estimate of my body’s centre of mass. I calculated this from the proportional mass of body segments (sourced here) with estimates of the location of the centre of mass for each segment relative to the pose keypoints (I just made these up, and you can see what I made up in the notebook).

This looks pretty good from a quick eyeball, although it’s apparent that it is sensitive to the quality of pose estimation of relatively massive body segments, such as my noggin, estimated at 8.26% of my body mass. When walking away, OpenPose returns very low confidence and a default position of (0, 0) for my nose for many frames, so I exclude it from the centre of mass calculation in those instances. You can see the effect of excluding my head in the video below.
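
The calculation can be sketched as a mass-weighted average of segment positions that skips any segment whose keypoint confidence is too low. This is an illustration only: the function name, the threshold and the example numbers are assumptions, and the notebook remains the reference for the actual segment definitions.

```python
import numpy as np


def centre_of_mass(points, confidences, mass_fractions, min_confidence=0.1):
    """Mass-weighted mean of segment positions, ignoring unreliable segments.

    points: (n_segments, 2) pixel positions of each segment's centre of mass.
    confidences: (n_segments,) pose confidence for each segment.
    mass_fractions: (n_segments,) proportion of body mass in each segment.
    """
    points = np.asarray(points, dtype=float)
    confidences = np.asarray(confidences, dtype=float)
    mass_fractions = np.asarray(mass_fractions, dtype=float)

    # Zero the weight of low-confidence segments (e.g. a nose reported at (0, 0)).
    weights = np.where(confidences >= min_confidence, mass_fractions, 0.0)
    total = weights.sum()
    if total == 0:
        raise ValueError("no segment is confident enough to estimate centre of mass")
    # Dividing by the remaining weight renormalises over the segments kept.
    return (weights[:, None] * points).sum(axis=0) / total


# Made-up frame: head lost at (0, 0) with zero confidence, two other segments fine.
example = centre_of_mass(
    points=[[0, 0], [10, 20], [20, 40]],
    confidences=[0.0, 0.9, 0.9],
    mass_fractions=[0.2, 0.4, 0.4],
)
```

Without the confidence check, the spurious head position at (0, 0) would drag the estimate towards the image corner, which is the effect the exclusion avoids.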

Not much more to report at this point; I’ll have a better look at this soon, now that I’m up and walking with analysis.

OpenPose Notes

Setup

I ran OpenPose on my Mac, following the setup guide at https://github.com/CMU-Perceptual-Computing-Lab/openpose, and also referred to two tutorials. These instructions are collectively pretty good, but note:

  • 3rdparty/osx/install_deps.sh doesn’t exist; you will instead find it at scripts/osx/install_deps.sh
  • I had to manually pip install numpy and opencv for OpenPose with pip install 'numpy<1.17' and pip install 'opencv-python<4.3', but this is probably due to my neglected Python2 setup.
  • Homebrew cask installation syntax has changed from the docs; now invoked as brew install --cask cmake

Running

Shooting iPhone video, I need to convert the format for input to OpenPose. Use ffmpeg as below (replace slackline.mov/mp4 with the name of your video).

ffmpeg -i slackline.mov -vcodec h264 -acodec mp2 slackline.mp4

I then typically invoke OpenPose to process video and output marked up frames and JSON files with the supplied example executable, as below (again replace input video and output directory with your own):

./build/examples/openpose/openpose.bin --video slack_video/slackline.mp4 --write_images slack_video/output --write_json slack_video/output