## Second Semantle Solver

In the post Sketching Semantle Solvers, I introduced two methods for solving Semantle word puzzles, but I only wrote up one. The second solver here is based the idea that the target word should appear in the intersection between the cohorts of possible targets generated by each guess.

To recap, the first post:

• introduced the sibling strategies side-by-side,
• discussed designing for sympathetic sequences, so the solver can play along with humans, with somewhat explainable guesses, and
• shared the source code and visualisations for the gradient descent solver.

## Solution source

This post shares the source for the intersecting cohorts solver, including notebook, similarity model and solver class.

The solver is tested against the simple simulator for semantle scores from last time. Note that the word2vec model data for the simulator (and agent) is available at this word2vec download location.

The solver has the following major features:

1. A vocabulary, containing all the words that can be guessed,
2. A semantic model, from which the agent can calculate the similarity of word pairs,
3. The ability to generate cohorts of words from the vocabulary that are similar (in Semantle score) to a provided word (a guess), and
4. An evolving strength of belief that each word in the vocabulary is the target.

In each step towards guessing the target, the solver does the following:

1. Choose a word for the guess. The current choice is the word with the strongest likelihood of being the target, but it could equally be any other word from the solver’s vocabulary (which might help triangulate better), or it could be provided by a human player with their own suspicions.
2. Score the guess. The Semantle simulator scores the guess.
3. Generate a cohort. The guess and the score are used to generate a new cohort of words that would share the same score with the guess.
4. Merge the cohort into the agent’s belief model. The score is added to the current belief strength for each word in the cohort, providing a proxy for likelihood for each word. The guess is also masked from further consideration.

## Show of strength

The chart below shows how the belief strength (estimated likelihood) of the target word gradually approaches the maximum belief strength of any word, as the target (which remains unknown until the end) appears in more and more cohorts.

We can also visualise the belief strength across the whole vocabulary at each guess, and the path the target word takes in relation to these distributions, in terms of its absolute score and its rank relative to other words.

## Superior solution?

The cohort solver can be (de)tuned to almost any level of performance by adjusting the parameters precision and recall, which determine the tightness of the similarity band and completeness of results from the generated cohorts. The gradient descent solver has potential for tuning parameters, but I didn’t explore this much. To compare the two, we’d therefore need to consider configurations of each solver. For now, I’m pleased that the two distinct sketches solve to my satisfaction!

I joined Scott Hirleman for an episode (#95) of the Data Mesh Radio podcast. Scott does great work connecting and educating the data mesh community, and we had fun talking about:

• Fitness functions to define “what good looks like” for data mesh and guide the evolution of analytic data architecture and operating model
• Team topologies as a system for organisational design that is sympathetic to data mesh
• Driving a delivery program through use cases
• Thin slicing and evolution of products

## Creative AI

I recently talked with Leon Gettler on an episode of the Talking Business podcast about Creative AI – paring people with AI to augment product and strategy development.

This connects with some themes I’ve blogged about here before, such as No Smooth Path to Good Design and Leave Product Development to the Dummies. Also, Sketching Semantle Solvers explores how machines might generate and test new ideas in a game scenario in a way that’s sympathetic to human players.

## Sketching Semantle Solvers

Semantle is an online puzzle game in which you make a series of guesses to discover a secret word. Each guess is scored by how “near” it is to the secret target, providing guidance for subsequent guesses, but that’s all the help you get. Fewer guesses is a better result, but hard to achieve, as the majority of words are not “near” and there are many different ways to get nearer to the target.

You could spend many enjoyable hours trying to solve a puzzle like this, or you could devote that time to puzzling over how a machine might solve it for you…

## Scoring system

Awareness of how the nearness score is calculated can inspire potential solutions. The score is based on a machine learning model of language; how frequently words appear in similar contexts. These models convert each word into a unique point in space (also known as an embedding) in such a way that similar words are literally near to one another in this space, and therefore the similarity score is higher for points that are nearer one another.

We can reproduce this similarity score ourselves with a list of English words and a trained machine learning model, even though these models use 100s of dimensions rather than two, as above. Semantle uses the word2vec model but there are also alternatives like USE. Comparing these results to the scores from a Semantle session could guide a machine’s guesses. We might consider this roughly equivalent to our own mental model of the nearness of any pair of words, which we could estimate if we were asked.

## Sibling strategies

Two general solution strategies occurred to me to find the target word guided by similarity scores: intersecting cohorts and gradient descent.

Intersecting cohorts: the score for each guess defines a group of candidate words that could be the target (because they have the same similarity with the guessed word as the score calculated from the target). By making different guesses, we get different target cohorts with some common words. These cohort intersections allow us to narrow in on the words most like to be the target, and eventually guess it correctly.

Gradient descent: each guess gives a score, and we look at the difference between scores and where the guesses are located relative to each other to try to identify the “semantic direction” in which the score is improving most quickly. We make our next guess in that direction. This doesn’t always get us closer but eventually leads us to the target.

I think human players tend more towards gradient descent, especially when close to the target, but also use some form of intersecting cohorts to hypothesise potential directions when uncertain. For a machine, gradient descent requires locations in embedding space to be known, while intersecting cohorts only cares about similarity of pairs.

## Sympathetic sequences

Semantle is open source and one could create a superhuman solver that takes unfair advantage of knowledge about the scoring system. For instance, 4 significant figures of similarity (as per semantle scores) allows for pretty tight cohorts. Additionally, perfectly recalling large cohorts of 10k similar words at each guess seems unrealistic for people.

I was aiming for something that produced results in roughly the same range as a human and that could also play alongside a human should they want a helpful suggestion. Based on limited experience, the human range seems to be – from exceptional to exasperated – about 20 to 200+ guesses.

• that the solving agent capabilities were clearly separated from the Semantle scoring system (I would like to use a different semantic model for the agent in future)
• that proposing the next guess and incorporating the results from a guess would be decoupled to allow the agent to play with others
• that the agent capabilities could be [de]tuned if required to adjust performance relative to humans or make its behaviour more interpretable

## Solution source

This post shares the source for the gradient descent solver and a simple simulator for semantle scores. Note that the word2vec model data for the simulator (and agent) is available at this word2vec download location.

I have also made a few iterations on the intersecting cohorts approach, which works but isn’t ready for publication.

## Seeking the secret summit

The gradient descent (or ascent to a summit) approach works pretty well by just going to the most similar word and moving a random distance in the direction of the steepest known gradient. The nearest not previously guessed word to the resultant point is proposed as the next guess. You can see a gradual but irregular improvement in similarity as it searches.

In the embedding space, I overlaid a network (or graph) of “nodes” representing words and their similarity to the target, and “spokes” representing the direction between nodes and the gradient of similarity in that direction. This network is initialised with a handful of random guesses before the gradient descent begins in earnest. Below I’ve visualised the search in this space with respect to the basis – the top node and spoke with best gradient – of each guess.

The best results are about 40 guesses and typically under 200, though may blow out on occasion. I haven’t really tried to optimise the search; as above, the first simple idea worked pretty well. To [de]tune or test the robustness of this solution, I’ve considered adding more noise to the search for the nearest word from the extrapolated point, or compromising the recall of nearby words, or substituting a different semantic model. These things might come in future. At this stage I just wanted to share a sketch of the solver rather than a settled solution.

Postscript: after publishing, I played with the search visualisation in an attempt to tell a more intuitive story (from literally to nobody).

## Stop the sibilants, s’il vous plaît

C’est suffit! I’m semantically sated. After that sublime string of subheadings, the seed of a supplementary Wordle spin-off sprouts: Alliteratle anyone?

## Data mesh: a lean perspective

Data mesh can be understood as a response to lean wastes identified in data organisations. I paired with Ned Letcher to present this perspective at the LAST Conference 2021, which was much delayed due to COVID restrictions.

Lean wastes including overproduction, inventory, etc, may be concealed and made more difficult to address by centralised data systems and team architectures. Conversely, data mesh may make these wastes visible where they exist and provides mechanisms for reducing these wastes.

The presentation is written up in this article, and the slides are also available.

## Bridging the linguistic inclusion gap with AI

It was great to be able to reflect with colleagues on common themes running through Thoughtworks’ work in languages and technology. In various scenarios, with different technology approaches, we worked to improve the inclusiveness of solutions, pointing to a more linguistically inclusive future.

https://www.thoughtworks.com/insights/blog/machine-learning-and-ai/how-ai-could-bridge-the-linguistic-inclusion-gap

## Slackometer Hello World

Project Slackpose gives me one more excuse for hyperlocal exercise and number crunching in lockdown. Last time, I briefly touched on balance analysis. This time, I look at tracking slackline distance walked with my newly minted slackometer.

## Inferring 3D Position

I’m working only with 2D pose data (a set of pixel locations for body joints) from my single camera angle, but I can infer something about my 3D location – distance from the camera (d) – using the pinhole camera model and further data such as:

1. Camera lens & sensor geometry, plus known real world distance between pose keypoints (geometric approach)
2. Consistent pose data (eg same person, similar pose) acquired at known distances from the camera (regression approach)

In this first iteration shown in the notebook, I boil all the pose data down into a single feature: the vertical pixel distance (v) between the highest tracked point (eg, an eye) and the lowest tracked point (eg, a toe). I base the calculation of distance d on this measure. This measure may be shorter than my full height by crown-to-eye height and a typically lower pose when balancing on a slackline, or an elevated arm joint might make it taller.

### Geometric Approach

The geometric approach uses a similar triangles formula where one triangle has sides lens-sensor distance (known) and object pixel height (v) scaled to sensor height (known), the other has lens-object distance (d) and object height (roughly known from my height, as above). The equation has the form d = F / v, where F is determined by those known or estimated factors. These geometry factors will be specific to each physical camera device and slackline walker combination.

### Regression Approach

The regression approach uses pose data collected at known distances from the slackline anchor – 2m, 4m, 6m, and 8m – as shown in the image below. I marked the calibration distances and walked to each point, balancing there for a few seconds, then turned around and walked back through each point to collect data from both back and front views. This approach is consistent across cameras and walkers, and works with varied sets of calibration distances.

Plotting the value of v against the reciprocal of distance (1 / d), we see a fairly linear relationship. Fitting a line (with 0 intercept) to the data gives a value for 1 / F, which produces a value for F that is very close to that from the geometric approach – a neat result, or convenient choice of parameters!

So we have two methods for calculating distance from pose data, which produce very similar results.

## Does It Work?

The figures you see in the video above match up pretty well with the reality. The video shows approximately 10m accumulated distance, while in reality I started about 2m from the camera, walked to just under 8m from the camera, then returned almost to my starting point (say 11.5m max). The discrepancy is most likely explained by an under-estimate of peak distance (at ~7m) due to decrease in precision of pixel measures for distant objects, and noise/pose-dependence in the vertical extent measure.

So this first iteration of the slackometer would be useful for approximating distance walked, and could possibly be improved with higher resolution video and by tracking more body segments, which may also reduce the dependence on smoothing. It would also be useful for comparing distances or speeds. The millimetre precision, however, is misleading, I just chose the unit so it looked better on the odometer display! (I spent some time tuning this for those cascading transitions…)

There are certainly are other ways you could track distance, starting from pacing out the line and keeping count of laps, to using image segmentation rather than pose estimation to calculate v, to alternative sensor setups and models, but it was fun to do it this way and helped pass a few more days, and nights, in lockdown.

## Project Slackpose

Another lockdown, another project for body and mind. Slackpose allows me to track my slackline walking and review my technique. Spending 5 minutes on the slackline between meetings is a great way to get away from my desk!

I had considered pose estimation for wheelies last year, but decided slackline walking was an easier start, and something the whole family could enjoy.

## Setup

I mount my phone on a tripod at one end of the slackline and start recording a video. This gives a good view of side-to-side balance, and is most pixel-efficient in vertical orientation.

Alternatively, duct tape the phone to something handy, or even hand-hold or try other angles. The 2D pose estimation (location of body joints in the video image) will be somewhat invariant to the shooting location, but looking down the slackline (or shooting from another known point) may help reconstruct more 3D pose data using only a single camera view. Luckily, it’s also invariant to dogs wandering into frame!

I use OpenPose from the CMU Perceptual Computing Lab to capture pose data from the videos. See below for details of the keypoint pose data returned and some notes on setting up and running OpenPose.

I then analyse the keypoint pose data in a Jupyter notebook that you can run on Colab. This allows varied post-processing analyses such as the balance analysis below.

## Keypoint Pose Data

OpenPose processes the video to return data on 25 body keypoints for each frame, representing the position of head, shoulders, knees, and toes, plus eyes, ears, and nose and other major joints (but mouth only if you explicitly request facial features).

These body keypoints are defined as (x, y) pixel locations in the video image, for one frame of video. We can trace the keypoints over multiple frames to understand the motion of parts of the body.

Keypoints also include a confidence measure [0, 1], which is pretty good for the majority of keypoints from the video above.

## Balance Analysis

I wanted to look at balance first, using an estimate of my body’s centre of mass. I calculated this from the proportional mass of body segments (sourced here) with estimates of the location centre of mass for each segment relative to the pose keypoints (I just made these up, and you can see what I made up in the notebook).

This looks pretty good from a quick eyeball, although it’s apparent that it is sensitive to the quality of pose estimation of relatively massive body segments, such as my noggin, estimated at 8.26% of my body mass. When walking away, OpenPose returns very low confidence and a default position of (0, 0) for my nose for many frames, so I exclude it from the centre of mass calculation in those instances. You can see the effect of excluding my head in the video below.

Not much more to report at this point; I’ll have a better look at this soon, now that I’m up and walking with analysis.

## OpenPose Notes

### Setup

I ran OpenPose on my Mac, following the setup guide at https://github.com/CMU-Perceptual-Computing-Lab/openpose, and also referred to two tutorials. These instructions are collectively pretty good, but note:

• `3rdparty/osx/install_deps.sh` doesn’t exist, you will instead find it at `scripts/osx/install_deps.sh`
• I had to manually pip install numpy and opencv for OpenPose with `pip install 'numpy<1.17'` and `pip install 'opencv-python<4.3`‘, but this is probably due to my neglected Python2 setup.
• Homebrew cask installation syntax has changed from the docs; now invoked as `brew install --cask cmake`

### Running

Shooting iPhone video, I need to convert format for input to OpenPose. Use ffmeg as below (replace `slackline.mov/mp4` with the name of your video).

`ffmpeg -i slackline.mov -vcodec h264 -acodec mp2 slackline.mp4`

I then typically invoke OpenPose to process video and output marked up frames and JSON files with the supplied example executable, as below (again replace input video and output directory with your own):

`./build/examples/openpose/openpose.bin --video slack_video/slackline.mp4 --write_images slack_video/output --write_json slack_video/output`