Photo of external stairwell

Synthesising Semantle Solvers

Picking up threads from previous posts on solving Semantle word puzzles with machine learning, it’s time to compare how different versions of solvers might play along with people, and play against the game online.

Animation of a Semantle game from initial guess to completion

Substitute semantics

The solvers are designed around an encapsulated similarity model that we can easily change to give them a different view of words and how they relate. To date, we’ve used the same model as live Semantle, which is word2vec. But as this might be considered cheating, we can now also use a model based on the Universal Sentence Encoder (USE), to explore how the solvers perform with separated semantics.

Solver spec

The key elements of the solver ecosystem are now:

  • SimilarityModel – choice of word2vec or USE as above,
  • Solver methods (common to both gradient and cohort variants):
    • make_guess() – return a guess that is based on the solver’s current state, but don’t change the solver’s state,
    • merge_guess(guess, score) – update the solver’s state with information about a guess and a score,
  • Scoring of guesses by either the simulator or a Semantle game, where a game could also include guesses from other players.
Diagram illustrating elements of the solver ecosystem. Similarity model initialises solver state used to make guesses, which are scored by game and update solver state with scores. Other players can make guesses which also get scored

It’s a simplified reinforcement learning setup. Different combination these elements allow us to explore different scenarios.

Solver suggestions

I’m not advocating for using these in a game of course, but it’s interesting to consider how the solvers would play along with people in hypothetical scenarios :). The base scenario friends is the actual history of a game played with people, completed in 109 guesses.

Word2Vec similarity

The first scenario is showing how to complete a puzzle from an initial sequence of guesses from friends. Both solvers in this configuration generally easily better the friends result when primed with the first 10 friend guesses.

Line chart comparing three irregular but increasing lines that represent the sequence of scores for guesses in a semantle game. The three lines are labelled friends, cohort, and gradient. Cohort finishes with fewest guesses, then gradient, then friends, with clear separation.

The second scenario is making a suggestion for the next guess only, but based on the game history up to that point. Both solvers may permit a finish in slightly fewer guesses. The conclusion is that these solvers are good for hints, especially if they are followed!

Line chart comparing three irregular but increasing lines that represent the sequence of scores for guesses in a semantle game. The three lines are labelled friends, cohort, and gradient. Cohort finishes with fewest guesses, then gradient, then friends, with marginal differences.

Maybe these solvers using word2vec similarity do have an unfair advantage though – how do they perform with a different similarity model? Using USE instead, I expected the cohort solver to be more robust than the gradient solver…

USE similarity

… but it seems that the gradient descent solver is more robust to a disparate similarity model, as one example of the completion scenario shows.

Line chart comparing three irregular but increasing lines that represent the sequence of scores for guesses in a semantle game. The three lines are labelled friends, cohort, and gradient. Gradient finishes with fewest guesses, then friends, then cohort, and the separation is clear.

The gradient solver also generally offers some benefit making a suggestion for just the next guess, but the cohort solver’s contribution is marginal at best.

Line chart comparing three irregular but increasing lines that represent the sequence of scores for guesses in a semantle game. The three lines are labelled friends, cohort, and gradient. Gradient finishes with fewest guesses, then friends, and cohort doesn't finish, but the differences are very minor.

These are of course only single instances of each scenario, and there is significant variation between runs. It’s been interesting to see this play out interactively, but a more comprehensive performance characterisation – with plenty of scope for understanding the influence of hyperparameters – may be in order.

Solver solo

The solvers can also play part or whole games solo (or with other players) in a live environment, using Selenium WebDriver to submit guesses and collect scores. The leading animation above is gradient-USE and a below is a faster game using cohort-word2vec.

Animation of a Semantle game from initial guess to completion

Seeing in space

Through all this I’ve considered various 3D visualisations of search through a semantic space with hundreds of dimensions. I’ve settled on the version below, illustrating a search for target “habitat” from first guess “megawatt”.

An animated rotating 3D view of an semi-regular collection of points joined by lines into a sequence. Some points are labelled with words. Represents high-dimensional semantic search in 3D.

This visualisation format uses cylindrical coordinates, broken out in the figure below. The cylinder (x) axis is the projection of each guess to the line that connects the first guess to the target word. The cylindrical radius is the distance of each guess in embedding space from its projection on this line (cosine similarity seemed smoother than Euclidian distance here). The angle of rotation in cylindrical coordinates (theta) is the cumulative angle between the directions connecting guess n-1 to n and n to n+1. The result is an irregular helix expanding then contracting, all while twisting around the axis from first to lass guess.

Three line charts on a row, with common x-axis of guess number, showing semi-regular lines, representing the cylindrical coordinates of the 3D visualisation. The left chart is x-axis, increasing from 0 to 1, middle is radius, from 0 to ~1 and back to 0, and right is angle theta, increasing from 0 to ~11 radians.