Picking up threads from previous posts on solving Semantle word puzzles with machine learning, it’s time to compare how different versions of solvers might play along with people, and play against the game online.
The solvers are designed around an encapsulated similarity model that we can easily swap to give them a different view of words and how they relate. To date, we’ve used the same model as live Semantle, which is word2vec. But as this might be considered cheating, we can now also use a model based on the Universal Sentence Encoder (USE), to explore how the solvers perform when their view of semantics differs from the game’s.
The key elements of the solver ecosystem are now:
- SimilarityModel – choice of word2vec or USE as above,
- Solver methods (common to both gradient and cohort variants):
  - make_guess() – return a guess based on the solver’s current state, without changing that state,
  - merge_guess(guess, score) – update the solver’s state with information about a guess and its score,
- Scoring of guesses by either the simulator or a Semantle game, where a game could also include guesses from other players.
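The solver interface above can be sketched roughly as follows. This is a minimal illustration, not the project’s actual code: the `Solver` base class and the `history` attribute are assumptions; only `make_guess()` and `merge_guess()` come from the description above.

```python
# Minimal sketch of the solver interface. Class name, constructor and
# the `history` attribute are assumptions for illustration.
from abc import ABC, abstractmethod

class Solver(ABC):
    def __init__(self, similarity_model):
        self.model = similarity_model   # word2vec- or USE-backed
        self.history = []               # (guess, score) pairs seen so far

    @abstractmethod
    def make_guess(self) -> str:
        """Return a guess based on current state; must not mutate state."""

    def merge_guess(self, guess: str, score: float) -> None:
        """Fold a scored guess (the solver's own or another player's) into state."""
        self.history.append((guess, score))
```

The split between a side-effect-free `make_guess()` and a state-mutating `merge_guess()` is what lets the same solver play solo, complete someone else’s game, or merely offer hints.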
It’s a simplified reinforcement learning setup. Different combinations of these elements allow us to explore different scenarios.
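The guess–score loop tying these elements together might look like the sketch below. The function and parameter names are assumptions; `score_guess` stands in for whatever scores a guess, whether the simulator or a live game.

```python
# Hypothetical game loop: a solver with make_guess/merge_guess played
# against any scoring function until it finds the target.
def play(solver, score_guess, target, max_guesses=200):
    """Return the number of guesses taken, or None if the budget runs out."""
    for n in range(1, max_guesses + 1):
        guess = solver.make_guess()
        if guess == target:
            return n                      # solved in n guesses
        solver.merge_guess(guess, score_guess(guess))
    return None                           # not solved within budget
```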
I’m not advocating for using these in a game of course, but it’s interesting to consider how the solvers would play along with people in hypothetical scenarios :). The base scenario, “friends”, is the actual history of a game played with people, completed in 109 guesses.
The first scenario shows how to complete a puzzle from an initial sequence of guesses from friends. In this configuration, both solvers generally better the friends’ result with ease when primed with the first 10 friend guesses.
The second scenario makes a suggestion for the next guess only, based on the game history up to that point. Both solvers can enable a finish in slightly fewer guesses. The conclusion is that these solvers are good for hints, especially if the hints are followed!
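Both scenarios reduce to replaying human history through `merge_guess()` before asking the solver to contribute. A hedged sketch, with a hypothetical helper name:

```python
# Hypothetical helper: replay the human game history into the solver's
# state, then ask it for a single suggestion (or keep playing from there).
def suggest_next(solver, history):
    """Prime a solver with (guess, score) pairs, then ask for one hint."""
    for guess, score in history:
        solver.merge_guess(guess, score)
    return solver.make_guess()
```

For the completion scenario you would prime with only the first 10 pairs and then loop on the solver’s own guesses; for the hint scenario you prime with the full history so far and take a single guess.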
Maybe these solvers using word2vec similarity do have an unfair advantage though – how do they perform with a different similarity model? Using USE instead, I expected the cohort solver to be more robust than the gradient solver…
… but it seems that the gradient descent solver is more robust to a disparate similarity model, as one example of the completion scenario shows.
The gradient solver also generally offers some benefit when making a suggestion for just the next guess, but the cohort solver’s contribution is marginal at best.
These are of course only single instances of each scenario, and there is significant variation between runs. It’s been interesting to see this play out interactively, but a more comprehensive performance characterisation – with plenty of scope for understanding the influence of hyperparameters – may be in order.
The solvers can also play part or whole games solo (or with other players) in a live environment, using Selenium WebDriver to submit guesses and collect scores. The leading animation above is gradient-USE and the one below is a faster game using cohort-word2vec.
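Driving the live game might look something like the sketch below. This is illustrative only: the element ids are placeholders, not the real page’s, and `driver` would be a Selenium WebDriver instance (e.g. `selenium.webdriver.Firefox()`), which supports string locator strategies like `"id"` in `find_element()`.

```python
# Illustrative only: submitting a guess to a live game page through a
# Selenium-style WebDriver. All element ids below are assumed placeholders.
def submit_guess(driver, word):
    box = driver.find_element("id", "guess")          # assumed input id
    box.clear()
    box.send_keys(word)
    driver.find_element("id", "guess-btn").click()    # assumed button id

def read_score(driver):
    # Assumed: the latest score appears in an element with this placeholder id.
    return float(driver.find_element("id", "latest-score").text)
```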
Seeing in space
Through all this I’ve considered various 3D visualisations of search through a semantic space with hundreds of dimensions. I’ve settled on the version below, illustrating a search for target “habitat” from first guess “megawatt”.
This visualisation format uses cylindrical coordinates, broken out in the figure below. The cylinder (x) axis is the projection of each guess onto the line that connects the first guess to the target word. The cylindrical radius is the distance of each guess in embedding space from its projection on this line (cosine similarity seemed smoother than Euclidean distance here). The angle of rotation in cylindrical coordinates (theta) is the cumulative angle between the directions connecting successive guesses, i.e. guess n to guess n+1. The result is an irregular helix expanding then contracting, all while twisting around the axis from first to last guess.
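The coordinate mapping described above can be sketched as follows. This simplified version uses plain Euclidean distance for the radius (the post notes a cosine-similarity-based variant looked smoother); inputs are guess embeddings as numpy arrays, in guess order.

```python
# Sketch of the cylindrical-coordinate mapping: x is the projection onto
# the first-guess->target axis, r the distance from that axis, theta the
# cumulative turning angle between successive guess directions.
import numpy as np

def cylindrical_path(embeddings, target):
    axis = target - embeddings[0]
    axis = axis / np.linalg.norm(axis)
    xs, rs, thetas = [], [], []
    theta, prev_dir = 0.0, None
    for i, e in enumerate(embeddings):
        v = e - embeddings[0]
        x = float(v @ axis)                       # position along the axis
        r = float(np.linalg.norm(v - x * axis))   # distance from the axis
        if i > 0:
            d = embeddings[i] - embeddings[i - 1]
            n = np.linalg.norm(d)
            if prev_dir is not None and n > 0:
                cosang = np.clip(prev_dir @ d /
                                 (np.linalg.norm(prev_dir) * n), -1.0, 1.0)
                theta += float(np.arccos(cosang))  # accumulate the twist
            if n > 0:
                prev_dir = d
        xs.append(x); rs.append(r); thetas.append(theta)
    return xs, rs, thetas
```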