Smarter Semantle Solvers

A little smarter, anyway. I didn’t expect to pick this up again, but when I occasionally run the first generation solvers online, I’m often equal parts amused and frustrated by rare words thrown up that delay the solution – from amethystine to zigging.

Animation of online semantle solution — An example solution with fewer than typical rare words guessed

The solvers used the first idea that worked; can we make some tweaks to make them smarter? The code is now migrated to its own new repo after outgrowing its old home.

Measuring smarts

I measure solver performance by running multiple trials of a solver configuration against the simulator for a variety of target words. This gives a picture of how often the solver typically succeeds within a certain number of guesses.

Chart showing cumulative distribution function curves for two solver configurations

Vocabulary

It turns out that the vocabulary to date based on english_words_set is a poor match for the most frequently used English words, according to unigram frequency data.

So we might expect that simply replacing the solver vocabulary would improve performance, and we also get word ranking from unigram_freq.

Semantic models

We’ll continue with Universal Sentence Encoder (USE) to ensure search strategies are robust to different semantic models.

Search

To improve the gradient solver I tried making another random guess every so often to avoid long stretches exploring local minima. But it didn’t make things better, and probably made them worse!

In response, I made each guess the most common local word to the extrapolated semantic location, rather than just the nearest word. Still no better, and trying both “improvements” together was significantly worse!

Ah well, experiments only fail if we fail to learn from them!

Vocabulary again

I think the noise inherent in a different semantic model, plus the existing random extrapolation distance, overwhelms the changes I tried. In better news, we see a major improvement from using unigram freq vocabulary, reducing the mean from 280 (with many searches capped at 500) to 198, approximately a 30% improvement.

Smarter still?

Here we see that the data-centric (vocabulary) improvement had a far bigger impact than any model-centric (search algorithm) improvement that I had the patience to try (though I left a bunch of further todos). Maybe just guessing randomly from the top n words will be better again! ????

At least I’ve made a substantial dent in reducing those all-too-common guesses at rare words.