A little smarter, anyway. I didn’t expect to pick this up again, but when I occasionally run the first generation solvers online, I’m often equal parts amused and frustrated by rare words thrown up that delay the solution – from amethystine to zigging.
I measure solver performance by running multiple trials of a solver configuration against the simulator for a variety of target words. This gives a picture of how often the solver typically succeeds within a certain number of guesses.
It turns out that the vocabulary to date based on
english_words_set is a poor match for the most frequently used English words, according to unigram frequency data.
So we might expect that simply replacing the solver vocabulary would improve performance, and we also get word ranking from
We’ll continue with Universal Sentence Encoder (USE) to ensure search strategies are robust to different semantic models.
To improve the gradient solver I tried making another random guess every so often to avoid long stretches exploring local minima. But it didn’t make things better, and probably made them worse!
In response, I made each guess the most common local word to the extrapolated semantic location, rather than just the nearest word. Still no better, and trying both “improvements” together was significantly worse!
Ah well, experiments only fail if we fail to learn from them!
I think the noise inherent in a different semantic model, plus the existing random extrapolation distance, overwhelms the changes I tried. In better news, we see a major improvement from using
unigram freq vocabulary, reducing the mean from 280 (with many searches capped at 500) to 198, approximately a 30% improvement.
Here we see that the data-centric (vocabulary) improvement had a far bigger impact than any model-centric (search algorithm) improvement that I had the patience to try (though I left a bunch of further todos). Maybe just guessing randomly from the top n words will be better again! ????
At least I’ve made a substantial dent in reducing those all-too-common guesses at rare words.