What’s the hardest word to guess, when playing hangman? I’ll come back to that.
Last year, Nick Berry wrote a fantastic blog post about the optimal strategy for Hangman. He showed that the best guesses to make to get your first “hit” in a game of hangman are not the most-commonly occurring letters in written English, because these aren’t the most commonly-occurring letters in individual words. He also showed that the first guesses should be adjusted based on the length of the word (the most common letter in 5-letter words is ‘S’, but the most common letter in 6-letter words is ‘E’). In short: hangman’s a more-complex game than you probably thought it was! I’d like to take his work a step further, and work out which word is the hardest word: that is – assuming you’re playing an optimal strategy, what word takes the most-guesses?
First, though, we need to understand how hangman is perfectly played. Based on the assumption that the “executioner” player is choosing words randomly, and that no clue is given as to the nature of the word, we can determine the best possible move for all possible states of the game by using a data structure known as a tree. Suppose our opponent has chosen a three-letter word, and has drawn three dashes to indicate this. We know from Nick’s article that the best letter to guess is A. And then, if our guess is wrong, the next best letter to guess is E. But what if our first guess is right? Well, then we’ve got an “A” in one or more positions on the board, and we need to work out the next best move: it’s unlikely to be “E” – very few three-letter words have both an “A” and an “E” – and of course what letter we should guess next depends entirely on what positions the letters are in.
What we’re actually doing here is a filtering exercise: of all of the possible letters we could choose, we’re considering what possible results that could have. Then for each of those results, we’re considering what guesses we could make next, and so on. At each stage, we compare all of the possible moves to a dictionary of all possible words, and filter out all of the words it can’t be: after our first guess in the diagram above, if we guess “A” and the board now shows “_ A _”, then we know that of the 600+ three-letter words in the English language, we’re dealing with one of only about 134. We further refine our guess by playing the odds: of those words, more of them have a “C” in than any other letter, so that’s our second guess. If it has a C in, that limits the options further, and we can plan the next guess accordingly. If it doesn’t have a C in, that still provides us with valuable information: we’re now looking for a three-letter word with an A in the second position and no letter C: that cuts it down to 124 words (and our next guess should be ‘T’). This tree-based mechanism for working out the best moves is comparable to that used by other game-playing computers. Hangman is simple enough that it can be “solved” by contemporary computers (like draughts – solved in 2007 – but unlike chess: while modern chess-playing computers can beat humans, it’s still theoretically possible to build future computers that will beat today’s computers).
Now that we can simulate the way that a perfect player would play against a truly-random executioner, we can use this to simulate games of hangman for every possible word (I’m using version 0.7 of this British-English dictionary). In other words, we set up two computer players: the first chooses a word from the dictionary, the second plays “perfectly” to try to guess the word, and we record how many guesses it took. So that’s what I did. Here’s the Ruby code I used. It’s heavily-commented and probably pretty understandable/good learning material, if you’re into that kind of thing. Or if you fancy optimising it, there’s plenty of scope for that too (I knocked it out on a lunch break; don’t expect too much!). Or you could use it as the basis to make a playable hangman game. Go wild.
Running the program, we can see that the hardest three-letter word is “xxv”, which would take 22 guesses (20 of them wrong!) to get. But aside from the roman numeral for 25, I don’t think that “xxv” is actually a word. Perhaps my dictionary’s not very good. “Oak”, though, is definitely a word, and at 20 guesses (17 wrong), it’s easily enough to hang your opponent no matter how many strokes it takes to complete the gallows.
There are more tougher words in the four-letter set, like the devious “quiz”, “jazz”, “zinc”, and “faux”. Pick one of those and your opponent – unless they’ve seen this blog post! – is incredibly unlikely to guess it before they’re swinging from a rope.
As we get into the 5, 6, and 7-letter words you’ll begin to notice a pattern: that the hardest words with any given number of letters get easier the longer they are. That’s kind of what you’d expect, I suppose: if there were a hypothetical word that contained every letter in the alphabet, then nobody would ever fail to (eventually) get it.
When we make a graph of each word length, showing which proportion of the words require a given number of “wrong” guesses (by an optimised player), we discover a “sweet spot” window in which we’ll find all of the words that an optimised player will always fail to guess (assuming that we permit up to 10 incorrect guesses before they’re disqualified). The window seems small for the number of times I remember seeing people actually lose at hangman, which implies to me that human players consistently play sub-optimally, and do not adequately counteract that failing by applying an equal level of “smart”, intuitive play (knowing one’s opponent and their vocabulary, looking for hints in the way the game is presented, etc.).
In case you’re interested, then, here are the theoretically-hardest words to throw at your hangman opponent. While many of the words there feel like they would quite-rightly be difficult, others feel like they’d be easier than their ranking would imply: this is probably because they contain unusual numbers of vowels or vowels in unusual-but-telling positions, which humans (with their habit, inefficient under normal circumstances, of guessing an extended series of vowels to begin with) might be faster to guess than a computer.
|Word||Guesses taken||“Wrong” guesses needed|
|Download a longer list
(there’s plenty more which you’d expect to “win” with)
If you use this to give you an edge in your next game, let me know how it works out for you!