If there’s one place on Reddit that this kind of self-promotion shouldn’t be considered inappropriate, it’d be in the subreddit that carries my name. And now that we’re
this high up (and you’re seeing my face everywhere already), I’ve got no problem with any of you having a link by which you can work out exactly who I am in real
life.
So here’s the latest post from my ‘blog. It’s pretty dull unless you use the Squiz CMS at your workplace or write PHP code in your day job, though. I promise I’m more interesting,
sometimes.
Anybody who has, like me, come into contact with the Squiz Matrix CMS for any length of time will
have come across the reasonably easy-to-read but remarkably long CAPTCHA that it
shows. These are especially noticeable in its administrative interface, where it uses them as an exaggerated and somewhat painful “are you sure?” prompt: restarting the CMS’s internal
crontab manager, for example, requires that the administrator types a massive 25-letter CAPTCHA.
But there’s another interesting phenomenon that one begins to notice after seeing enough of the back-end CAPTCHA: strange patterns of letters that appear in sequence
more often than would be expected by chance. If you’re a fan of wordsearches, take a look at the composite screenshot above: can you find a person’s name in each of the four lines?
There are four names – Greg, Dom, Blair and Marc – which routinely appear in these CAPTCHA.
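To put a rough number on “more often than would be expected by chance”, here’s a back-of-envelope sketch. It assumes that every letter is drawn uniformly and independently from the 26 lowercase letters, which may not match what the CMS really does; the function name is my own.

```python
# Assumption: each CAPTCHA letter is uniform and independent over a-z.
# The real generator in the CMS may use a different alphabet.
ALPHABET_SIZE = 26


def expected_occurrences(word: str, captcha_len: int) -> float:
    """Expected number of times `word` appears in a uniformly random string."""
    start_positions = captcha_len - len(word) + 1
    if start_positions <= 0:
        return 0.0
    # Each start position matches with probability (1/26) ** len(word).
    return start_positions / ALPHABET_SIZE ** len(word)


for name in ("greg", "dom", "blair", "marc"):
    print(f"{name}: {expected_occurrences(name, 25):.2e}")
```

Under those assumptions “blair” should turn up in roughly two out of every million 25-letter CAPTCHA, so seeing it routinely is a strong hint that something other than chance is at work.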
Blair, being the longest name, was the first that I noticed, and at first I thought that it might represent a fault in the pseudorandom number generation being used that was resulting
in a higher-than-normal frequency of this combination of letters. Another idea I toyed with was that the CAPTCHA text might be generated entirely from a set of pronounceable
syllables in which these four names also appear (which is a reasonable way to generate one-time passwords that resist entry errors resulting from reading difficulties: in fact, we do this at Three Rings), but by now I’d have
thought that I’d have noticed this in other patterns, and I hadn’t.
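For anybody wondering what syllable-based generation looks like, here’s a minimal sketch. The syllable table and function name are made up for illustration: this is not the code we use at Three Rings, nor anything from Squiz.

```python
import secrets

# Hypothetical syllable table: every consonant + vowel pair from these sets.
CONSONANTS = "bdfgklmnprstvz"
VOWELS = "aeiou"
SYLLABLES = [c + v for c in CONSONANTS for v in VOWELS]


def pronounceable_code(num_syllables: int = 4) -> str:
    """Join randomly chosen syllables into a pronounceable one-time code."""
    return "".join(secrets.choice(SYLLABLES) for _ in range(num_syllables))


print(pronounceable_code())
```

A generator like this leaves an obvious fingerprint, because every code alternates consonants and vowels. That kind of regularity is what I’d have expected to spot in the CAPTCHA if they were built this way.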
Instead, then, I had to conclude that these names were some variety of Easter Egg.
I was curious about where they were coming from, so I searched the source code, but while I found plenty of references to Greg Sherwood, Marc McIntyre, and Blair Robertson, I
couldn’t find Dom. I’ve since come to discover that he must be Dominic Wong: these four were, according to Greg’s blog, developers with Squiz in the early 2000s, and seemingly saw themselves as a dynamic
foursome responsible for the majority of the CMS’s code (which, if the comment headers are to be believed, remains true).
That still didn’t explain why searching for their names in the source hadn’t turned up the responsible code. I kept digging through the CMS’s source code, and eventually
found fudge/general/general.inc (a lot of Squiz CMS code is buried in a folder called “fudge”, and web addresses used internally sometimes contain this word, too: I’d like to
believe that it’s being used as a noun and that the developers were just fans of the buttery sweet, but I have a horrible feeling that it was used in its popular verb form). In that file, I found
this function definition:
/**
 * Generates a string to be used for a security key
 *
 * @param int     $key_len           the length of the random string to display in the image
 * @param boolean $include_uppercase include uppercase characters in the generated password
 * @param boolean $include_numbers   include numbers in the generated password
 *
 * @return string
 * @access public
 */
function generate_security_key($key_len, $include_uppercase = FALSE, $include_numbers = FALSE) {
    $k = random_password($key_len, $include_uppercase, $include_numbers);
    if ($key_len > 10) {
        $gl = Array('YmxhaXI=', 'Z3JlZw==', 'bWFyYw==', 'ZG9t');
        $g = base64_decode($gl[rand(0, (count($gl) - 1)) ]);
        $pos = rand(1, ($key_len - strlen($g)));
        $k = substr($k, 0, $pos) . $g . substr($k, ($pos + strlen($g)));
    }
    return $k;
} //end generate_security_key()
For the benefit of those of you who don’t speak PHP, especially PHP that’s been made deliberately hard to decipher, here’s what’s happening when “generate_security_key” is being called:
A random password of the requested length is generated.
If the requested length is more than 10 characters, a randomly positioned part of the password is overwritten with either “blair”, “greg”, “marc”, or “dom”. The reason that you can’t see these words in the
code is that they’re trivially encoded using a scheme called Base64: YmxhaXI=, Z3JlZw==, bWFyYw==, and ZG9t are the Base64 representations of the four
names.
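If you’d rather read Python than PHP, here’s a rough port of the same logic. The four Base64 strings come straight from the Squiz source, but my `random_password()` is only a stand-in that assumes lowercase letters, as I haven’t reproduced the CMS’s own implementation, and `decode_names()` is a helper I’ve added for clarity.

```python
import base64
import random
import string

# The four encoded strings exactly as they appear in the PHP source.
ENCODED_NAMES = ["YmxhaXI=", "Z3JlZw==", "bWFyYw==", "ZG9t"]


def decode_names(encoded=ENCODED_NAMES):
    """Decode the Base64 strings back into plain-text names."""
    return [base64.b64decode(item).decode("ascii") for item in encoded]


def random_password(length):
    """Stand-in for Squiz's random_password(); assumes lowercase letters only."""
    return "".join(random.choice(string.ascii_lowercase) for _ in range(length))


def generate_security_key(key_len):
    """Rough Python port of the PHP generate_security_key() logic."""
    key = random_password(key_len)
    if key_len > 10:
        name = random.choice(decode_names())
        # Like PHP's rand(), randint() includes both endpoints.
        pos = random.randint(1, key_len - len(name))
        # Overwrite len(name) characters starting at pos; length is unchanged.
        key = key[:pos] + name + key[pos + len(name):]
    return key


print(decode_names())  # ['blair', 'greg', 'marc', 'dom']
print(generate_security_key(25))
```

Because the position starts at 1 rather than 0, the name never sits at the very beginning of the key, and keys of 10 characters or fewer are left untouched.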
This seems like a strange choice of Easter Egg: immortalising the names of your developers in CAPTCHA. It’s especially strange because it somewhat weakens the
(already-weak) CAPTCHA: an attacking robot can quickly be configured to know that an 11+-letter codeword will always contain at least one instance of one of these
four names. In fact, knowing that a CAPTCHA will always contain one of these four and that I can refresh until I get one that I like, I can quickly turn an
11-letter CAPTCHA into a 6-letter one by simply refreshing until I get one with the longest name – Blair – in it!
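The arithmetic behind that claim can be sketched as follows. It assumes lowercase-only CAPTCHA and that each name is equally likely to be picked; the helper names are mine, and the figures are upper bounds rather than exact counts.

```python
ALPHABET_SIZE = 26  # assumption: lowercase letters only
NAMES = ["blair", "greg", "marc", "dom"]


def plain_keyspace(key_len):
    """Number of possible keys if every letter were random."""
    return ALPHABET_SIZE ** key_len


def letters_to_read(key_len, name):
    """Letters left to recognise once the embedded name is known."""
    return key_len - len(name)


def keyspace_with_name(key_len, name):
    """Upper bound on possible keys when `name` is known to be embedded."""
    positions = key_len - len(name)  # pos ranges over 1..key_len - len(name)
    return positions * ALPHABET_SIZE ** letters_to_read(key_len, name)


# With four equally likely names, waiting for one particular name
# takes four refreshes on average.
expected_refreshes = len(NAMES)

print(letters_to_read(11, "blair"))
print(plain_keyspace(11) // keyspace_with_name(11, "blair"))
print(expected_refreshes)
```

For an 11-letter CAPTCHA containing “blair”, only 6 letters are left to recognise, and the set of possible keys shrinks by a factor of about two million compared with 11 truly random letters.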
A lot has been written about how Easter Eggs undermine software security (in exchange for a
small boost to developer morale) – that’s a major part of why Microsoft has banned them from its operating systems (and, for the most part, Apple has too). Given that these
particular CAPTCHA in Squiz CMS are often nothing more than awkward-looking “are you sure?” dialogs, I’m not concerned about the direct security implications, but it does make me worry
a little about the developer culture that produced them.
I know that this Easter Egg might be harmless, but there’s no way for me to know (short of auditing the entire system) what other Easter Eggs might be hiding under the
surface and what they do, especially if the developers have, as in this case, worked to cover their tracks! It’s certainly the kind of thing I’d worry about if I were, I don’t
know, a major government that uses Squiz software, especially its cloud-hosted variants, which are harder to
audit effectively. Just a thought.