What if Emails were Multilingual?

Multilingual emails

Back when I was a student in Aberystwyth, I used to receive a lot of bilingual emails from the University and its departments1. I was reminded of this when I received an email this week from CACert, delivered in both English and German.

Top part of an email from CACert, which begins with the text "German translation below / Deutsche Uebersetzung weiter unten".
Simply putting one language after the other isn’t terribly exciting. Although to be fair, the content of this email wasn’t terribly exciting either.

Wouldn’t it be great if there were some kind of standard for multilingual emails? Your email client or device would maintain an “order of preference” of the languages that you speak, and you’d automatically be shown the content in those languages, starting with the one you’re most-fluent in and working down.

The Web’s already got this functionality2, and people have been sending multilingual emails for much longer than they’ve been developing multilingual websites3!

Enter RFC8255!

It turns out that this is a (theoretically) solved problem. RFC8255 defines a mechanism for breaking an email into multiple different languages in a way that a machine can understand and that ought to be backwards-compatible (so people whose email software doesn’t support it yet can still “get by”). Here’s how it works:

  1. You add a Content-Type: multipart/multilingual header with a defined boundary marker, just like you would for any other email with multiple “parts” (e.g. with a HTML and a plain text version, or with text content and an attachment).
  2. The first section is just a text/plain (or similar) part, containing e.g. some text to explain that this is a multilingual email, and if you’re seeing this then your email client probably doesn’t support them, but you should just be able to scroll down (or else look at the attachments) to find content in the language you read.
  3. Subsequent sections have:
    • Content-Disposition: inline, so that for most people using non-compliant email software they can just scroll down until they find a language they can read,
    • Content-Type: message/rfc822, so that an entire message can be embedded (which allows other headers, like the Subject:, to be translated too),
    • a Content-Language: header, specifying the ISO code of the language represented in that section, and
    • optionally, a Content-Translation-Type: header, specifying either original (this is the original text), human (this was translated by a human), or automated (this was the result of machine translation) – this could be used to let a user say e.g. that they’d prefer a human translation to an automated one, given the choice between two second languages.

Let’s see a sample email:

Content-Type: multipart/multilingual;
 boundary=10867f6c7dbe49b2cfc5bf880d888ce1c1f898730130e7968995bea413a65664
To: <b24571@danq.me>
From: <rfc8255test-noreply@danq.link>
Subject: Does your email client support RFC8255?
Mime-Version: 1.0
Date: Fri, 27 Sep 2024 10:06:56 +0000

--10867f6c7dbe49b2cfc5bf880d888ce1c1f898730130e7968995bea413a65664
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset=utf-8

This is a multipart message in multiple languages. Each part says the
same thing but in a different language. If your email client supports
RFC8255, you will see this message in your preferred language out of
those available. Otherwise, you will probably see each language after
one another or else each language in a separate attachment.

--10867f6c7dbe49b2cfc5bf880d888ce1c1f898730130e7968995bea413a65664
Content-Disposition: inline
Content-Type: message/rfc822
Content-Language: en
Content-Translation-Type: original

Subject: Does your email client support RFC8255?
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
MIME-Version: 1.0

RFC8255 is a standard for sending email in multiple languages. This
is the original email in English. It is embedded alongside the same
content in a number of other languages.

--10867f6c7dbe49b2cfc5bf880d888ce1c1f898730130e7968995bea413a65664
Content-Disposition: inline
Content-Type: message/rfc822
Content-Language: fr
Content-Translation-Type: automated

Subject: Votre client de messagerie prend-il en charge la norme RFC8255?
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
MIME-Version: 1.0

RFC8255 est une norme permettant d'envoyer des courriers
électroniques dans plusieurs langues. Le présent est le courriel
traduit en français. Il est intégré à côté du même contenu contenu
dans un certain nombre d'autres langues.

--10867f6c7dbe49b2cfc5bf880d888ce1c1f898730130e7968995bea413a65664--
Why not copy-paste this into a raw email and see how your favourite email client handles it! That’ll be fun, right?

Can I use it?

That proposed standard turns seven years old next month. Sooo… can we start using it?4

Turns out… not so much. I discovered that NeoMutt supports it:

NeoMutt’s implementation is basic, but it works: you can specify a preference order for languages and it respects it, and if you don’t then it shows all of the languages as a series of attachments. It can apparently even be used to author compliant multilingual emails, although I didn’t get around to trying that.

Support in other clients is… variable.

A reasonable number of them don’t understand the multilingual directives but still show the email in a way that doesn’t suck:

Screenshot from Thunderbird, showing each language one after the other.
Mozilla Thunderbird does a respectable job of showing each language’s subject and content, one after another.

Some shoot for the stars but blow up on the launch pad:

Screenshot from GMail, showing each language one after the other, but with a stack of extra headers and an offer to translate it to English for me (even though the English is already there).
GMail displays all the content, but it pretends that the alternate versions are forwarded messages and adds a stack of meaningless blank headers to each. And then offers to translate the result for you, even though the content is already right there in English.

Others still seem to be actively trying to make life harder for you:

ProtonMail’s Web interface shows only the fallback content, putting the remainder into .eml attachments… which is then won’t display, forcing you to download them and find some other email client to look at them in!5

And still others just shit the bed at the idea that you might read an email like this one:

Screenshot from Outlook 365, showing the message "This message might have been moved or deleted".
Outlook 365 does appallingly badly, showing the subject in the title bar, then the words “(No subject)”, then the message “This message might have been removed or deleted”. Just great.

That’s just the clients I’ve tested, but I can’t imagine that others are much different. If you give it a go yourself with something I’ve not tried, then let me know!

I guess this means that standardised multilingual emails might be forever resigned to the “nice to have but it never took off so we went in a different direction” corner of the Internet, along with the <keygen> HTML element and the concept of privacy.

Footnotes

1 I didn’t receive quite as much bilingual email as you might expect, given that the University committed to delivering most of its correspondence in both English and Welsh. But I received a lot more than I do nowadays, for example

2 Although you might not guess it, given how many websites completely ignore your Accept-Language header, even where it’s provided, and simply try to “guess” what language you want using IP geolocation or something, and then require that you find whatever shitty bit of UI they’ve hidden their language selector behind if you want to change it, storing the result in a cookie so it inevitably gets lost and has to be set again the next time you visit.

3 I suppose that if you were sending HTML emails then you might use the lang="..." attribute to mark up different parts of the message as being in different languages. But that doesn’t solve all of the problems, and introduces a couple of fresh ones.

4 If it were a cool new CSS feature, you can guarantee that it’d be supported by every major browser (except probably Safari) by now. But email doesn’t get so much love as the Web, sadly.

5 Worse yet, if you’re using ProtonMail with a third-party client, ProtonMail screws up RFC8255 emails so badly that they don’t even work properly in e.g. NeoMutt any more! ProtonMail swaps the multipart/multilingual content type for multipart/mixed and strips the Content-Language: headers, making the entire email objectively less-useful.

× × × ×

WP Engine’s Three Problems

Duration

Podcast Version

This post is also available as a podcast. Listen here, download for later, or subscribe wherever you consume podcasts.

If you’re active in the WordPress space you’re probably aware that there’s a lot of drama going on right now between (a) WordPress hosting company WP Engine, (b) WordPress hosting company (among quite a few other things) Automattic1, and (c) the WordPress Foundation.

If you’re not aware then, well: do a search across the tech news media to see the latest: any summary I could give you would be out-of-date by the time you read it anyway!

Illustration showing relationships between WordPress and Automattic (licensing trademarks and contributing effort to), between WordPress and WP Engine (the latter profits from the former), and between Automattic and WP Engine (throwing lawsuits at one another).
I tried to draw a better diagram with more of the relevant connections, but it quickly turned into spaghetti.

A declaration of war?

Like others, I’m not sure that the way Matt publicly called-out WP Engine at WCUS was the most-productive way to progress a discussion2.

In particular, I think a lot of the conversation that he kicked off conflates three different aspects of WP Engine’s misbehaviour. That muddies the waters when it comes to having a reasoned conversation about the issue3.

Matt Mullwenweg on stage at WordCamp US 2024, stating how he feels that WP Engine exploits WordPress (to great profit) without contributing back.
I’ve heard Matt speak a number of times, including in person… and I think he did a pretty bad job of expressing the problems with WP Engine during his Q&A at WCUS. In his defence, it sounds like he may have been still trying to negotiate a better way forward until the very second he walked on stage that day.

I don’t think WP Engine is a particularly good company, and I personally wouldn’t use them for WordPress hosting. That’s not a new opinion for me: I wouldn’t have used them last year or the year before, or the year before that either. And I broadly agree with what I think Matt tried to say, although not necessarily with the way he said it or the platform he chose to say it upon.

Misdeeds

As I see it, WP Engine’s potential misdeeds fall into three distinct categories: moral, ethical4, and legal.

Morally: don’t take without giving back

Matt observes that since WP Engine’s acquisition by huge tech-company-investor Silver Lake, WP Engine have made enormous profits from selling WordPress hosting as a service (and nothing else) while making minimal to no contributions back to the open source platform that they depend upon.

If true, and it appears to be, this would violate the principle of reciprocity. If you benefit from somebody else’s effort (and you’re able to) you’re morally-obliged to at least offer to give back in a manner commensurate to your relative level of resources.

Two children sit on a bed: one hands a toy dinosaur to the other.
The principle of reciprocity is a moral staple. This is evidenced by the fact that children (and some nonhuman animals) seem to be able to work it out for themselves from first principles using nothing more than empathy. Companies, however aren’t usually so-capable. Photo courtesy Cotton.

Abuse of this principle is… sadly not-uncommon in business. Or in tech. Or in the world in general. A lightweight example might be the many millions of profitable companies that host atop the Apache HTTP Server without donating a penny to the Apache Foundation. A heavier (and legally-backed) example might be Trump Social’s implementation being based on a modified version of Mastodon’s code: Mastodon’s license requires that their changes are shared publicly… but they don’t do until they’re sent threatening letters reminding them of their obligations.

I feel like it’s fair game to call out companies that act amorally, and encourage people to boycott them, so long as you do so without “punching down”.

Ethically: don’t exploit open source’s liberties as weaknesses

WP Engine also stand accused of altering the open source code that they host in ways that maximise their profit, to the detriment of both their customers and the original authors of that code5.

It’s well established, for example, that WP Engine disable the “revisions” feature of WordPress6. Personally, I don’t feel like this is as big a deal as Matt makes out that it is (I certainly wouldn’t go as far as saying “WP Engine is not WordPress”): it’s pretty commonplace for large hosting companies to tweak the open source software that they host to better fit their architecture and business model.

But I agree that it does make WordPress as-provided by WP Engine significantly less good than would be expected from virtually any other host (most of which, by the way, provide much better value-for-money at any price point).

Fake web screenshot showing turdpress.com, "WordPress... But Shit".
There’s nothing to stop me from registering TurdPress.com and providing a premium WordPress web hosting solution with all the best features disabled: I could even disable exports so that my customers wouldn’t even be able to easily leave my service for greener pastures! There’s nothing stop me… but that wouldn’t make it right7.
It also looks like WP Engine may have made more-nefarious changes, e.g. modifying the referral links in open source code (the thing that earns money for the original authors of that code) so that WP Engine can collect the revenue themselves when they deploy that code to their customers’ sites. That to me feels like it’s clearly into the zone ethical bad practice. Within the open source community, it’s not okay to take somebody’s code, which they were kind enough to release under a liberal license, strip out the bits that provide their income, and redistribute it, even just as a network service8.

Again, I think this is fair game to call out, even if it’s not something that anybody has a right to enforce legally. On which note…

Legally: trademarks have value, don’t steal them

Automattic Inc. has a recognised trademark on WooCommerce, and is the custodian of the WordPress Foundation’s trademark on WordPress. WP Engine are accused of unauthorised use of these trademarks.

Obviously, this is the part of the story you’re going to see the most news media about, because there’s reasonable odds it’ll end up in front of a judge at some point. There’s a good chance that such a case might revolve around WP Engine’s willingness (and encouragement?) to allow their business to be called “WordPress Engine” and to capitalise on any confusion that causes.

Screenshot from the WordPress Foundation's Trademark Policy page, with all but the first line highlighted of the paragraph that reads: The abbreviation “WP” is not covered by the WordPress trademarks, but please don’t use it in a way that confuses people. For example, many people think WP Engine is “WordPress Engine” and officially associated with WordPress, which it’s not. They have never once even donated to the WordPress Foundation, despite making billions of revenue on top of WordPress.
I don’t know how many people spotted this ninja-edit addition to the WordPress Foundation’s Trademark Policy page, but I did.

I’m not going to weigh in on the specifics of the legal case: I Am Not A Lawyer and all that. Naturally I agree with the underlying principle that one should not be allowed to profit off another’s intellectual property, but I’ll leave discussion on whether or not that’s what WP Engine are doing as a conversation for folks with more legal-smarts than I. I’ve certainly known people be confused by WP Engine’s name and branding, though, and think that they must be some kind of “officially-licensed” WordPress host: it happens.

If you’re following all of this drama as it unfolds… just remember to check your sources. There’s a lot of FUD floating around on the Internet right now9.

In summary…

With a reminder that I’m sharing my own opinion here and not that of my employer, here’s my thoughts on the recent WP Engine drama:

  1. WP Engine certainly act in ways that are unethical and immoral and antithetical to the spirit of open source, and those are just a subset of the reasons that I wouldn’t use them as a WordPress host.
  2. Matt Mullenweg calling them out at WordCamp US doesn’t get his point across as well as I think he hoped it might, and probably won’t win him any popularity contests.
  3. I’m not qualified to weigh in on whether or not WP Engine have violated the WordPress Foundation’s trademarks, but I suspect that they’ve benefitted from widespread confusion about their status.

Footnotes

1 I suppose I ought to point out that Automattic is my employer, in case you didn’t know, and point out that my opinions don’t necessarily represent theirs, etc. I’ve been involved with WordPress as an open source project for about four times as long as I’ve had any connection to Automattic, though, and don’t always agree with them, so I’d hope that it’s a given that I’m speaking my own mind!

2 Though like Manu, I don’t think that means that Matt should take the corresponding blog post down: I’m a digital preservationist, as might be evidenced by the unrepresentative-of-me and frankly embarrassing things in the 25-year archives of this blog!

3 Fortunately the documents that the lawyers for both sides have been writing are much clearer and more-specific, but that’s what you pay lawyers for, right?

4 There’s a huge amount of debate about the difference between morality and ethics, but I’m using the definition that means that morality is based on what a social animal might be expected to decide for themselves is right, think e.g. the Golden Rule etc., whereas ethics is the code of conduct expected within a particular community. Take stealing, for example, which covers the spectrum: that you shouldn’t deprive somebody else of something they need, is a moral issue; that we as a society deem such behaviour worthy of exclusion is an ethical one; meanwhile the action of incarcerating burglars is part of our legal framework.

5 Not that nobody’s immune to making ethical mistakes. Not me, not you, not anybody else. I remember when, back in 2005, Matt fucked up by injecting ads into WordPress (which at that point didn’t have a reliable source of funding). But he did the right thing by backpedalling, undoing the harm, and apologising publicly and profusely.

6 WP Engine claim that they disable revisions for performance reasons, but that’s clearly bullshit: it’s pretty obvious to me that this is about making hosting cheaper. Enabling revisions doesn’t have a performance impact on a properly-configured multisite hosting system, and I know this from personal experience of running such things. But it does have a significant impact on how much space you need to allocate to your users, which has cost implications at scale.

7 As an aside: if a court does rule that WP Engine is infringing upon WordPress trademarks and they want a new company name to give their service a fresh start, they’re welcome to TurdPress.

8 I’d argue that it is okay to do so for personal-use though: the difference for me comes when you’re making a profit off of it. It’s interesting to find these edge-cases in my own thinking!

9 A typical Reddit thread is about 25% lies-and-bullshit; but you can double that for a typical thread talking about this topic!

× × × × ×

D20 with Advantage

Dungeons & Dragons players spend a lot of time rolling 20-sided polyhedral dice, known as D20s.

In general, they’re looking to roll as high as possible to successfully stab a wyvern, jump a chasm, pick a lock, charm a Duke1, or whatever.

A 'full set' of white polyhedral dice commonly-used by roleplayers - a D4, D6, D8, two D10s, a D12, and a D20 - sit half-submerged in a red liquid.
Submerging your dice set in the blood of a halfling is a sure-fire way to get luckier rolls.

Roll with advantage

Sometimes, a player gets to roll with advantage. In this case, the player rolls two dice, and takes the higher roll. This really boosts their chances of not-getting a low roll. Do you know by how much?

I dreamed about this very question last night. And then, still in my dream, I came up with the answer2. I woke up thinking about it3 and checked my working.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
2 2 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
3 3 3 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
4 4 4 4 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
5 5 5 5 5 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
6 6 6 6 6 6 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
7 7 7 7 7 7 7 7 8 9 10 11 12 13 14 15 16 17 18 19 20
8 8 8 8 8 8 8 8 8 9 10 11 12 13 14 15 16 17 18 19 20
9 9 9 9 9 9 9 9 9 9 10 11 12 13 14 15 16 17 18 19 20
10 10 10 10 10 10 10 10 10 10 10 11 12 13 14 15 16 17 18 19 20
11 11 11 11 11 11 11 11 11 11 11 11 12 13 14 15 16 17 18 19 20
12 12 12 12 12 12 12 12 12 12 12 12 12 13 14 15 16 17 18 19 20
13 13 13 13 13 13 13 13 13 13 13 13 13 13 14 15 16 17 18 19 20
14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 15 16 17 18 19 20
15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 16 17 18 19 20
16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 17 18 19 20
17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 18 19 20
18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 19 20
19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 20
20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
Table illustrating the different permutations of two D20 rolls and the “advantage” result (i.e. the higher of the two).

The chance of getting a “natural 1” result on a D20 is 1 in 20… but when you roll with advantage, that goes down to 1 in 400: a huge improvement! The chance of rolling a 10 or 11 (2 in 20 chance of one or the other) remains the same. And the chance of a “crit” –  20 – goes up from 1 in 20 when rolling a single D20 to 39 in 400 – almost 10% – when rolling with advantage.

You can see that in the table above: the headers along the top and left are the natural rolls, the intersections are the resulting values – the higher of the two.

The nice thing about the table above (which again: was how I visualised the question in my dream!) is it really helps to visualise why these numbers are what they are. The general formula for calculating the chance of a given number when rolling D20 with advantage is ( n2 – (n-1)2 ) / 400. That is, the square of the number you’re looking for, minus the square of the number one less than that, over 400 (the total number of permutations)4.

Why roll two dice when one massive one will do?

Knowing the probability matrix, it’s theoretically possible to construct a “D20 with Advantage” die5. Such a tool would have 400 sides (one 1, three 2s, five 3s… and thirty-nine 20s). Rolling-with-advantage would be a single roll.

'400-sided die' shown on Numberwang.
I don’t think anybody’s ever built a real 400-sided die, but Numberwang! claimed to have one.

This is probably a totally academic exercise. The only conceivable reason I can think of would be if you were implementing a computer system on which generating random numbers was computationally-expensive, but memory was cheap: under this circumstance, you could pre-generate a 400-item array of possible results and randomly select from it.

But if anybody’s got a 3D printer capable of making a large tetrahectogon (yes, that’s what you call a 400-sided polygon – you learned something today!), I’d love to see an “Advantage D20” in the flesh. Or if you’d just like to implement a 3D model for Dice Box that’d be fine too!

Footnotes

1 Or throw a fireball, recall an anecdote, navigate a rainforest, survive a poisoning, sneak past a troll, swim through a magical swamp, hold on to a speeding aurochs, disarm a tripwire, fire a crossbow, mix a potion, appeal to one among a pantheon of gods, beat the inn’s landlord at an arm-wrestling match, seduce a duergar guard, persuade a talking squirrel to spy on some bandits, hold open a heavy door, determine the nature of a curse, follow a trail of blood, find a long-lost tome, win a drinking competition, pickpocket a sleeping ogre, bury a magic sword so deep that nobody will ever find it, pilot a spacefaring rowboat, interpret a forgotten language, notice an imminent ambush, telepathically commune with a distant friend, accurately copy-out an ancient manuscript, perform a religious ritual, find the secret button under the wizard’s desk, survive the blistering cold, entertain a gang of street urchins, push through a force field, resist mind control, and then compose a ballad celebrating your adventure.

2 I don’t know what it says about me as a human being that sometimes I dream in mathematics, but it perhaps shouldn’t be surprising given I’m nerdy enough to have previously recorded instances of dreaming in (a) Perl, and (b) Nethack (terminal mode).

3 When I woke up I also found that I had One Jump from Disney’s Aladdin stuck in my head, but I’m not sure that’s relevant to the discussion of probability; however, it might still be a reasonable indicator of my mental state in general.

4 An alternative formula which is easier to read but harder to explain would be ( 2(n – 1) + 1 ) / 400.

5 Or a “D20 with Disadvantage”: the table’s basically the inverse of the advantage one – i.e. 1 in 400 chance of a 20 through to 39 in 400 chance of a 1.

× ×

Idea: Meeting Spoofer

Focus time is great

I’m a big fan of blocking out uninterrupted time on your work calendar for focus activities, even if you don’t have a specific focus task to fill them with.

It can be enough to simple know that, for example, you’ve got a 2-hour slot every Friday morning that you can dedicate to whatever focus-demanding task you’ve got that week, whether it’s a deep debugging session, self-guided training and development activities, or finally finishing that paper that’s just slightly lower priority than everything else on your plate.

Screenshot showing calendar for Thu 2 May and Fri 3 May. The period from 10:30 - 12:30 on the Friday is marked 'Focus Time'.
My work focus time is Friday mornings. It was originally put there so that it immediately followed my approximately-monthly coaching sessions, but it’s remained even since they wandered elsewhere.

I appreciate that my colleagues respect that blocked period: I almost never receive meeting requests in that time. That’s probably because most people, particularly because we’re in such a multi-timezone company, use their calendar’s “find a suitable time for everybody” tool to find the best time for everyone and it sees that I’m “busy” and doesn’t suggest it.

If somebody does schedule a meeting that clashes with that block then, well, it’s probably pretty urgent!

But it turns out this strategy doesn’t work for everybody:

Digital calendar showing a 'focus time - urgent meetings only' block clashing with four other events.
‘Urgent meetings only’ might not mean the same thing to you and I as it does to the not one, not two, not three, but four people who scheduled meetings that clash with it.

My partner recently showed me a portion of her calendar, observing that her scheduled focus time had been overshadowed by four subsequently-created meetings that clashed with it. Four!

Maybe that’s an exception and this particular occasion really did call for a stack of back-to-back urgent meetings. Maybe everything was on fire. But whether or not this particular occasion is representative for my partner, I’ve spoken to other friends who express the same experience: if they block out explicit non-meeting time on their calendar, they get meeting requests for that time anyway. At many employers, “focus time” activities don’t seem to be widely-respected.

Maybe your workplace is the same. The correct solution probably involves a cultural shift: a company-wide declaration in favour of focus time as a valuable productivity tool (which it is), possibly coupled with recommendations about how to schedule them sensitively, e.g. perhaps recommending a couple of periods in which they ought to be scheduled.

But for a moment, let’s consider a different option:

A silly solution?

Does your work culture doesn’t respect scheduled focus time but does respect scheduled meetings? This might seem to be the case in the picture above: note that the meetings that clash with the focus time don’t clash with one another but tessellate nicely. Perhaps you need… fake meetings.

Calendar showing (fake) meetings titled "SN / AFU Project Update", "Team ID107 training session", "Biological Interface Error Scheduling Meeting", and "(Rescheduled) ADIH Planning".
“Wow, what a busy afternoon Dan’s got. I’d better leave him be.”

Of course, creating fake meetings just so you can get some work done is actually creating more work. Wouldn’t it be better if there were some kind of service that could do it for you?

Here’s the idea: a web service that exposes an API endpoint. You start by specifying a few things about the calendar you’d like to fill, for example:

  • What days/times you’d like to fill with “focus time”?
  • What industry you work in, to help making convincing (but generic) event names?
  • Whether you’d like the entire block consistently filled, or occasional small-but-useless gaps of up to 15 minutes inserted between them?

This results in a URL containing those parameters. Accessing that URL yields an iCalendar feed containing those meetings. All you need to do is get your calendar software to subscribe to those events and they’ll appear in your calendar, “filling” your time.

So long as your iCalendar feed subscription refreshes often enough, you could even have an option to enable the events to self-delete e.g. 15 minutes before their start time, so that you don’t panic when your meeting notification pops up right before they “start”!

This is the bit where you’re expecting me to tell you I made a thing

Normally, you’d expect me to pull the covers off some hilarious domain name I’ve chosen and reveal exactly the service I describe, but I’m not doing that today. There’s a few reasons for that:

Week-long calendar filled with empty fake events.
I’m not saying I think the prior art in this area is good, but it’s certainly good-enough.
  1. Firstly, I’ve got enough too many pointless personal/side projects on the go already1. I don’t need another distraction.
  2. Secondly, it turns out others have already done 90% of the work. This open-source project runs locally and fills calendars with (unnamed, private) blocks of varying lengths. This iOS app does almost exactly what I described, albeit in an ad-hoc rather than fully-automated way. There’s no point me just doing the last 10% just to make a joke work.
  3. And thirdly: while I searched for existing tools I discovered a significant number of people who confess online to creating fake meetings in their calendars! While some of these do so for reasons like those I describe – i.e. to block out time and get more work done in an environment that doesn’t respect them simply blocking-out time – a lot of folks admit to doing it just to “look busy”. That could be either the employee slacking off, or perhaps having to work around a manager with a presenteeism/input-measurement based outlook (which is a terrible way to manage people). But either way: it’s a depressing reason to write software.

Nope

So yeah: I’m not going down that avenue.

But maybe if you’re in a field where you’d benefit from it, try blocking out some focus time in your calendar. I think it’s a fantastic idea, and I love that I’m employed somewhere that I can do so and it works out.

Or if you’ve tried that and discovered that your workplace culture doesn’t respect it – if colleagues routinely book meetings into reserved spaces – maybe you should try fake meetings and see if they’re any better-respected. But I’m afraid I can’t help you with that.

× × × ×

Roll Your Own Antispam

Three Rings operates a Web contact form to help people get in touch with us: the idea is that it provides a quick and easy way to reach out if you’re a charity who might be able to make use of the system, a user who’s having difficulty with the features of the software, or maybe a potential new volunteer willing to give your time to the project.

But then the volume of spam it received increased dramatically. We don’t want our support team volunteers to spend all their time categorising spam: even if it doesn’t take long, it’s demoralising. So what could we do?

Clearly-spammy message shown in a ticket management system.
It’s clearly spam, but if it takes you 2 seconds to categorise it and there are 30 in your Inbox, that’s still a drag.

Our conventional antispam tools are configured pretty liberally: we don’t want to reject a contact from a legitimate user just because their message hits lots of scammy keywords (e.g. if a user’s having difficulty logging in and has copy-pasted all of the error messages they received, that can look a lot like a password reset spoofing scam to a spam filter). And we don’t want to add a CAPTCHA, because not only do those create a barrier to humans – while not necessarily reducing spam very much, nowadays – they’re often terrible for accessibility, privacy, or both.

But it didn’t take much analysis to spot some patterns unique to our contact form and the questions it asks that might provide an opportunity. For example, we discovered that spam messages would more-often-than-average:

  • Fill in both the “name” and (optional) “Three Rings username” field with the same value. While it’s cetainly possible for Three Rings users to have a login username that’s identical to their name, it’s very rare. But automated form-fillers seem to disproportionately pair-up these two fields.
  • Fill the phone number field with a known-fake phone number or a non-internationalised phone number from a country in which we currently support no charities. Legitimate non-UK contacts tend to put international-format phone numbers into this optional field, if they fill it at all. Spammers often put NANP (North American Numbering Plan) numbers.
  • Include many links in the body of the message. A few links, especially if they’re to our services (e.g. when people are asking for help) is not-uncommon in legitimate messages. Many links, few of which point to our servers, almost certainly means spam.
  • Choose the first option for the choose -one question “how can we help you?” Of course real humans sometimes pick this option too, but spammers almost always choose it.

None of these characteristics alone, or any of the half dozen or so others we analysed (including invisible checks like honeypots and IP-based geofencing), are reason to suspect a message of being spam. But taken together, they’re almost a sure thing.

To begin with, we assigned scores to each characteristic and automated the tagging of messages in our ticketing system with these scores. At this point, we didn’t do anything to block such messages: we were just collecting data. Over time, this allowed us to find a safe “threshold” score above which a message was certainly spam.

Three Rings contact form filled by Spammy McSpamface, showing a 'Security Checks Failed' error message and tips on refining the message.
Even when a message fails our customised spam checks, we only ‘soft-block’ it: telling the user their message was rejected and providing suggestions on working around that or emailing us conventionally. Our experience shows that the spammers aren’t willing to work to overcome this additional hurdle, but on the very rare ocassion a human hits them, they are.

Once we’d found our threshold we were able to engage a soft-block of submissions that exceeded it, and immediately the volume of spam making it to the ticketing system dropped considerably. Under 70 lines of PHP code (which sadly I can’t share with you) and we reduced our spam rate by over 80% while having, as far as we can see, no impact on the false-positive rate.

Where conventional antispam solutions weren’t quite cutting it, implementing a few rules specific to our particular use-case made all the difference. Sometimes you’ve just got to roll your sleeves up and look at the actual data you do/don’t want, and adapt your filters accordingly.

× ×

2024 in Videogames

Duration

Podcast Version

This post is also available as a podcast. Listen here, download for later, or subscribe wherever you consume podcasts.

My life affords me less time for videogames than it used to, and so my tastes have changed accordingly:

  • I appreciate games that I can drop at a moment’s notice and pick up again some other time, without losing lots of progress1.
  • And if the game can remind me what it was I was trying to achieve when I come back… perhaps weeks or months later… that’s a bonus!
  • I’ve a reduced tolerance for dynamically-generated content (oh, you want me to fetch you another five nirnroot do you? – hard pass2): if I might only get to throw 20 hours total at a game, I’d much prefer to spend that time exploring content deliberately and thoughtfully authored by a human.
  • And, y’know, it has to be fun. I rarely buy games on impulse anymore, and usually wait weeks or months after release dates even for titles I’ve been anticipating, to see what the reviewers make of it.

That said, I’ve played three excellent videogames this year that I’d like to recommend to you (no spoilers):


Horizon: Forbidden West

I loved Horizon: Zero Dawn. Even if this review persuades you that you should play its sequel, Forbidden West, you really oughta play Zero Dawn first3. There’s a direct continuation of plot going on there that you’ll appreciate better that way. Also: Zero Dawn stands alone as a great game in its own right.

Aloy, the protagonist of the Horizon games, wearing Mark of Pride face paint and red-stained Quen Deadeye armour, stands at sunset in a jungle environment.
Horizon gives a lot to love, from a rich world and story, immersive environments, near-seamless loading, excellent voice acting, and a rewarding difficulty curve. But perhaps all are second-place to what a kickass character the protagonist is.

The Horizon series tells the story of Aloy from her childhood onwards, growing up an outcast in a tribal society on a future Earth inhabited by robotic reimaginings of creatures familiar to us today (albeit some of them extinct). Once relatively docile, a mysterious event known as the derangement, shortly before Aloy’s birth, made these machines aggressive and dangerous, leading to a hostile world in which Aloy seeks to prove herself a worthy hunter to the tribe that cast her out.

All of which leads to a series of adventures that gradually explain the nature of the world and how it became that way, and provide a path by which Aloy can perhaps provide a brighter future for humankind. It’s well-written and clever and you’ll fight and die over and over as you learn your way around the countless permutations of weapons, tools, traps, and strategies that you’ll employ. But it’s the kind of learning curve that’s more rewarding than frustrating, and there are so many paths to victory that when I watch Ruth play she uses tactics that I’d never even conceived of.

Aloy aims a precision longbow at a Tremortusk, an elephant-like machine, in a sunny desert environment.
Horizon: Forbidden West is like Zero Dawn but… more. More quests, more exploration, more machines, more characters, and more of the same story, answering questions you might have found yourself thinking during the prequel. But it’s not just more-of-the-same.

Forbidden West is in some ways more-of-the-same, but it outgrows the mould of its predecessor, too. Faced with bigger challenges than she can take on by herself, Aloy comes to assemble a team of trusted party members, and when you’re not out fighting giant robots or spelunking underwater caves or exploring the ruins of ancient San Francisco you’re working alongside them, and that’s one of the places the game really shines. Your associates chatter to each other, grow and change, and each brings something special to the story that invites you to care for each of them as individuals.

The musical score – cinematic in its scope – has been revamped too, and shows off its ability to adapt dynamically to different situations. Face off against one of the terrifying new aquatic enemies and you’ll be treated to a nautical theme, for example. And the formulaic quests of the predecessor (get to the place, climb the thing…), which were already fine, are riddled with new quirks and complexities to keep you thinking.

And finally: I love the game’s commitment to demonstrating the diversity of humanity: both speaking and background characters express a rarely-seen mixture of races, genders, and sexualities, and the story sensitively and compassionately touches on issues of disability, neurodiversity, and transgender identity. It’s more presence than representation (“Hey look, it’s Sappho and her friend!”), but it’s still much better than I’m used to seeing in major video game releases.

Thank Goodness You’re Here!

If ever I need to explain to an American colleague why that one time they visited London does not give them an understanding of what life is like in the North of England… this is the videogame I’ll point them at.

Main menu for Thank Goodness You're Here, featuring options "Gu On Then", "Faff", and "Si' Thi", superimposed on a picture of a street in Barnsley, Yorkshire.
Among the many language options available for the game are “English”, as you’d probably expect, and “Dialect”, which imposes a South Yorkshire accent to everything, as illustrated here by the main menu.

A short, somewhat minigame-driven, absurd to the point of Monty Python-ism, wildly British comedy game, Thank Goodness You’re Here! is a gem. It’s not challenging by any stretch of the imagination, but that only serves to turn focus even more on the weird and wonderful game world of Barnsworth (itself clearly inspired by real-world Barnsley).

Playing a salesman sent to the town to meet the lord mayor, the player ends up stuck with nothing to do4, and takes on a couple of dozen odd-jobs for the inhabitants of the town, meeting a mixed bag of stereotypes and tropes as they go along.

Hand-drawn advertisement for Big Ron's Big Pies (Barnsworth's Best since 1904).
Ahm gowin t’shop to gi’ sumof Big Ron’s Big Pies! Y’wanout, buggerlugs? Players without a grounding in Yorkshire English, and especially non-Brits, might benefit from turning the subtitles on.

Presented in a hand-drawn style that’s as distinctive and bizarre as it is an expression of the effort that must’ve gone into it, this game’s clearly a project of passion for Yorkshire-based developers Coal Supper (yes, that’s really what they call themselves). I particularly enjoyed a recurring joke in which the player is performing some chore (mowing grass for the park keeper, chopping spuds at the chippy) when the scene cuts to some typically-inanimate objects having a conversation (flowers, potatoes) while the player’s actions bring them closer and closer in the background. But it’s hard to pick out a very favourite part from this wonderful, crazy, self-aware slice of Northern life in game form.

Tactical Breach Wizards

Finally, I’ve got to sing the praises of Tactical Breach Wizards by Suspicious Developments (who for some reason don’t bother to list it on their website; the closest thing to an official page for the project other than its Steam entry might be this launch announcement!)5, the team behind Gunpoint and Heat Signature.

The game feels like a cross between XCOM/Xenonauts‘ turn-based tactical combat and Rainbow Six‘s special ops theme. Except instead of a squad of gun-toting body-armoured military/police types, your squad is a team of wizards in a world in which magical combat specialists work alongside conventionally-equipped soldiers on missions where their powers make all the difference.

Jen, the Storm Witch, throws a bolt of lightning through three enemies on a moving train carriage.
Jen the Storm Witch primarily uses large static shocks to fling targets around: relatively harmless, unless she and her teammates have arranged for/tricked enemies to be standing next to something they can be thrown into… or near a window they can be flung out of!

By itself, that could be enough: there’s certainly sufficient differences between all of the powers that the magic users exploit that you’ll find all kinds of ways to combine them. How about having your teleport-capable medic blink themselves to a corner so your witch’s multi-step lightning bolt can use them as a channel to get around a corner and zap a target there? Or what about using the time-manipulation powers of your Navy Seer (yes, really) to give your siege cleric enough actions that they can shield-push your opponent within range of the turret you hacked? And so on.

But Tactical Breach Wizards, which stands somewhere between a tactical squad-based shooter and a deterministic positional puzzle game, goes beyond that by virtue of its storytelling. Despite the limitations of the format, the game manages to pack in a lot of background and personality for every one of your team and even many of the NPCs too (Steve Clark, Traffic Warlock is a riot). Oh, and much of the dialogue is laugh-out-loud funny, to boot.

Three spec ops wizards have a conversation about an upcoming assault.
The dialogue between your teammates – most of it right as they’re about to breach a door – reads like lighthearted banter but exposes the underpinning backstory of the setting.

The writing’s great, to the extent that when I got to the epilogue – interactive segments during the credits where you can influence “what happens next” to each of the characters you’ve come to know – I genuinely flip-flopped on a few of them to give some of them a greater opportunity to continue to feature in one another’s lives. Even though the game was clearly over. It’s that compelling.

And puzzling out some of the tougher levels, especially if you’re going for the advanced (“Confidence”) challenges, too, is really fun. But with autosaves every turn, the opportunity to skip and return to levels that are too challenging, and a within-turn “undo” feature that lets you explore different strategies before you commit to one, this is a great game for someone who, like me, doesn’t have much time to dedicate to play.


So yeah: that’s what I’ve been up to in videogaming-time so far this year. Any suggestions for the autumn/winter?

Footnotes

1 If a game loads quickly that’s a bonus. I still play a little of my favourite variant of the Sid Meier’s Civilization series – that is, Civilization V + Vox Populi (alongside a few quality-of-life mods) but I swear I’d play more of it if it didn’t take so long to load. Even after hacking around it to dodge the launcher, logos, and introduction, my 8P+4E-core i7 processor takes ~80 seconds from clicking to launch the game to having loaded my latest save, which if I’m only going to have time to play three turns is frustratingly long! Contrast Horizon: Forbidden West, which I also mention in this post, a game 13 years younger and with much higher hardware requirements, which takes ~17 seconds to achieve the same. Possibly I’m overanalysing this…

2 This isn’t a criticism of the Elder Scrolls games specifically, but of the relatively-lazy writing that goes into some videogames that feel like they’re using Perchance to come up with their quests, in order to stretch the gameplay. I suppose a better example might have been the on-the-whole disappointment that was Starfield, but I figured an Elder Scrolls reference might be easier to identify at-a-glance. Fetch-questing 100 tonnes of Beryllium just doesn’t have the same ring to it.

3 In fact, if you’re trying to consume the Horizon story as thoroughly as possible and strictly in chronological order, you probably should read the graphic novel between one and the other, which covers some of the events that occur between the two.

4 Did you ever see the alternate ending to Far Cry 4, by the way? If you did, you might appreciate that a similar trick can be used to shortcut Thank Goodness You’re Here! too…

5 They’re also missing a trick by using the domain they’ve registered, wizards.cool, only to redirect to Steam.

× × × × × ×

So… I’m A Podcast

Duration

Podcast Version

This post is also available as a podcast. Listen here, download for later, or subscribe wherever you consume podcasts.

Observant readers might have noticed that some of my recent blog posts – like the one about special roads, my idea for pressure-cooking tea, and the one looking at the history of window tax in two countries1 – are also available as podcast.

Podcast cover showing Dan touching his temple and speaking into a microphone, captioned 'a podcast nobody asked for, about things only Dan Q cares about'.

Why?

Like my occasional video content, this isn’t designed to replace any of my blogging: it’s just a different medium for those that might prefer it.

For some stories, I guess that audio might be a better way to find out what I’ve been thinking about. Just like how the vlog version of my post about my favourite video game Easter Egg might be preferable because video as a medium is better suited to demonstrating a computer game, perhaps audio’s the right medium for some of the things I write about, too?

But as much as not, it’s just a continuation of my efforts to explore different media over which a WordPress blog can be delivered2. Also, y’know, my ongoing effort to do what I’m bad at in the hope that I might get better at a wider diversity of skills.

How?

Let’s start by understanding what a “podcast” actually is. It is, in essence, just an RSS feed (something you might have heard me talk about before…) with audio enclosures – basically, “attachments” – on each item. The idea was spearheaded by Dave Winer back in 2001 as a way of subscribing to rich media like audio or videos in such a way that slow Internet connections could pre-download content so you didn’t have to wait for it to buffer.3

Mapping of wp-admin metadata fields to parts of a podcast feed.
Podcasts are pretty simple, even after you’ve bent over backwards to add all of the metadata that Apple Podcasts (formerly iTunes) expects to see. I looked at a couple of WordPress plugins that claimed to be able to do the work for me, but eventually decided it was simple enough to just add some custom metadata fields that could then be included in my feeds and tweak my theme code a little.

Here’s what I had to do to add podcasting capability to my theme:

The tag

I use a post tag, dancast, to represent posts with accompanying podcast content4. This way, I can add all the podcast-specific metadata only if the user requests the feed of that tag, and leave my regular feeds untampered . This means that you don’t get the podcast enclosures in the regular subscription; that might not be what everybody would want, but it suits me to serve podcasts only to people who explicitly ask for them.

It also means that I’m able to use a template, tag-dancast.php, in my theme to generate a customised page for listing podcast episodes.

The feed

Okay, onto the code (which I’ve open-sourced over here). I’ve use a series of standard WordPress hooks to add the functionality I need. The important bits are:

  1. rss2_item – to add the <enclosure>, <itunes:duration>, <itunes:image>, and <itunes:explicit> elements to the feed, when requesting a feed with my nominated tag. Only <enclosure> is strictly required, but appeasing Apple Podcasts is worthwhile too. These are lifted directly from the post metadata.
  2. the_excerpt_rss – I have another piece of post metadata in which I can add a description of the podcast (in practice, a list of chapter times); this hook swaps out the existing excerpt for my custom one in podcast feeds.
  3. rss_enclosure – some podcast syndication platforms and players can’t cope with RSS feeds in which an item has multiple enclosures, so as a safety precaution I strip out any enclosures that WordPress has already added (e.g. the featured image).
  4. the_content_feed – my RSS feed usually contains the full text of every post, because I don’t like feeds that try to force you to go to the original web page5 and I don’t want to impose that on others. But for the podcast feed, the text content of the post is somewhat redundant so I drop it.
  5. rss2_ns – of critical importance of course is adding the relevant namespaces to your XML declaration. I use the itunes namespace, which provides the widest compatibility for specifying metadata, but I also use the newer podcast namespace, which has growing compatibility and provides some modern features, most of which I don’t use except specifying a license. There’s no harm in supporting both.
  6. rss2_head – here’s where I put in the metadata for the podcast as a whole: license, category, type, and so on. Some of these fields are effectively essential for best support.

You’re welcome, of course, to lift any of all of the code for your own purposes. WordPress makes a perfectly reasonable platform for podcasting-alongside-blogging, in my experience.

What?

Finally, there’s the question of what to podcast about.

My intention is to use podcasting as an alternative medium to my traditional blog posts. But not every blog post is suitable for conversion into a podcast! Ones that rely on images (like my post about dithering) aren’t a great choice. Ones that have lots of code that you might like to copy-and-paste are especially unsuitable.

Dan, a microphone in front of him, smiles at the camera.
You’re listening to Radio Dan. 100% Dan, 100% of the time.(Also I suppose you might be able to hear my dog snoring in the background…)

Also: sometimes I just can’t be bothered. It’s already some level of effort to write a blog post; it’s like an extra 25% effort on top of that to record, edit, and upload a podcast version of it.

That’s not nothing, so I’ve tended to reserve podcasts for blog posts that I think have a sort-of eccentric “general interest” vibe to them. When I learn something new and feel the need to write a thousand words about it… that’s the kind of content that makes it into a podcast episode.

Which is why I’ve been calling the endeavour “a podcast nobody asked for, about things only Dan Q cares about”. I’m capable of getting nerdsniped easily and can quickly find my way down a rabbit hole of learning. My podcast is, I guess, just a way of sharing my passion for trivial deep dives with the rest of the world.

My episodes are probably shorter than most podcasts: my longest so far is around fifteen minutes, but my shortest is only two and a half minutes and most are about seven. They’re meant to be a bite-size alternative to reading a post for people who prefer to put things in their ears than into their eyes.

Anyway: if you’re not listening already, you can subscribe from here or in your favourite podcasting app. Or you can just follow my blog as normal and look for a streamable copy of podcasts at the top of selected posts (like this one!).

Footnotes

1 I’ve also retroactively recorded a few older ones. Have a look/listen!

2 As well as Web-based non-textual content like audio (podcasts) and video (vlogs), my blog is wholly or partially available over a variety of more-exotic protocols: did you find me yet on Gemini (gemini://danq.me/), Spartan (spartan://danq.me/), Gopher (gopher://danq.me/), and even Finger (finger://danq.me/, or run e.g. finger blog@danq.me from your command line)? Most of these are powered by my very own tool CapsulePress, and I’m itching to try a few more… how about a WordPress blog that’s accessible over FTP, NNTP, or DNS? I’m not even kidding when I say I’ve got ideas for these…

3 Nowadays, we have specialised media decoder co-processors which reduce the size of media files. But more-importantly, today’s high-speed always-on Internet connections mean that you probably rarely need to make a conscious choice between streaming or downloading.

4 I actually intended to change the tag to podcast when I went-live, but then I forgot, and now I can’t be bothered to change it. It’s only for my convenience, after all!

5 I’m very grateful that my favourite feed reader makes it possible to, for example, use a CSS selector to specify the page content it should pre-download for you! It means I get to spend more time in my feed reader.

× × ×

Even More 1999!

Spencer’s filter

Last month I implemented an alternative mode to view this website “like it’s 1999”, complete with with cursor trails, 88×31 buttons, tables for layout1, tiled backgrounds, and even a (fake) hit counter.

My blog post about 1999 Mode, viewed using 1999 Mode.
Feels like I’m 17 again.

One thing I’d have liked to do for 1999 Mode but didn’t get around to would have been to make the images look like it was the 90s, too.

Back then, many Web users only had  graphics hardware capable of displaying 256 distinct colours. Across different platforms and operating systems, they weren’t even necessarily the same 256 colours2! But the early Web agreed on a 216-colour palette that all those 8-bit systems could at least approximate pretty well.

I had an idea that I could make my images look “216-colour”-ish by using CSS to apply an SVG filter, but didn’t implement it.

A man wearing a cap pours himself a beer from a 10-litre box.
Let’s use this picture, from yesterday’s blog post, to talk about palettes…

But Spencer, a long-running source of excellent blog comments, stepped up and wrote an SVG filter for me! I’ve tweaked 1999 Mode already to use it… and I’ve just got to say it’s excellent: huge thanks, Spencer!

The filter coerces colours to their nearest colour in the “Web safe” palette, resulting in things like this:

A man wearing a cap pours himself a beer from a 10-litre box, reduced to a "Web safe" palette.
The flat surfaces are particularly impacted in this photo (as manipulated by the CSS SVG filter described above). Subtle hues and the gradients coalesce into slabs of colour, giving them an unnatural and blocky appearance.

Plenty of pictures genuinely looked like that on the Web of the 1990s, especially if you happened to be using a computer only capable of 8-bit colour to view a page built by somebody who hadn’t realised that not everybody would experience 24-bit colour like they did3.

Dithering

But not all images in the “Web safe” palette looked like this, because savvy web developers knew to dither their images when converting them to a limited palette. Let’s have another go:

A man wearing a cap pours himself a beer from a 10-litre box, reduced to a "Web safe" palette but using Floyd Steinberg dithering to reduce the impact of colour banding.
This image uses exactly the same 216-bit colour palette as the previous one, but looks a lot more “natural” thanks to the Floyd–Steinberg dithering algorithm.

Dithering introduces random noise to media4 in order to reduce the likelihood that a “block” will all be rounded to the same value. Instead; in our picture, a block of what would otherwise be the same colour ends up being rounded to maybe half a dozen different colours, clustered together such that the ratio in a given part of the picture is, on average, a better approximation of the correct colour.

The result is analogous to how halftone printing – the aesthetic of old comics and newspapers, with different-sized dots made from few colours of ink – produces the illusion of a continuous gradient of colour so long as you look at it from far-enough away.

Comparison image showing the original, websafe, and dithered-websafe images, zoomed in so that you can see the speckling of random noise in the dithered version.
Zooming in makes it easy to see the noisy “speckling” effect in the dithered version, but from a distance it’s almost invisible.

The other year I read a spectacular article by Surma that explained in a very-approachable way how and why different dithering algorithms produce the results they do. If you’ve any interest whatsoever in a deep dive or just want to know what blue noise is and why you should care, I’d highly recommend it.

You used to see digital dithering everywhere, but nowadays it’s so rare that it leaps out as a revolutionary aesthetic when, for example, it gets used in a video game.

Comparison image showing the image quantized to monochrome without (looks blocky/barely identifiable) and with (looks like old newspaper photography) dithering.
Dithering can be so effective that it can even make an image “work” all the way down to 1-bit (i.e. true monochrome/black-and-white) colour. Here I’ve used Jarvis, Judice & Ninke’s dithering algorithm, which is highly-effective for picking out subtle colour differences in what would otherwise be extreme dark and light patches, at the expense of being more computationally-expensive (to initially create) than other dithering strategies.

All of which is to say that: I really appreciate Spencer’s work to make my “1999 Mode” impose a 216-colour palette on images. But while it’s closer to the truth, it still doesn’t quite reflect what my website would’ve looked like in the 1990s because I made extensive use of dithering when I saved my images in Web safe palettes5.

Why did I take the time to dither my images, back in the day? Because doing the hard work once, as a creator of graphical Web pages, saves time and computation (and can look better!), compared to making every single Web visitor’s browser do it every single time.

Which, now I think about it, is a lesson that’s still true today (I’m talking to you, developers who send a tonne of JavaScript and ask my browser to generate the HTML for you rather than just sending me the HTML in the first place!).

Footnotes

1 Actually, my “1999 mode” doesn’t use tables for layout; it pretty much only applies a CSS overlay, but it’s deliberately designed to look a lot like my blog did in 1999, which did use tables for layout. For those too young to remember: back before CSS gave us the ability to lay out content in diverse ways, it was commonplace to use a table – often with the borders and cell-padding reduced to zero – to achieve things that today would be simple, like putting a menu down the edge of a page or an image alongside some text content. Using tables for non-tabular data causes problems, though: not only is it hard to make a usable responsive website with them, it also reduces the control you have over the order of the content, which upsets some kinds of accessibility technologies. Oh, and it’s semantically-invalid, of course, to describe something as a table if it’s not.

2 Perhaps as few as 22 colours were defined the same across all widespread colour-capable Web systems. At first that sounds bad. Then you remember that 4-bit (16 colour) palettes used to look look perfectly fine in 90s videogames. But then you realise that the specific 22 “very safe” colours are pretty shit and useless for rendering anything that isn’t composed of black, white, bright red, and maybe one of a few greeny-yellows. Ugh. For your amusement, here’s a copy of the image rendered using only the “very safe” 22 colours.

3 Spencer’s SVG filter does pretty-much the same thing as a computer might if asked to render a 24-bit colour image using only 8-bit colour. Simply “rounding” each pixel’s colour to the nearest available colour is a fast operation, even on older hardware and with larger images.

4 Note that I didn’t say “images”: dithering is also used to produce the same “more natural” feel for audio, too, when reducing its bitrate (i.e. reducing the number of finite states into which the waveform can be quantised for digitisation), for example.

5 I’m aware that my footnotes are capable of nerdsniping Spencer, so by writing this there’s a risk that he’ll, y’know, find a way to express a dithering algorithm as an SVG filter too. Which I suspect isn’t possible, but who knows! 😅

× × × × × ×

Easy Socialising

This weekend I invited over a bunch of our old university buddies, and it was great.

We still didn’t feel up to a repeat of the bigger summer party we held the year before last, but we love our Abnib buddies, so put the call out to say: hey, come on over, bring a tent (or be willing to crash on a sofa bed) if you want to stay over; we’ll let the kids run themselves ragged with a water fight and cricket and football and other garden games, then put them in front of a film or two while we hang out and drink and play board games or something.

14 adults, 8 children, and a dog stand on/in front of a garden climbing frame: Dan is in the centre.
Every one of these people is awesome. Or else a dog.

The entire plan was deliberately low-effort. Drinks? We had a local brewery drop us off a couple of kegs, and encouraged people to BYOB. Food? We threw a stack of pre-assembled snacks onto a table, and later in the day I rotated a dozen or so chilled pizzas through the oven. Entertainments? Give the kids a pile of toys and the adults one another’s company.

We didn’t even do more than the bare minimum of tidying up the place before people arrived. Washing-up done? No major trip hazards on the floor? That’s plenty good enough!

A man wearing a cap pours himself a beer from a 10-litre box.
The intersection of “BYOB” and the generosity of our friends somehow meant that, I reckon, we have more alcohol in the house now than before the party!

I found myself recalling our university days, when low-effort ad-hoc socialising seemed… easy. We lived close together and we had uncomplicated schedules, which combined to make it socially-acceptable to “just turn up” into one another’s lives and spaces. Many were the times that people would descend upon Claire and I’s house in anticipation that there’d probably be a film night later, for example1.

I remember one occasion a couple of decades ago, chilling with friends2. Somebody – possibly Liz – commented that it’d be great if in the years to come our kids would be able to be friends with one another. I was reminded of it when our eldest asked me, of our weekend guests, “why are all of your friends’ children are so great?”

A group of adults stand around on a patio, socialising.
It’s not the same as those days long ago, but I’m not sure I’d want it to be. It is, however, fantastic.

What pleased me in particular was how relatively-effortless it was for us all to slip back into casually spending time together. With a group of folks who have, for the most part, all known each other for over two decades, even not seeing one another in-person for a couple of years didn’t make a significant dent on our ability to find joy in each other’s company.

Plus, being composed of such laid-back folks, it didn’t feel awkward that we had, let’s face it, half-arsed the party. Minimal effort was the order of the day, but the flipside of that was that the value-for-effort coefficient was pretty-well optimised3.

A delightful weekend that I was glad to be part of.

Footnotes

1 That Claire and I hosted so many social events, both regular and unplanned, eventually lead us to the point that it was the kind of thing we considered whenever we moved house!

2 Perhaps at the Ship & Castle, where we spent a reasonable amount of our education.

3 I’m pretty sure that if I’d have used the term “value-for-effort coefficient” at the party, though, then it’d have immediately sucked 100% of the fun out of the room.

× × ×

Draw Me a Comment!

Why must a blog comment be text? Why could it not be… a drawing?1

Red and black might be more traditional ladybird colours, but sometimes all you’ve got is blue.

I started hacking about and playing with a few ideas and now, on selected posts including this one, you can draw me a comment instead of typing one.

Just don’t tell the soup company what I’ve been working on, okay?

I opened the feature, experimentally (in a post available only to RSS subscribers2) the other week, but now you get a go! Also, I’ve open-sourced the whole thing, in case you want to pick it apart.

What are you waiting for: scroll down, and draw me a comment!

Footnotes

1 I totally know the reasons that a blog comment shouldn’t be a drawing; I’m not completely oblivious. Firstly, it’s less-expressive: words are versatile and you can do a lot with them. Secondly, it’s higher-bandwidth: images take up more space, take longer to transmit, and that effect compounds when – like me – you’re tracking animation data too. But the single biggest reason, and I can’t stress this enough, is… the penises. If you invite people to draw pictures on your blog, you’re gonna see a lot of penises. Short penises, long penises, fat penises, thin penises. Penises of every shape and size. Some erect and some flacid. Some intact and some circumcised. Some with hairy balls and some shaved. Many of them urinating or ejaculating. Maybe even a few with smiley faces. And short of some kind of image-categorisation AI thing, you can’t realistically run an anti-spam tool to detect hand-drawn penises.

2 I’ve copied a few of my favourites of their drawings below. Don’t forget to subscribe if you want early access to any weird shit I make.

Permanent Record

To:
****@fulwoodacademy.co.uk
From:
“Dan Q” <***@danq.me>
Subject:
Subject Access Request – Dan Q, pupil Sep 1992 – Jun 1997
Date:
Tue, 23 Jul 2024 15:18:07 +0100

To Whom It May Concern,

Please supply the personal data you hold about me, per data protection law. Specifically, I’m looking for: a list of all offences for which I was assigned detention at school.

Please find attached a variety of documentation which I feel proves my identity and the legitimacy of this request. If there’s anything else you need or you have further questions, please feel free to email me.

Thanks in advance;

Dan Q

To:
“Dan Q” <***@danq.me>
From:
“Jodie Clayton” <*.*******@fulwoodacademy.co.uk>
Subject:
Re: Subject Access Request – Dan Q, pupil Sep 1992 – Jun 1997
Date:
Fri, 26 Jul 2024 10:48:33 +0100

Dear Dan Q,

We do not retain records of detentions of former pupils, and we certainly have no academic records of pupils going back thirty years ago.

Fulwood Academy

Jodie Clayton | Office Manager with Cover and Admissions
Black Bull Lane, Fulwood, Preston, PR2 9YR
+44 (0) 1772 719060

To:
“Jodie Clayton” <*********@fulwoodacademy.co.uk>
From:
“Dan Q” <***@danq.me>
Subject:
Re: Subject Access Request – Dan Q, pupil Sep 1992 – Jun 1997
Date:
Fri, 26 Jul 2024 17:00:49 +0100

But, but… I was always told that this would go on my permanent record. Are you telling me that teachers lied to me? What else is fake!?

Maybe I will always have a calculator with me and I won’t actually need to know how to derive a square root using a pen and paper. Maybe nobody will ever care what my GCSE results are for every job I apply for. Maybe my tongue isn’t divided into different taste areas capable of picking out sweet, salty, bitter etc. flavours. Maybe practicing my handwriting won’t be an essential skill I use every day.

And maybe I will amount to something despite never turning in any History homework, Mr. Needham!

Dan Q

Tidying WordPress’s HTML

Terence Eden, who’s apparently inspiring several posts this week, recently shared a way to attach a hook to WordPress’s get_the_post_thumbnail() function in order to remove the extraneous “closing mark” from the (self-closing in HTML) <img> element.

By default, WordPress outputs e.g. <img src="..." />, where <img src="..."> would suffice.

It’s an inconsequential difference for most purposes, but apparently it bugs him, so he fixed it… although he went on to observe that he hadn’t managed to successfully tackle all the instances in which WordPress was outputting redundant closing marks.

This is a problem that I’ve already solved here on my blog. My solution’s slightly hacky… but it works!

Source code for a post on DanQ.me, being searched for unnecessary HTML closing tags. No results are found.
There are many things you could say about the HTML produced to make the page you’re reading now. But “it needs fewer />s” isn’t among them.

My Solution: Runing HTMLTidy over WordPress

Tidy is an excellent tool for tiding up HTML! I used to use its predecessor back in the day for all kind of things, but it languished for a few years and struggled with support for modern HTML features. But in 2015 it made a comeback and it’s gone from strength to strength ever since.

I run it on virtually all pages produced by DanQ.me (go on, click “View Source” and see for yourself!), to:

  • Standardise the style of the HTML code and make it easier for humans to read1.
  • Bring old-style emphasis tags like <i>, in my older posts, into a more-modern interpretation, like <em>.
  • Hoist any inline <style> blocks to the <head>, and detect any repeated inline style="..."s to convert to classes.
  • Repair any invalid HTML (browsers do this for you, of course, but doing it server-side makes parsing easier for the browser, which might matter on more-lightweight hardware).

WordPress isn’t really designed to have Tidy bolted onto it, so anything it likely to be a bit of a hack, but here’s my approach:

  1. Install libtidy-dev and build the PHP bindings to it.
    Note that if you don’t do this the code might appear to work, but it won’t actually tidy anything2.
  2. Add a new output buffer to my theme’s header.php3, with a callback function: ob_start('tidy_entire_page').
    Without an corresponding ob_flush or similar, this buffer will close and the function will be called when PHP finishes generating the page.
  3. Define the function tidy_entire_page($buffer)
    Have it instantiate Tidy ($tidy = new tidy) and use $tidy->parseString (with your buffer and Tidy preferences) to tidy the code, then return $tidy.
  4. Ensure that you’re caching the results!
    You don’t want to run this every page load for anonymous users! WP Super Cache on “Expert” mode (with the requisite webserver configuration) might help.

I’ve open-sourced a demonstration that implements a child theme to TwentyTwentyOne to do this: there’s a richer set of instructions in the repo’s readme. If you want, you can run my example in Docker and see for yourself how it works before you commit to trying to integrate it into your own WordPress installation!

Footnotes

1 I miss the days when most websites were handwritten and View Source typically looked nice. It was great to learn from, too, especially in an age before we had DOM debuggers. Today: I can’t justify dropping my use of a CMS, but I can make my code readable.

2 For a few of its extensions, some PHP developer made the interesting choice to fail silently if the required extension is missing. For example: if you don’t have the zip extension enabled you can still use PHP to make ZIP files, but they won’t be compressed. This can cause a great deal of confusion for developers! A similar issue exists with tidy: if it isn’t installed, you can still call all of the methods on it… they just don’t do anything. I can see why this decision might have been made – to make the language as portable as possible in production – but I’d prefer if this were an optional feature, e.g. you had to set try_to_make_do_if_you_are_missing_an_extension=yes in your php.ini to enable it, or if it at least logged that it had done so.

3 My approach probably isn’t suitable for FSE (“block”) themes, sorry.

×

Shared Email Addresses

Email Antipatterns

There are two particular varieties of email address that I don’t often see, but I’ve been known to ridicule when I have:

  1. Geographically-based personal email addresses, e.g. OurHouseName@example.com. These always seemed to me to undermine one of the single-best things about an email address compared to postal mail – that they don’t change when you move house!1
  2. Shared/couple email addresses, e.g. MrAndMrsSmith@example.net. These make me want to scream “You know email addresses are basically free, right? You don’t have to share one!” Even back when most people got their email address directly from their dial-up provider, most ISPs offered some number of addresses (e.g. five).

If you’ve come across either of the above before, there’s… perhaps a reasonable chance that it was in the possession of somebody born before 1960 (and the older, the more-likely)2.

In Community Season 4, Episode 8 (Herstory of Dance), Pierce Hawthorne (Chevy Chase), wearing an Inspector Spacetime t-shirt, sits in a computer lab, saying "Seriously, I need to get to my email: the Post Office is about to close!"
In Pierce’s defence, “my email is on that computer” did genuinely used to be a thing, before the widespread adoption of IMAP and webmail.

You’ll never catch me doing that!

I found myself thinking about this as I clicked the “No” button on a poll by Terence Eden that asked whether I used a “shared” email address when in a stable long-term relationship.

Terence Eden (@Edent@mastodon.social) on Mastodon asks: "If you're currently in a stable, long term relationship with someone - do you have a joint email address with them?"
Of course I don’t! Why would I? Oh… wait…

It wasn’t until after I clicked “No” that I realised that, in actual fact, I have had multiple email addresses that I’ve share with significant other(s). And more than that, sometimes they’ve been geographically-based! What’s going on?

I’ve routinely had domains or subdomains that I’ve used to represent a place that I live. They’re convenient for when you want to give somebody a short web address which’ll take them to a page with directions to you and links to your location in a variety of different services and formats.

And by that point, you might as well have an email alias, e.g. all@myhouse.example.org, that forwards on email to, well, all the adults at the house. What I’ve described there is, after a fashion, a shared email address tied to a geographical location. But we don’t ever send anything from it. Nor do we use it for any kind of personal communication with anybody outside the house.

Email receipt from Sainsburys, advising that they're unable to deliver "Fruit Bowl Raspberry Peelers 5x16g".
Sainsbury’s aren’t going to bring us any Raspberry Peelers. I’m not sure who ordered them, but I’m confident that it’s the kids who’re gonna complain about it.

We don’t give out these all@ addresses (or their aliases: every company gets their own) to people willy-nilly. But they’re useful for shared services that send automated emails to us all. For example:

  • Giving a forwarding alias to the supermarket means that receipts (listing any unavailable products) g0 to all of us, and whoever’s meal plan’s been scuppered by an awkward substitution will know what’s up.
  • Using a forwarding alias with the household Netflix account means anybody can use the “send me a sign-in link” feature to connect a new device.
  • When confirming that you’ve sent money to a service provider, CC’ing one of these nice, short aliases provides a quick way to let the others know that a bill’s been paid (this one’s especially useful where, like me, you live in a 3+ adult household and otherwise you’d be having to add multiple people to the CC field).

Sure, the need for most of these solutions would evaporate instantly if more services supported multi-user or delegated access3. But outside of that fantasy world, shared aliases seem to be pretty useful!

Footnotes

1 The most ill-conceived example of geographically-based email addresses I’ve ever seen came from a a 2003 proposal by then-MP Derek Wyatt, who proposed that the domain name part of every single email address should contain not only the country of the owner (e.g. .uk) but also their complete postcode. He was under the delusion that this would somehow prevent spam. Even ignoring the immense technical challenges of his proposal and the impossibility of policing it across the borders of every country that uses email… it probably wouldn’t even be effective at his stated goal. I’ll let The Register take it from here.

2 No ageism intended: I suspect that the phenomenon actually stems from the fact that as email took off in the noughties this demographic who were significantly more-likely than younger folks to have (a) a very long-term home that they didn’t anticipate moving out of any time soon, and (b) an existing anticipation that people and companies wrote to them as a couple, not individually.

3 I’d love it if the grocery delivery sites would let multiple “accounts”, by mutual consent, share a delivery slot, destination, and payment method. It’d be cool to know that we could e.g. have a houseguest and give them temporary access to a specific order that was scheduled for during their stay. But that’s probably a lot of work for very little payoff if you’re busy running a supermarket.

× × ×

Blogging Like It’s 1999

In anticipation of WWW Day on 1 August, some work colleagues and I were sharing pictures of the first (or early) websites we worked on. I was pleased to be able to pull out a screenshot of how my blog looked back in 1999!

Opera 3.62 on Windows 3.11 viewing the 1999 version of Dan's blog.
Tables for layout, hit counter, web-safe colour scheme, and the need to explain what a “navigation bar” is in case they’ve not come across one before. Yup, this is 90s web design at its peak and no mistake.

Because I’m such a digital preservationist, many of those ancient posts are still available on my blog, so I also shared a photo of me browsing the same content on my blog as it is today, side-by-side with that 25+-year-old screenshot.1

Dan poses in front of circa 1999 and present-day copies of his blog, both showing posts from January 1999.
The posts are in reverse-chronological order now, rather than chronological order, but the content’s all the same (even though the design is now very different and, of course, responsive!).

Update: This photo eventually appeared on a LinkedIn post on Automattic’s profile.

This inspired me to make a toggleable “alternate theme” for my blog: 1999 Mode.

Switch to it, and you’ll see a modern reinterpretation of my late-1990s blog theme, featuring:

  • A “table-like” layout.2
  • White text on a black marbled “seamless texture” background, just like you’d expect on any GeoCities page.
  • Pre-rendered fire text3, including – of course – animated GIFs.4
  • A (fake) hit counter.
  • A stack of 88×31 micro-banners, as was all the rage at the time. (And seem to be making a comeback in IndieWeb circles…)
  • Cursor trails (with thanks to Tim Holman)!
  • I’ve even applied img { image-rendering: crisp-edges; } to try to compensate for modern browsers’ capability for subpixel rendering when rescaling images: let them eat pixels!5
This blog post, viewed using 1999 Mode.
Or if you can’t be bothered to switch to 1999 Mode, you can just look at this screenshot to get an idea of how it looks.

I’ve added 1999 Mode to my April Fools gags so, like this year, if you happen to visit my site on or around 1 April, there’s a change you’ll see it in 1999 mode anyway. What fun!

I think there’s a possible future blog post about Web design challenges of the 1990s. Things like: what it the user agent doesn’t support images? What if it supports GIFs, but not animated ones (some browsers would just show the first frame, so you’d want to choose your first frame appropriately)? How do I ensure that people see the right content if they skip my frameset? Which browser-specific features can I safely use, and where do I need a fallback6? Will this work well on all resolutions down to 640×480 (minus browser chrome)? And so on.

Any interest in that particular rabbit hole of digital history?

Footnotes

1 Some of the addresses have changed, but from Summer 2003 onwards I’ve had a solid chain of redirects in place to try to keep content available via whatever address it was at. Because Cool URIs Don’t Change. This occasionally turns out to be useful!

2 Actually, the entire theme is just a CSS change, so no tables are added. But I’ve tried to make it look like I’m using tables for layout, because that (and spacer GIFs) were all we had back in the day.

3 Obviously the title saying “Dan Q” is modern, because that wasn’t even my name back then, but this is more a reimagining of how my site would have looked if I were transported back to 1999 and made to do it all again.

4 I was slightly obsessed for a couple of years in the late 90s with flaming text on black marble backgrounds. The hit counter in my screenshot above – with numbers on fire – was one I made, not a third-party one; and because mine was the only one of my friends’ hosts that would let me run CGIs, my Perl script powered the hit counters for most of my friends’ sites too.

5 I considered, but couldn’t be bothered, implementing an SVG CSS filter: to posterize my images down to 8-bit colour, for that real “I’m on an old graphics card” feel! If anybody’s already implemented such a thing under a license that I can use, let me know and I’ll integrate it!

6 And what about those times where using a fallback might make things worse, like how Netscape 7 made the <blink><marquee> combination unbearable!

× × ×

The Elegance of the ASCII Table

Duration

Podcast Version

This post is also available as a podcast. Listen here, download for later, or subscribe wherever you consume podcasts.

If you’ve been a programmer or programming-adjacent nerd1 for a while, you’ll have doubtless come across an ASCII table.

An ASCII table is useful. But did you know it’s also beautiful and elegant.

Frames from the scene in The Martian where Mark Watney discovers Beth Johanssen's ASCII table.
Even non-programmer-adjacent nerds may have a cultural awareness of ASCII thanks to books and films like The Martian2.
ASCII‘s still very-much around; even if you’re transmitting modern Unicode3 the most-popular encoding format UTF-8 is specifically-designed to be backwards-compatible with ASCII! If you decoded this page as ASCII you’d get the gist of it… so long as you ignored the garbage characters at the end of this sentence! 😁

History

ASCII was initially standardised in X3.4-1963 (which just rolls off the tongue, doesn’t it?) which assigned meanings to 100 of the potential 128 codepoints presented by a 7-bit4 binary representation: that is, binary values 0000000 through 1111111:

Scan of a X3.4-1963 ASCII table.
Notably absent characters in this first implementation include… the entire lowercase alphabet! There’s also a few quirks that modern ASCII fans might spot, like the curious “up” and “left” arrows at the bottom of column 101____ and the ACK and ESC control codes in column 111____.

If you’ve already guessed where I’m going with this, you might be interested to look at the X3.4-1963 table and see that yes, many of the same elegant design choices I’ll be talking about later already existed back in 1963. That’s really cool!

Table

In case you’re not yet intimately familiar with it, let’s take a look at an ASCII table. I’ve colour-coded some of the bits I think are most-beautiful:

ASCII table with Decimal, Hex, and Character columns.That table only shows decimal and hexadecimal values for each character, but we’re going to need some binary too, to really appreciate some of the things that make ASCII sublime and clever.

Control codes

The first 32 “characters” (and, arguably, the final one) aren’t things that you can see, but commands sent between machines to provide additional instructions. You might be familiar with carriage return (0D) and line feed (0A) which mean “go back to the beginning of this line” and “advance to the next line”, respectively5. Many of the others don’t see widespread use any more – they were designed for very different kinds of computer systems than we routinely use today – but they’re all still there.

32 is a power of two, which means that you’d rightly expect these control codes to mathematically share a particular “pattern” in their binary representation with one another, distinct from the rest of the table. And they do! All of the control codes follow the pattern 00_____: that is, they begin with two zeroes. So when you’re reading 7-bit ASCII6, if it starts with 00, it’s a non-printing character. Otherwise it’s a printing character.

Not only does this pattern make it easy for humans to read (and, with it, makes the code less-arbitrary and more-beautiful); it also helps if you’re an ancient slow computer system comparing one bit of information at a time. In this case, you can use a decision tree to make shortcuts.

Two rolls of punched paper tape.
That there’s one exception in the control codes: DEL is the last character in the table, represented by the binary number 1111111. This is a historical throwback to paper tape, where the keyboard would punch some permutation of seven holes to represent the ones and zeros of each character. You can’t delete holes once they’ve been punched, so the only way to mark a character as invalid was to rewind the tape and punch out all the holes in that position: i.e. all 1s.

Space

The first printing character is space; it’s an invisible character, but it’s still one that has meaning to humans, so it’s not a control character (this sounds obvious today, but it was actually the source of some semantic argument when the ASCII standard was first being discussed).

Putting it numerically before any other printing character was a very carefully-considered and deliberate choice. The reason: sorting. For a computer to sort a list (of files, strings, or whatever) it’s easiest if it can do so numerically, using the same character conversion table as it uses for all other purposes7. The space character must naturally come before other characters, or else John Smith won’t appear before Johnny Five in a computer-sorted list as you’d expect him to.

Being the first printing character, space also enjoys a beautiful and memorable binary representation that a human can easily recognise: 0100000.

Numbers

The position of the Arabic numbers 0-9 is no coincidence, either. Their position means that they start with zero at the nice round binary value 0110000 (and similarly round hex value 30) and continue sequentially, giving:

Binary Hex Decimal digit (character)
011 0000 30 0
011 0001 31 1
011 0010 32 2
011 0011 33 3
011 0100 34 4
011 0101 35 5
011 0110 36 6
011 0111 37 7
011 1000 38 8
011 1001 39 9

The last four digits of the binary are a representation of the value of the decimal digit depicted. And the last digit of the hexadecimal representation is the decimal digit. That’s just brilliant!

If you’re using this post as a way to teach yourself to “read” binary-formatted ASCII in your head, the rule to take away here is: if it begins 011, treat the remainder as a binary representation of an actual number. You’ll probably be right: if the number you get is above 9, it’s probably some kind of punctuation instead.

Shifted Numbers

Subtract 0010000 from each of the numbers and you get the shifted numbers. The first one’s occupied by the space character already, which is a shame, but for the rest of them, the characters are what you get if you press the shift key and that number key at the same time.

“No it’s not!” I hear you cry. Okay, you’re probably right. I’m using a 105-key ISO/UK QWERTY keyboard and… only four of the nine digits 1-9 have their shifted variants properly represented in ASCII.

That, I’m afraid, is because ASCII was based not on modern computer keyboards but on the shifted positions of a Remington No. 2 mechanical typewriter – whose shifted layout was the closest compromise we could find as a standard at the time, I imagine. But hey, you got to learn something about typewriters today, if that’s any consolation.

A Remington Portable No. 3 typewriter.
Bonus fun fact: early mechanical typewriters omitted a number 1: it was expected that you’d use the letter I. That’s fine for printed work, but not much help for computer-readable data.

Letters

Like the numbers, the letters get a pattern. After the @-symbol at 1000000, the uppercase letters all begin 10, followed by the binary representation of their position in the alphabet. 1 = A = 1000001, 2 = B = 1000010, and so on up to 26 = Z = 1011010. If you can learn the numbers of the positions of the letters in the alphabet, and you can count in binary, you now know enough to be able to read any ASCII uppercase letter that’s been encoded as binary8.

And once you know the uppercase letters, the lowercase ones are easy too. Their position in the table means that they’re all exactly 0100000 higher than the uppercase variants; i.e. all the lowercase letters begin 11! 1 = a = 1100001, 2 = b = 1100010, and 26 = z = 1111010.

If you’re wondering why the uppercase letters come first, the answer again is sorting: also the fact that the first implementation of ASCII, which we saw above, was put together before it was certain that computer systems would need separate character codes for upper and lowercase letters (you could conceive of an alternative implementation that instead sent control codes to instruct the recipient to switch case, for example). Given the ways in which the technology is now used, I’m glad they eventually made the decision they did.

Beauty

There’s a strange and subtle charm to ASCII. Given that we all use it (or things derived from it) literally all the time in our modern lives and our everyday devices, it’s easy to think of it as just some arbitrary encoding.

But the choices made in deciding what streams of ones and zeroes would represent which characters expose a refined logic. It’s aesthetically pleasing, and littered with historical artefacts that teach us a hidden history of computing. And it’s built atop patterns that are sufficiently sophisticated to facilitate powerful processing while being coherent enough for a human to memorise, learn, and understand.

Footnotes

1 Programming-adjacent? Yeah. For example, geocachers who’ve ever had to decode a puzzle-geocache where the coordinates were presented in binary (by which I mean: a binary representation of ASCII) are “programming-adjacent nerds” for the purposes of this discussion.

2 In both the book and the film, Mark Watney divides a circle around the recovered Pathfinder lander into segments corresponding to hexadecimal digits 0 through F to allow the rotation of its camera (by operators on Earth) to transmit pairs of 4-bit words. Two 4-bit words makes an 8-bit byte that he can decode as ASCII, thereby effecting a means to re-establish communication with Earth.

3 Y’know, so that you can type all those emoji you love so much.

4 ASCII is often thought of as an 8-bit code, but it’s not: it’s 7-bit. That’s why virtually every ASCII message you see starts every octet with a zero. 8-bits is a convenient number for transmission purposes (thanks mostly to being a power of two), but early 8-bit systems would be far more-likely to use the 8th bit as a parity check, to help detect transmission errors. Of course, there’s also nothing to say you can’t just transmit a stream of 7-bit characters back to back!

5 Back when data was sent to teletype printers these two characters had a distinct different meaning, and sometimes they were so slow at returning their heads to the left-hand-side of the paper that you’d also need to send a few null bytes e.g. 0D 0A 00 00 00 00 to make sure that the print head had gotten settled into the right place before you sent more data: printers didn’t have memory buffers at this point! For compatibility with teletypes, early minicomputers followed the same carriage return plus line feed convention, even when outputting text to screens. Then to maintain backwards compatibility with those systems, the next generation of computers would also use both a carriage return and a line feed character to mean “next line”. And so, in the modern day, many computer systems (including Windows most of the time, and many Internet protocols) still continue to use the combination of a carriage return and a line feed character every time they want to say “next line”; a redundancy build for a chain of backwards-compatibility that ceased to be relevant decades ago but which remains with us forever as part of our digital heritage.

6 Got 8 binary digits in front of you? The first digit is probably zero. Drop it. Now you’ve got 7-bit ASCII. Sorted.

7 I’m hugely grateful to section 13.8 of Coded Character Sets, History and Development by Charles E. Mackenzie (1980), the entire text of which is available freely online, for helping me to understand the importance of the position of the space character within the ASCII character set. While most of what I’ve written in this blog post were things I already knew, I’d never fully grasped its significance of the space character’s location until today!

8 I’m sure you know this already, but in case you’re one of today’s lucky 10,000 to discover that the reason we call the majuscule and minuscule letters “uppercase” and “lowercase”, respectively, dates to 19th century printing, when moveable type would be stored in a box (a “type case”) corresponding to its character type. The “upper” case was where the capital letters would typically be stored.

× × × × ×