What if Emails were Multilingual?

Multilingual emails

Back when I was a student in Aberystwyth, I used to receive a lot of bilingual emails from the University and its departments1. I was reminded of this when I received an email this week from CACert, delivered in both English and German.

Top part of an email from CACert, which begins with the text "German translation below / Deutsche Uebersetzung weiter unten".
Simply putting one language after the other isn’t terribly exciting. Although to be fair, the content of this email wasn’t terribly exciting either.

Wouldn’t it be great if there were some kind of standard for multilingual emails? Your email client or device would maintain an “order of preference” of the languages that you speak, and you’d automatically be shown the content in those languages, starting with the one you’re most-fluent in and working down.

The Web’s already got this functionality2, and people have been sending multilingual emails for much longer than they’ve been developing multilingual websites3!
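(Here’s roughly what that Web-side negotiation looks like in practice, as a quick illustrative sketch in Python rather than anything authoritative: the browser sends a weighted Accept-Language header, and the server sorts it into a preference list. The header value below is just an example.)

# Rough sketch: turn an Accept-Language header into an ordered list of
# language preferences, the way a server might before choosing a translation.
def preferred_languages(header: str) -> list[str]:
    weighted = []
    for item in header.split(","):
        lang, _, q = item.strip().partition(";q=")
        weighted.append((float(q) if q else 1.0, lang))
    # Highest quality value wins; q defaults to 1.0 when omitted.
    return [lang for _, lang in sorted(weighted, key=lambda pair: -pair[0])]

print(preferred_languages("cy, en-GB;q=0.8, en;q=0.5"))
# ['cy', 'en-GB', 'en']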

Enter RFC8255!

It turns out that this is a (theoretically) solved problem. RFC8255 defines a mechanism for breaking an email into multiple different languages in a way that a machine can understand and that ought to be backwards-compatible (so people whose email software doesn’t support it yet can still “get by”). Here’s how it works:

  1. You add a Content-Type: multipart/multilingual header with a defined boundary marker, just like you would for any other email with multiple “parts” (e.g. with a HTML and a plain text version, or with text content and an attachment).
  2. The first section is just a text/plain (or similar) part, containing e.g. some text to explain that this is a multilingual email, and if you’re seeing this then your email client probably doesn’t support them, but you should just be able to scroll down (or else look at the attachments) to find content in the language you read.
  3. Subsequent sections have:
    • Content-Disposition: inline, so that for most people using non-compliant email software they can just scroll down until they find a language they can read,
    • Content-Type: message/rfc822, so that an entire message can be embedded (which allows other headers, like the Subject:, to be translated too),
    • a Content-Language: header, specifying the ISO code of the language represented in that section, and
    • optionally, a Content-Translation-Type: header, specifying either original (this is the original text), human (this was translated by a human), or automated (this was the result of machine translation) – this could be used to let a user say e.g. that they’d prefer a human translation to an automated one, given the choice between two second languages.

Let’s see a sample email:

Content-Type: multipart/multilingual;
 boundary=10867f6c7dbe49b2cfc5bf880d888ce1c1f898730130e7968995bea413a65664
To: <b24571@danq.me>
From: <rfc8255test-noreply@danq.link>
Subject: Does your email client support RFC8255?
Mime-Version: 1.0
Date: Fri, 27 Sep 2024 10:06:56 +0000

--10867f6c7dbe49b2cfc5bf880d888ce1c1f898730130e7968995bea413a65664
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset=utf-8

This is a multipart message in multiple languages. Each part says the
same thing but in a different language. If your email client supports
RFC8255, you will see this message in your preferred language out of
those available. Otherwise, you will probably see each language after
one another or else each language in a separate attachment.

--10867f6c7dbe49b2cfc5bf880d888ce1c1f898730130e7968995bea413a65664
Content-Disposition: inline
Content-Type: message/rfc822
Content-Language: en
Content-Translation-Type: original

Subject: Does your email client support RFC8255?
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
MIME-Version: 1.0

RFC8255 is a standard for sending email in multiple languages. This
is the original email in English. It is embedded alongside the same
content in a number of other languages.

--10867f6c7dbe49b2cfc5bf880d888ce1c1f898730130e7968995bea413a65664
Content-Disposition: inline
Content-Type: message/rfc822
Content-Language: fr
Content-Translation-Type: automated

Subject: Votre client de messagerie prend-il en charge la norme RFC8255?
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
MIME-Version: 1.0

RFC8255 est une norme permettant d'envoyer des courriers
électroniques dans plusieurs langues. Le présent est le courriel
traduit en français. Il est intégré à côté du même contenu contenu
dans un certain nombre d'autres langues.

--10867f6c7dbe49b2cfc5bf880d888ce1c1f898730130e7968995bea413a65664--
Why not copy-paste this into a raw email and see how your favourite email client handles it! That’ll be fun, right?
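If you’d rather generate a message like this programmatically than handcraft it, here’s a rough sketch using Python’s standard email library. The addresses and wording are placeholders of my own, and I make no promises about how real clients will treat the result:

from email.mime.multipart import MIMEMultipart
from email.mime.message import MIMEMessage
from email.mime.text import MIMEText

def language_part(lang, translation_type, subject, body):
    # Each language is a complete embedded message/rfc822 part, so its
    # Subject: line gets translated too.
    inner = MIMEText(body, "plain", "utf-8")
    inner["Subject"] = subject
    part = MIMEMessage(inner)  # sets Content-Type: message/rfc822
    part.add_header("Content-Disposition", "inline")
    part.add_header("Content-Language", lang)
    part.add_header("Content-Translation-Type", translation_type)
    return part

msg = MIMEMultipart("multilingual")
msg["To"] = "recipient@example.com"    # placeholder address
msg["From"] = "sender@example.com"     # placeholder address
msg["Subject"] = "Does your email client support RFC8255?"

# First part: the plain-text explanation for non-supporting clients.
msg.attach(MIMEText(
    "This is a multipart message in multiple languages. Scroll down (or "
    "check the attachments) for a language you can read.", "plain", "utf-8"))

msg.attach(language_part("en", "original",
    "Does your email client support RFC8255?",
    "This is the original email in English."))
msg.attach(language_part("fr", "automated",
    "Votre client de messagerie prend-il en charge la norme RFC8255?",
    "Ceci est le courriel traduit en français."))

print(msg.as_string())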

Can I use it?

That proposed standard turns seven years old next month. Sooo… can we start using it?4

Turns out… not so much. I discovered that NeoMutt supports it:

NeoMutt’s implementation is basic, but it works: you can specify a preference order for languages and it respects it, and if you don’t then it shows all of the languages as a series of attachments. It can apparently even be used to author compliant multilingual emails, although I didn’t get around to trying that.

Support in other clients is… variable.

A reasonable number of them don’t understand the multilingual directives but still show the email in a way that doesn’t suck:

Screenshot from Thunderbird, showing each language one after the other.
Mozilla Thunderbird does a respectable job of showing each language’s subject and content, one after another.

Some shoot for the stars but blow up on the launch pad:

Screenshot from GMail, showing each language one after the other, but with a stack of extra headers and an offer to translate it to English for me (even though the English is already there).
GMail displays all the content, but it pretends that the alternate versions are forwarded messages and adds a stack of meaningless blank headers to each. And then offers to translate the result for you, even though the content is already right there in English.

Others still seem to be actively trying to make life harder for you:

ProtonMail’s Web interface shows only the fallback content, putting the remainder into .eml attachments… which it then won’t display, forcing you to download them and find some other email client to look at them in!5

And still others just shit the bed at the idea that you might read an email like this one:

Screenshot from Outlook 365, showing the message "This message might have been moved or deleted".
Outlook 365 does appallingly badly, showing the subject in the title bar, then the words “(No subject)”, then the message “This message might have been moved or deleted”. Just great.

That’s just the clients I’ve tested, but I can’t imagine that others are much different. If you give it a go yourself with something I’ve not tried, then let me know!

I guess this means that standardised multilingual emails might be forever resigned to the “nice to have but it never took off so we went in a different direction” corner of the Internet, along with the <keygen> HTML element and the concept of privacy.

Footnotes

1 I didn’t receive quite as much bilingual email as you might expect, given that the University committed to delivering most of its correspondence in both English and Welsh. But I received a lot more of it than I do nowadays.

2 Although you might not guess it, given how many websites completely ignore your Accept-Language header, even where it’s provided, and simply try to “guess” what language you want using IP geolocation or something, and then require that you find whatever shitty bit of UI they’ve hidden their language selector behind if you want to change it, storing the result in a cookie so it inevitably gets lost and has to be set again the next time you visit.

3 I suppose that if you were sending HTML emails then you might use the lang="..." attribute to mark up different parts of the message as being in different languages. But that doesn’t solve all of the problems, and introduces a couple of fresh ones.

4 If it were a cool new CSS feature, you can guarantee that it’d be supported by every major browser (except probably Safari) by now. But email doesn’t get so much love as the Web, sadly.

5 Worse yet, if you’re using ProtonMail with a third-party client, ProtonMail screws up RFC8255 emails so badly that they don’t even work properly in e.g. NeoMutt any more! ProtonMail swaps the multipart/multilingual content type for multipart/mixed and strips the Content-Language: headers, making the entire email objectively less-useful.


link rel=”blogroll”

Dave Winer kindly let me know about a proposed standard for linking to OPML blogrolls. Given that I added a page containing my blogroll last year, it was easy enough for me to add a tiny bit of code to the header to add support for automatic detection of my blogroll.

<link rel="blogroll" type="text/xml" href="/blogroll.xml" title="Dan Q's blogroll">

Now all we need is some tools that can do such detection!

(You’ll note I’ve added a title attribute: as I discovered the other day, some browsers including ELinks will show all <link>s of unknown rel="..." at the top of the page and I wanted this one to make sense!)
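In the meantime, here’s a rough sketch (in Python, and very much untested scaffolding rather than a finished tool) of how such detection might work, by scanning a page’s <link> tags for rel="blogroll":

from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class BlogrollFinder(HTMLParser):
    """Collects the href of every <link rel="blogroll"> on a page."""
    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "blogroll" in rel and attrs.get("href"):
            self.found.append(attrs["href"])

def discover_blogroll(url: str) -> list[str]:
    with urlopen(url) as response:
        page = response.read().decode("utf-8", errors="replace")
    finder = BlogrollFinder()
    finder.feed(page)
    return [urljoin(url, href) for href in finder.found]

print(discover_blogroll("https://danq.me/"))
# Hopefully: ['https://danq.me/blogroll.xml']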

RSS > ActivityPub

RSS is better than ActivityPub1.

Photograph of a boxing match, but with the heads of the competitors replaced with the ActivityPub and RSS logos (and "AP" or "RSS" written on their clothes, respectively). RSS is delivering a powerful uppercut to ActivityPub.
A devastating blow by RSS against a competitor 19 years his junior! For updates on this bout as it develops, don’t forget to subscribe… using either protocol.

When I subscribe to content, I want:

  • Resilient failsafes. ActivityPub has many points-of-failure. A notification might fail to complete transmission as a result of downtime, faults, or network conditions, and the receiving server might never know. A feed reader, conversely, can tell you that an address 404’d or the server was down.
  • Retroactive access. Once you fix the problem above… you still don’t get the message you missed: it’s probably gone forever – there’s no retroactive access. The same is true when your ActivityPub server connects with a peer for the first time: you only ever get new content after that point. RSS, on the other hand, provides some number of “recent” items the moment you first subscribe.
  • Simple subscriptions. RSS can be served from a statically-hosted single file, which makes it suitable to deploy anywhere as well as consume using anything. It can be read, after a fashion, in anything from Lynx upwards.

RSS ticks all these boxes. If I can choose between RSS and ActivityPub to subscribe to your content, and I don’t need a real-time update, I’m probably going to choose RSS.
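(To make the “failsafes” and “retroactive access” points concrete, here’s a minimal sketch of a pull-based subscriber: a failed fetch is immediately visible, and a successful one always returns the feed’s recent items, even on the very first poll. The feed URL is just an example.)

import urllib.error
import urllib.request
from xml.etree import ElementTree

def poll(feed_url: str):
    try:
        with urllib.request.urlopen(feed_url) as response:
            tree = ElementTree.parse(response)
    except urllib.error.HTTPError as error:
        return f"Fetch failed with HTTP {error.code}; try again later."
    except urllib.error.URLError as error:
        return f"Server unreachable ({error.reason}); try again later."
    # A feed serves its most recent items on every request, first poll included.
    return [item.findtext("title") for item in tree.iter("item")]

print(poll("https://danq.me/feed/"))  # example feed URL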

About a month later, Matthias Pfefferle wrote a great post that makes a good “next stop” if you’re on a deep dive…

Footnotes

1 I feel like this statement needs a few clarifications and caveats, but my hot take looks spicier if I bury them in a footnote!

  • By RSS, I mean whichever pull-based format served over plain HTTP you like, be that Atom, JSON Feed, h-entry, or even just properly-marked-up HTML5: did you know that the <article> element is intended to be suitable for syndication use?
  • Obviously I appreciate that RSS and ActivityPub are different tools for different jobs, and there are doubtless use-cases for which ActivityPub is clearly the superior solution.
  • I certainly don’t object to services providing both RSS and ActivityPub as syndication options, like Mastodon does, where both might be good choices.

Netscape’s Untold Webstories

I mentioned yesterday that during Bloganuary I’d put non-Bloganuary-prompt post ideas onto the backburner, and considered extending my daily streak by posting them in February. Here’s part of my attempt to do that:

Let’s take a trip into the Web of yesteryear, with thanks to our friends at the Internet Archive’s WayBack Machine.

The page we’re interested in used to live at http://www.netscape.com/comprod/columns/webstories/index.html, and promised to be a showcase for best practice in Web development. Back in October 1996, it looked like this:

Screenshot from Netscape Columns: Web Site Stories: a Coming Soon page which says "The series is scheduled to debut in November."

The page is a placeholder for Netscape Webstories (or Web Site Stories, in some places). It’s part of a digital magazine called Netscape Columns which published pieces written by Marc Andreessen, Jim Barksdale, and other bigwigs in the hugely-influential pre-AOL-acquisition Netscape Communications.

This new series would showcase best practice in designing and building Web sites1, giving a voice to the technical folks best-placed to speak on that topic. That sounds cool!

Those white boxes above and below the paragraph of text aren’t missing images, by the way: they’re horizontal rules, using the little-known size attribute to specify a thickness of <hr size=4>!2

Certainly you’re excited by this new column and you’ll come back in November 1996, right?

Screenshot from Netscape Columns: Web Site Stories: a Coming Soon page which says "The series is scheduled to begin in January."

Oh. The launch has been delayed, I guess. Now it’s coming in January.

The <hr>s look better now their size has been reduced, though, so clearly somebody’s paying attention to the page. But let’s take a moment and look at that page title. If you grew up writing web pages in the modern web, you might anticipate that it’s coded something like this:

<h2 style="font-variant: small-caps; text-align: center;">Coming Soon</h2>

There’s plenty of other ways to get that same effect. Perhaps you prefer font-feature-settings: 'smcp' in your chosen font; that’s perfectly valid. Maybe you’d use margin: 0 auto or something to centre it: I won’t judge.

But no, that’s not how this works. The actual code for that page title is:

<center>
  <h2>
    <font size="+3">C</font>OMING
    <font size="+3">S</font>OON
  </h2>
</center>

Back when this page was authored, we didn’t have CSS3. The only styling elements were woven right in amongst the semantic elements of a page4. It was simple to understand and easy to learn… but it was a total mess5.

Anyway, let’s come back in January 1997 and see what this feature looks like when it’s up-and-running.

Screenshot from Netscape Columns: Web Site Stories: a Coming Soon page which says "The series is scheduled to begin in the spring."

Nope, now it’s pushed back to “the spring”.

Under Construction pages were all the rage back in the nineties. Everybody had one (or several), usually adorned with one or more of about a thousand different animated GIFs for that purpose.6

Rotating animated "under construction" banner.

Building “in public” was an act of commitment, a statement of intent, and an act of acceptance of the incompleteness of a digital garden. They’re sort-of coming back into fashion in the interpersonal Web, with the “garden and stream” metaphor7 taking root. This isn’t anything new, of course – Mark Bernstein touched on the concepts in 1998 – but it’s not something that I can ever see returning to the “serious” modern corporate Web. That said, if you’ve seen a genuine, non-ironic “under construction” page published to a non-root page of a company’s website within the last decade, please let me know!

Under construction banner with an animated yellow-and-black tape banner between two "men at work" signs.

RSS doesn’t exist yet (although here’s a fun fact: the very first version of RSS came out of Netscape!). We’re just going to have to bookmark the page and check back later in the year, I guess…

Screenshot from Netscape Columns: Web Site Stories: a Coming Soon page identical to the previous version but with a search box ("To search the Netscape Columns, type a word or phrase here:") beneath.

Okay, so February clearly isn’t Spring, but they’ve updated the page… to add a search form.

It’s a genuine <form> tag, too, not one of those old-fashioned <isindex> tags you’d still sometimes find even as late as 1997. Interestingly, it specifies enctype="application/x-www-form-urlencoded". Today’s developers probably don’t think about the enctype attribute except when they’re doing a form that handles file uploads and they know they need to switch it to enctype="multipart/form-data" (or their framework does this automatically for them!).

But these aren’t the only options, and some older browsers at this time still defaulted to enctype="text/plain". So long as you’re using the POST method and not GET, the distinction is mostly academic, but if your backend CGI program anticipated that special characters would come in encoded, back then you’d have been wise to specify that you wanted URL-encoding or you might have got a nasty surprise when somebody turned up using LMB or something equally-exotic.
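To illustrate the difference, here’s a quick sketch (the form fields are made up) of how the same submission arrives under each encoding, and how easily the URL-encoded one decodes:

from urllib.parse import parse_qs

# With enctype="application/x-www-form-urlencoded", special characters are
# escaped, so the body is unambiguous to decode:
urlencoded_body = "name=Dan&comment=I+%3C3+the+Web%21"
print(parse_qs(urlencoded_body))
# {'name': ['Dan'], 'comment': ['I <3 the Web!']}

# With enctype="text/plain", the browser sends one "field=value" line per
# field with no escaping, so a value containing "=", "&" or a newline leaves
# your CGI program guessing:
text_plain_body = "name=Dan\r\ncomment=I <3 the Web!"
print(text_plain_body.splitlines())
# ['name=Dan', 'comment=I <3 the Web!']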

Anyway, let’s come back in June. The content must surely be up by now:

Screenshot from Netscape Columns: Web Site Stories: a Coming Soon page which says "The series is scheduled to begin in August."

Oh come on! Now we’re waiting until August?

At least the page isn’t abandoned. Somebody’s coming back and editing it from time to time to let us know about the still-ongoing series of delays. And that’s not a trivial task: this isn’t a CMS. They’re probably editing the .html file itself in their favourite text editor, then putting the appropriate file:// address into their copy of Netscape Navigator (and maybe other browsers) to test it, then uploading the file – probably using FTP – to the webserver… all the while thanking their lucky stars that they’ve only got the one page they need to change.

We didn’t have scripting languages like PHP yet, you see8. We didn’t really have static site generators. Most servers didn’t implement server-side includes. So if you had to make a change to every page on a site, for example editing the main navigation menu, you’d probably have to open and edit dozens or even hundreds of pages. Little wonder that framesets caught on, despite their (many) faults, with their ability to render your navigation separately from your page content.

Okay, let’s come back in August I guess:

Screenshot from Netscape Columns: Web Site Stories: a Coming Soon page which says "The series is scheduled to begin in the spring." Again.

Now we’re told that we’re to come back… in the Spring again? That could mean Spring 1998, I suppose… or it could just be that somebody accidentally re-uploaded an old copy of the page.

Hey: the footer’s gone too? This is clearly a partial re-upload: somebody realised they were accidentally overwriting the page with the previous-but-one version, hit “cancel” in their FTP client (or yanked the cable out of the wall), and assumed that they’d successfully stopped the upload before any damage was done.

They had not.

Screenshot of a Windows 95 dialog box, asking "Are you sure you want to delete index.html?" The cursor hovers over the "Yes" button.

I didn’t mention that top menu, did I? It looks like it’s a series of links, styled to look like flat buttons, right? But you know that’s not possible because you can’t rely on having the right fonts available: plus you’d have to do some <table> trickery to lay it out, at which point you’d struggle to ensure that the menu was the same width as the banner above it. So how did they do it?

The menu is what’s known as a client-side imagemap. Here’s what the code looks like:

<a href="/comprod/columns/images/nav.map">
  <img src="/comprod/columns/images/websitestories_ban.gif" width=468 height=32 border=0 usemap="#maintopmap" ismap>
</a><map name="mainmap">
  <area coords="0,1,92,24" href="/comprod/columns/mainthing/index.html">
  <area coords="94,1,187,24" href="/comprod/columns/techvision/index.html">
  <area coords="189,1,278,24" href="/comprod/columns/webstories/index.html">
  <area coords="280,1,373,24" href="/comprod/columns/intranet/index.html">
  <area coords="375,1,467,24" href="/comprod/columns/newsgroup/index.html">
</map>

The image (which specifies border=0 because back then the default behaviour for graphical browsers was to put a thick border around images within hyperlinks) says usemap="#maintopmap" to cross-reference the <map> below it, which defines rectangular areas on the image and where they link to, if you click them! This ingenious and popular approach meant that you could transmit a single image – saving on HTTP round-trips, which were relatively time-consuming before widespread adoption of HTTP/1.1’s persistent connections – along with a little metadata to indicate which pixels linked to which pages.

The ismap attribute is provided as a fallback for browsers that didn’t yet support client-side image maps but did support server-side image maps: there were a few! When you put ismap on an image within a hyperlink, then when the image is clicked on the href has appended to it a query parameter of the form ?123,456, where those digits refer to the horizontal and vertical coordinates, from the top-left, of the pixel that was clicked on! These could then be decoded by the webserver via a .map file or handled by a CGI program. Server-side image maps were sometimes used where client-side maps were undesirable, e.g. when you want to record the actual coordinates picked in a spot-the-ball competition or where you don’t want to reveal in advance which hotspot leads to what destination, but mostly they were just used as a fallback.9
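Here’s a rough sketch of the sort of thing a server-side handler would do with those coordinates, reusing the rectangles and URLs from the <map> above (the fallback URL is my own invention):

# The coordinates arrive as the "?123,456" query string that ismap appends.
AREAS = [
    ((0, 1, 92, 24), "/comprod/columns/mainthing/index.html"),
    ((94, 1, 187, 24), "/comprod/columns/techvision/index.html"),
    ((189, 1, 278, 24), "/comprod/columns/webstories/index.html"),
    ((280, 1, 373, 24), "/comprod/columns/intranet/index.html"),
    ((375, 1, 467, 24), "/comprod/columns/newsgroup/index.html"),
]

def resolve(query: str, default: str = "/comprod/columns/index.html") -> str:
    x, y = (int(value) for value in query.split(","))
    for (left, top, right, bottom), url in AREAS:
        if left <= x <= right and top <= y <= bottom:
            return url
    return default  # clicked outside every hotspot

print(resolve("200,12"))  # -> /comprod/columns/webstories/index.html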

Both client-side and server-side image maps still function in every modern web browser, but I’ve not seen them used in the wild for a long time, not least because they’re hard (maybe impossible?) to make accessible and they can’t cope with images being resized, but also because nowadays if you really wanted to make a navigation “image” you’d probably cut it into a series of smaller images and make each its own link.

Anyway, let’s come back in October 1997 and see if they’ve fixed their now-incomplete page:

Screenshot from Netscape Columns: Web Site Stories: the Coming Soon page is now laid out in two columns, but the expected launch date has been removed.

Oh, they have! From the look of things, they’ve re-written the page from scratch, replacing the version that got scrambled by that other employee. They’ve swapped out the banner and menu for a new design, replaced the footer, and now the content’s laid out in a pair of columns.

There’s still no reliable CSS, so you’re not looking at columns: (no implementations until 2014) nor at display: flex (2010) here. What you’re looking at is… a fixed-width <table> with a single row and three columns! Yes: three – the middle column is only 10 pixels wide and provides the “gap” between the two columns of text.10

This wasn’t Netscape’s only option, though. Did you ever hear of the <multicol> tag? It was the closest thing the early Web had to a semantically-sound, progressively-enhanced multi-column layout! The author of this page could have written this:

<multicol cols=2 gutter=10 width=301>
  <p>
    Want to create the best possible web site? Join us as we explore the newest
    technologies, discover the coolest tricks, and learn the best secrets for
    designing, building, and maintaining successful web sites.
  </p>
  <p>
    Members of the Netscape web site team, recognized designers, and technical
    experts will share their insights and experiences in Web Site Stories. 
  </p>
</multicol>

That would have given them the exact same effect, but with less code and it would have degraded gracefully. Browsers ignore tags they don’t understand, so a browser without support for <multicol> would have simply rendered the two paragraphs one after the other. Genius!

So why didn’t they? Probably because <multicol> only ever worked in Netscape Navigator.

Introduced in 1996 for version 3.0, this feature was absolutely characteristic of the First Browser War. The two “superpowers”, Netscape and Microsoft, both engaged in unilateral changes to the HTML specification, adding new features and launching them without announcement in order to try to get the upper hand over the other. Both sides would often refuse to implement one-another’s new tags unless they were forced to by widespread adoption by page authors, instead promoting their own competing mechanisms11.

Between adding this new language feature to their browser and writing this page, Netscape’s market share had fallen from around 80% to around 55%, and most of their losses were picked up by IE. Using <multicol> would have made their page look worse in Microsoft’s hot up-and-coming browser, which wouldn’t have helped them persuade more people to download a copy of Navigator and certainly wouldn’t be a good image on a soon-to-launch (any day now!) page about best-practice on the Web! So Netscape’s authors opted for the dominant, cross-platform solution on this page12.

Anyway, let’s fast-forward a bit and see this project finally leave its “under construction” phase and launch!

Screenshot showing the homepage of Netscape Columns from 15 February 1998; the first recorded copy NOT to have a header link to the Webstories / Web Site Stories page.

Oh. It’s gone.

Sometime between October 1997 and February 1998 the long promised “Web Site Stories” section of Netscape Columns quietly disappeared from the website. Presumably, it never published a single article, instead remaining a perpetual “Coming Soon” page right up until the day it was deleted.

I’m not sure if there’s a better metaphor for Netscape’s general demeanour in 1998 – the year in which they finally ceased to be the dominant market leader in web browsers – than the quiet deletion of a page about how Netscape customers are making the best of the Web. This page might not have been important, or significant, or even completed, but its disappearance may represent Netscape’s refocus on trying to stay relevant in the face of existential threat.

Of course, Microsoft won the First Browser War. They did so by pouring a fortune’s worth of developer effort into staying technologically one-step ahead, refusing to adopt standards proposed by their rival, and their unprecedented decision to give away their browser for free13.

Footnotes

1 Yes, we used to write “Web sites” as two words. We also used to consistently capitalise the words Web and Internet. Some of us still do so.

2 In case it’s not clear, this blog post is going to be as much about little-known and archaic Web design techniques as it is about Netscape’s website.

3 This is a white lie. CSS was first proposed almost at the same time as the Web! Microsoft Internet Explorer was first to deliver a partial implementation of the initial standard, late in 1996, but Netscape dragged their heels, perhaps in part because they’d originally backed a competing standard called JavaScript Style Sheets (JSSS). JSSS had a lot going for it: if it had enjoyed widespread adoption, for example, we’d have had the equivalent of CSS variables a full twenty years earlier! In any case, back in 1996 you definitely wouldn’t want to rely on CSS support.

4 Wondering where the text and link colours come from? <body bgcolor="#ffffff" text="#000000" link="#0000ff" vlink="#ff0000" alink="#ff0000">. Yes really, that’s where we used to put our colours.

5 Personally, I really loved the aesthetic Netscape touted when using Times New Roman (or whatever serif font was available on your computer: webfonts weren’t a thing yet) with temporary tweaks to font sizes, and I copied it in some of my own sites. If you look back at my 2018 blog post celebrating two decades of blogging, where I’ve got a screenshot of my blog as it looked circa 1999, you’ll see that I used exactly this technique for the ordinal suffixes on my post dates! On the same post, you’ll see that I somewhat replicated the “feel” of it again in my 2011 design, this time using a stylesheet.

6 There’s a whole section of Cameron’s World dedicated to “under construction” banners, and that’s a beautiful thing!

7 The idea of “garden and stream” is that you publish early and often, refining as you go, in your garden, which can act as an extension of whatever notetaking system you use already, but publish mostly “finished” content to your (chronological) stream. I see an increasing number of IndieWeb bloggers going down this route, but I’m not convinced that it’s for me.

8 Another white lie. PHP was released way back in 1995 and even the very first version supported something a lot like server-side includes, using the syntax <!--include /file/name.html-->. But it was a little computationally-intensive to run willy-nilly.

9 Server-side imagemaps are enjoying a bit of a renaissance on .onion services, whose visitors often keep JavaScript disabled, to make image-based CAPTCHAs. Simply show the visitor an image and describe the bit you want them to click on, e.g. “the blue pentagon with one side missing”, then compare the coordinates of the pixel they click on to the knowledge of the right answer. Highly-inaccessible, of course, but innovative from a purely-technical perspective.

10 Nowadays, use of tables for layout – or, indeed, for anything other than tabular data – is very-much frowned upon: it’s often bad for accessibility and responsive design. But back before we had the features granted to us by the modern Web, it was literally the only way to get content to appear side-by-side on a page, and designers got incredibly creative about how they misused tables to lay out content, especially as browsers became more-sophisticated and began to support cells that spanned multiple rows or columns, tables “nested” within one another, and background images.

11 It was a horrible time to be a web developer: having to make hacky workarounds in order to make use of the latest features but still support the widest array of browsers. But I’d still take that over the horrors of rendering engine monoculture!

12 Or maybe they didn’t even think about it and just copy-pasted from somewhere else on their site. I’m speculating.

13 This turned out to be the master-stroke: not only did it partially-extricate Microsoft from their agreement with Spyglass Inc., who licensed their browser engine to Microsoft in exchange for a percentage of sales value, but once Microsoft started bundling Internet Explorer with Windows it meant that virtually every computer came with their browser factory-installed! This strategy kept Microsoft on top until Firefox and Google Chrome kicked-off the Second Browser War in the early 2010s. But that’s another story.


.well-known/feeds

<link rel="alternate" type="application/rss+xml"> is fine, but I feel like there should be a standard for a site, not a page, to share a “list of feeds associated with a site”.

Last night, I dreamed about a way to achieve that: /.well-known/feeds as an OPML document. Here’s mine, and here’s a draft spec.
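As a sketch of how a consumer might use it (assuming the usual OPML convention of an xmlUrl attribute on each <outline>), something like this would do for discovery:

from urllib.request import urlopen
from xml.etree import ElementTree

def discover_feeds(origin: str) -> list[str]:
    with urlopen(f"{origin}/.well-known/feeds") as response:
        tree = ElementTree.parse(response)
    # OPML puts each feed's address in an <outline xmlUrl="..."> attribute.
    return [node.get("xmlUrl") for node in tree.iter("outline") if node.get("xmlUrl")]

print(discover_feeds("https://danq.me"))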

Interested to hear what Dave Winer thinks…

How a 2002 standard made 2022 bearable

This is an alternate history of the Web. The premise is true, but the story diverges from our timeline and looks at an alternative “Web that might have been”.

Prehistory

This is the story of P3P, one of the greatest Web standards whose history has been forgotten1, and how the abject failure of its first versions paved the way for its bright future decades later. But I’m getting ahead of myself…

Drafted in 2002 in the wake of growing concern about the death of privacy on the Internet, P3P 1.0 aimed to make the collection of personally-identifiable data online transparent. Hurrah, right?

Not so much. Its immediate impact was lukewarm to negative: developers couldn’t understand why their cookies were no longer being accepted by Internet Explorer 6, the first browser to implement the standard, and the whole exercise was slated as providing a false sense of security, not stopping actual bad guys, and an attempt to apply a technical solution to a political problem.2

Flowchart showing the negotiation process between a user, browser, and server as the user browses an ecommerce site. The homepage's P3P policy states that it collects IP addresses, which is compatible with the user's preferences. Later, at checkout, the P3P policy states that the user's address will be collected and shared with a courier. The collection is fine according to the user's preferences, but she's asked to be notified if it'll be shared, so the browser notifies the user. The user approves of the policy and asks that this approval is remembered for this site, and the checkout process continues.
Initially, the principle was sound. The specification was weak. The implementation was appalling. But P3P 1.1 could have worked well.

Developers are lazy3 and soon converged on the simplest possible solution: add a garbage HTTP header like P3P: CP="See our website for our privacy policy." and your cookies work just fine! Ignore the problem, ignore the proposed solution, just do what gets the project shipped.

Without any meaningful enforcement it was also perfectly feasible to, y’know, just lie about how well you treat user data. Seeing the way the wind was blowing, Mozilla dropped support for P3P, and Microsoft’s support – which had always been half-baked and lacked even the most basic user-facing controls or customisation options – languished in obscurity.

For a while, it seemed like P3P was dying. Maybe, in some alternate timeline, it did die: vanishing into nothing like VRML, WAP, and XBAP.

But fortunately for us, we don’t live in that timeline.

Revival

In 2009, the European Union revisited the Privacy and Electronic Communications Directive. The initial regulations, published in 2002, required that Web users be able to opt-out of tracking cookies, but the amendment required that sites ensure that users opted-in.

As-written, this confusing new regulation posed an immediate problem: if a user clicked the button to say “no, I don’t want cookies”, and you didn’t want to ask for their consent again on every page load… you had to give them a cookie (or use some other technique legally-indistinguishable from cookies). Now you’re stuck in an endless cookie-circle.4

This, and other factors of informed consent, quickly introduced a new pattern among those websites that were fastest to react to the legislative change:

Screenshot from how-i-experience-web-today.com showing an article mostly-covered by a cookie privacy statement and configuration options, utilising dark patterns to try to discourage users from opting-out of cookies.
The cookie consent banner, with all its confusing language and dark patterns, looked like it was going to become the new normal for web users in the early 2010s. But thankfully, our saviour had been waiting in the wings all along.

Web users rebelled. These ugly overlays felt like a regression to a time when popup ads and splash pages were commonplace. “If only,” people cried out, “there were a better way to do this!”

It was Professor Lorrie Cranor, one of the original authors of the underloved P3P specification and a respected champion of usable privacy and security, whose rallying cry gave us hope. Her CNET article, “Why the EU Cookie Directive is a solved problem”5, inspired a new generation of development on what would become known as P3P 2.0.

While maintaining backwards compatibility, this new standard:

  • deprecated those horrible XML documents in favour of HTTP headers and <link> tags alone,
  • removed support for Set-Cookie2: headers, which nobody used anyway, and
  • added features by which the provenance and purpose of cookies could be stated in a way that dramatically simplified adoption in browsers

Internet Explorer at this point was still used by a majority of Web users. It still supported the older version of the standard, and – as perhaps the greatest gift that the much-maligned browser ever gave us – provided a reference implementation as well as a stepping-stone to wider adoption.

Opera, then Firefox, then “new kid” Chrome each adopted P3P 2.0; Microsoft finally got on board with IE 8 SP 1. Now the latest versions of all the mainstream browsers had a solid implementation6 well before the European data protection regulators began fining companies that misused tracking cookies.

Fabricated screenshot from Microsoft Edge, browsing 3r.org.uk: a "privacy" icon in the address bar has been clicked, and the resulting menu says: About 3r.org.uk. Connection is secure (with link for more info). Privacy and Cookies (with link for more info). Cookies (3 cookies in use) - Strictly necessary (2 in use), dropdown menu set to "Default (accept, delete later)"; Optional (1 in use), dropdown menu set to "Accept for this site". Checkbox for "Treat third-party cookies differently?", unchecked. Privacy (link to full policy): Legitimate interest - this site collects username, IP address, technical logs...; Consent - this site collects email address, phone number... Button to manage content. Button to "Exercise data rights".
Nowadays, we’ve pretty-well standardised on the address bar being the place where all cookie and privacy information and settings are stored. Can you imagine if things had gone any other way?

But where the story of P3P‘s successes shine brightest came in 2016, with the passing of the GDPR. The W3C realised that P3P could simplify both the expression and understanding of privacy policies for users, and formed a group to work on version 2.1. And that’s the version you use today.

When you launch a new service, you probably use one of the many free wizard-driven tools to express your privacy policy and the bases for your data processing, and it spits out a template privacy policy. You need the human-readable version, of course, since the 2020 German court ruling that you cannot rely on a machine-readable privacy policy alone, but the real gem is the P3P: 2.1 header version.

Assuming you don’t have any unusual quirks in your data processing (ask your lawyer!), you can just paste the relevant code into your server configuration and you’re good to go. Site users get a warning if their personal data preferences conflict with your data policies, and can choose how to act: not using your service, choosing which of your features to opt in or out of, or – hopefully! – granting an exception to your site (possibly with caveats, such as sandboxing your cookies or clearing them immediately after closing the browser tab).

Sure, what we’ve got isn’t perfect. Sometimes companies outright lie about their use of information or use illicit methods to track user behaviour. There’ll always be bad guys out there. That’s what laws are there to deal with.

But what we’ve got today is so seamless, it’s hard to imagine a world in which we somehow all… collectively decided that the correct solution to the privacy problem might have been to throw endless popovers into users’ faces, bury consent-based choices under dark patterns, and make humans do the work that should from the outset have been done by machines. What a strange and terrible timeline that would have been.

Footnotes

1 If you know P3P‘s history, regardless of what timeline you’re in: congratulations! You win One Internet Point.

2 Techbros have been trying to solve political problems using technology since long before the word “techbro” was used in its current context. See also: (a) there aren’t enough mental health professionals, let’s make an AI app? (b) we don’t have enough ventilators for this pandemic, let’s 3D print air pumps? (c) banks keep failing, let’s make a cryptocurrency? (d) we need less carbon in the atmosphere or we’re going to go extinct, better hope direct carbon capture tech pans out eh? (e) we have any problem at all, let’s somehow shoehorn blockchain into some far-fetched idea about how to solve it without me having to get out of my chair why not?

3 Note to self: find a citation for this when you can be bothered.

4 I can’t decide whether “endless cookie circle” is the name of the New Wave band I want to form, or a description of the way I want to eventually die. Perhaps both.

5 Link missing. Did I jump timelines?

6 Implementation details varied, but that’s part of the joy of the Web. Firefox favoured “conservative” defaults; Chrome and IE had “permissive” ones; and Opera provided an ultra-configurable matrix of options by which a user could specify exactly which kinds of cookies to accept, linked to which kinds of personal data, from which sites, all somehow backed by an extended regular expression parser that was only truly understood by three people, two of whom were Opera developers.


.well-known/links in WordPress

Via Jeremy Keith I today discovered Jim Nielsen‘s suggestion for a website’s /.well-known/links to be a place where it can host a JSON-formatted list of all of its outgoing links.

That’s a really useful thing to have in this new age of the web, where Referer: headers are no-longer commonly passed cross-domain and Google Search no longer provides the link: operator. If you want to know if I’ve ever linked to your site, it’s a bit of a drag to find out.

JSON output from DanQ.me's .well-known/links, showing 1,150 outbound links to en.wikipedia.org domains, for topics like "Bulls and Cows", "Shebang (Unix)", "Imposter syndrome", "Interrobang", and "Blakedown".
To nobody’s surprise whatsoever, I’ve made so many links to Wikipedia that I might be single-handedly responsible for their PageRank.

So, obviously, I’ve written an implementation for WordPress. It’s really basic right now, but the source code can be found here if you want it. Install it as a plugin and run wp outbound-links to kick it off. It’s fast: it takes 3-5 seconds to parse the entirety of danq.me, and I’ve got somewhere in the region of 5,000 posts to parse.

You can see the results at https://danq.me/.well-known/links – if you’ve ever wondered “has Dan ever linked to my site?”, now you can find the answer.
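If you’d rather ask that question programmatically, here’s a rough sketch. I’m assuming each entry in the JSON carries a "target" URL; check the real output and adjust the key name if the format differs:

import json
from urllib.request import urlopen

def links_to(site: str, to_domain: str) -> list[str]:
    with urlopen(f"{site}/.well-known/links") as response:
        entries = json.load(response)
    # Assumption: each entry is an object with a "target" URL.
    return [entry["target"] for entry in entries
            if to_domain in entry.get("target", "")]

print(len(links_to("https://danq.me", "en.wikipedia.org")))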

If this could be useful to you, let’s collaborate on making this into an actually-useful plugin! Otherwise it’ll just languish “as-is”, which is good enough for my purposes.

Ireland and the UK Aren’t In The Same Timezone!

This weekend, while investigating a bug in some code that generates iCalendar (ICS) feeds, I learned about a weird quirk in the Republic of Ireland’s timezone. It’s such a strange thing (and has so little impact on everyday life) that I imagine that even most Irish people don’t know about it, but it’s important enough that it can easily introduce bugs into the way that computer calendars communicate:

Most of Europe put their clocks forward in Summer, but the Republic of Ireland instead put their clocks backward in Winter.

If that sounds to you like the same thing said two different ways – or the set-up to a joke! – read on:

Map showing timezones of Europe. The UK and Ireland are grouped (along with Iceland) in a zone labelled as being UTC+0.
The timezones of Europe look pretty simple compared to some parts of the world, but the illustration of the British Isles hides an interesting eccentricity.

A Brief History of Time (in Ireland)

Poster titled "Time (Ireland) Act 1916", advising that "On and after Sunday 1st October 1916 Western European Time will be observed throughout Ireland", asking people to set their clocks and watches back 35 minutes.
Spring forward, fall back… just a little bit back, though. Not too much.

After high-speed (rail) travel made mean solar timekeeping problematic, Great Britain in 1880 standardised on Greenwich Mean Time (UTC+0) as the time throughout the island, and Ireland standardised on Dublin Mean Time (UTC-00:25:21). If you took a ferry from Liverpool to Dublin towards the end of the 19th century you’d have to put your watch back by about 25 minutes. With air travel not yet being a thing, countries didn’t yet feel the need to fixate on nice round offsets in the region of one-hour (today, only a handful of regions retain UTC-offsets of half or quarter hours).

That’s all fine in peacetime, but by the First World War and especially following the Easter Rising, the British government decided that it was getting too tricky for their telegraph operators (many of whom operated out of Ireland, which provided an important junction for transatlantic traffic) to be on a different time to London.

1885 GPO telegraph instrument from the Porthcurno Telegraph Museum, which Dan almost visited the other week but it was closed.
It’s widely believed that the world’s first “U UP? [STOP]” message never got a response as a direct result of Anglo-Irish timezone confusion.
So the Time (Ireland) Act 1916 was passed, putting Ireland on Greenwich Mean Time. Ireland put her clocks back by 35 minutes and synched-up with the rest of the British Isles. And from then on, everything was simple and because nothing ever went wrong in Ireland as a result of the way it was governed by Britain, nobody ever had to think about the question of timezones on the island again.

Ah. Hmm.

December 1920 photograph showing St Patrick's Street, Cork, following the burning of the city by British forces.
“Those Irish people want to govern their own country, do they? After we so kindly shared our king with them? Right-ho: let’s set fire to their cities and see how they feel then.”

Following Irish independence, the keeping of time carried on in much the same way for a long while, which will doubtless have been convenient for families spread across the Northern Irish border. But then came the Second World War.

Summers in the 1940s saw Churchill introduce Double Summer Time which he believed would give the UK more daylight, saving energy that might otherwise be used for lighting and increasing production of war materiel.

Ireland considered using the emergency powers they’d put in place to do the same, as a fuel saving measure… but ultimately didn’t. This was possibly because aligning her time with Britain might be seen as undermining her neutrality, but was more likely because the government saw that such a measure wouldn’t actually have much impact on fuel use (it certainly didn’t in Britain). Whatever the reason, though, Ireland and Northern Ireland were again out-of-sync with one another until the war ended.

Newspaper clipping advising that "Double Summer Time comes to an end on Saturday night, August 8-9, when all clocks and watches should be put back one hour, thus reverting to British Summer Time, which will probably be maintained throughout the winter."
I like to imagine that the development of powerful computers by the folks at Bletchley Park was a result of needing to keep track of timezones across the British Isles.

From 1968 to 1971 Britain experimented with “British Standard Time” – putting the clocks forward in Summer once, to UTC+1, and then leaving them there for three years. This worked pretty well, unless you were Scottish, in which case you’d have found winter mornings even gloomier than the (already pretty gloomy) ones you were used to. Conveniently: during much of this period Ireland was also on UTC+1, but in their case it was part of a different experiment. Ireland were working on joining the European Economic Community, and aligning themselves with “Paris time” year-round was an unnecessary concession but an interesting idea.

But here’s where the quirk appears: the Standard Time Act 1968, which made UTC+1 the “standard” timezone for the Republic of Ireland, was not repealed and is still in effect. Ireland could have started over in 1971 with a new rule that made UTC+0 the standard and added a “Summer Time” alternative during which the clocks are put forward… but instead the Standard Time (Amendment) Act 1971 left UTC+1 as Ireland’s standard timezone and added a “Winter Time” alternative during which the clocks are put back.

Two clocks, both showing the same time. One has a sign reading "LONDON", the other "DUBLIN, I GUESS?"
It all seems so simple until you actually think about it.

(For a deeper look at the legal history of time in the UK and Ireland, see this timeline. Certainly don’t get all your history lessons from me.)

So what?

You might rightly be thinking: so what! Having a standard time of UTC+0 and going forward for the Summer (like the UK) is functionally-equivalent to having a standard time of UTC+1 and going backwards in the Winter, like Ireland, right? It’s certainly true that, at any given moment, a clock in London and a clock in Dublin should show the same time. So why would anybody care?

Perl Data::ICal::TimeZone implementation of Dublin timezone, incorrectly showing summer DST at +1 rather than winter DST of -1.
This code for Europe/Dublin, from the Perl module Data::ICal::TimeZone, is technically-incorrect because it states that the winter time is the standard and daylight savings of +1 hour apply in the summer, rather than the opposite.

But declaring which is “standard” is important when you’re dealing with computers. If, for example, you run a volunteer rota management system that supports a helpline charity that has branches in both the UK and Ireland, then it might really matter that the computer systems involved know what each other mean when they talk about specific times.

The author of an iCalendar file can choose to embed timezone information to explain what, in that file, a particular timezone means. That timezone information might say, for example, “When I say ‘Europe/Dublin’, I mean UTC+1, or UTC+0 in the winter.” Or it might say – like the code above! – “When I say ‘Europe/Dublin’, I mean UTC+0, or UTC+1 in the summer.” Both of these declarations would be technically-valid and could be made to work, although only the first one would be strictly correct in accordance with the law.
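You can see the legally-correct model in the IANA tz database (and so in most languages’ timezone libraries), which since 2018 or so has expressed Ireland’s winter time as negative daylight saving. Here’s a quick sketch in Python; the exact output depends on how your tz data was compiled, since some “rearguard” builds still model Dublin the other way around:

from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+; may need the tzdata package

dublin = ZoneInfo("Europe/Dublin")
summer = datetime(2024, 7, 1, 12, 0, tzinfo=dublin)
winter = datetime(2024, 1, 1, 12, 0, tzinfo=dublin)

# Summer: UTC+1 is the *standard* time, so dst() is zero and the name is IST.
print(summer.utcoffset(), summer.dst(), summer.tzname())

# Winter: the offset drops to UTC+0 by way of a *negative* dst() of minus
# one hour, and the name becomes GMT.
print(winter.utcoffset(), winter.dst(), winter.tzname())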

Stressed programmer hunched over a MacBook. Photo by Anna Shvets from Pexels.
Clients who need solid timezone support represent 50% of a programmer’s production of stress hormones. See also Falsehoods Programmers Believe About Time.

But if you don’t include timezone information in your iCalendar file, you’re relying on the feed subscriber’s computer (e.g. their calendar software) to make a sensible interpretation. And that’s where you run into trouble. Because in cases like Ireland, for which the standard is one thing but is commonly-understood to be something different, there’s a real risk that the way your system interprets and encodes time won’t necessarily be the same as the way somebody else’s does.

If I say I’ll meet you at 12:00 on 1 January, in Ireland, you rightly need to know whether I’m talking about 12:00 in Irish “standard” time (i.e. 11:00, because daylight savings are in effect) or 12:00 in local-time-at-the-time-of-the-meeting (i.e. 12:00). Humans usually mean the latter because we think in terms of local time, but when your international computer system needs to make sure that people are on a shift at the same time, but in different timezones, it needs to be very clear what exactly it means!

And when your daylight savings works “backwards” compared to everybody else’s… that’s sure to make a developer somewhere cry. And, possibly, blog about your weird legislation.


CSS4 is here!

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

I think that CSS would be greatly helped if we solemnly state that “CSS4 is here!” In this post I’ll try to convince you of my viewpoint.

I am proposing that we web developers, supported by the W3C CSS WG, start saying “CSS4 is here!” and excitedly chatter about how it will hit the market any moment now and transform the practice of CSS.

Of course “CSS4” has no technical meaning whatsoever. All current CSS specifications have their own specific versions ranging from 1 to 4, but CSS as a whole does not have a version, and it doesn’t need one, either.

Regardless of what we say or do, CSS 4 will not hit the market and will not transform anything. It also does not describe any technical reality.

Then why do it? For the marketing effect.

Hurrah! CSS4 is here!

I’m sure that, like me, you’re excited to start using the latest CSS technologies, like paged media, hyphen control, the zero-specificity :where() selector, and new accessibility selectors like the ‘prefers-reduced-motion’ @media query. The browser support might not be “there” yet, but so long as you’ve got a suitable commitment to progressive enhancement then you can be using all of these and many more today (I am!). Fantastic!

But if you’ve got more than a little web savvy you might still be surprised to hear me say that CSS4 is here, or even that it’s a “thing” at all. Welll… that’s because it isn’t. Not officially. Just like JavaScript’s versioning has gone all evergreen these last few years, CSS has gone the same way, with different “modules” each making their way through the standards and implementation processes independently. Which is great, in general, by the way – we’re seeing faster development of long-overdue features now than we have through most of the Web’s history – but it does make it hard to keep track of what’s “current” unless you follow along watching very closely. Who’s got time for that?

When CSS2 gained prominence at around the turn of the millennium it was revolutionary, and part of the reason for that – aside from the fact that it gave us all some features we’d wanted for a long time – was that it gave us a term to rally behind. This browser already supports it, that browser’s getting there, this other browser supports it but has a f**ked-up box model (you all know the one I’m talking about)… we at last had an overarching term to discuss what was supported, what was new, what was ready for people to write articles and books about. Nobody’s going to buy a book that promises to teach them “CSS3 Selectors Level 3, Fonts Level 3, Writing Modes Level 3, and Containment Level 1”: that title’s not even going to fit on the cover. But if we wrapped up a snapshot of what’s current and called it CSS4… now that’s going to sell.

Can we show the CSS WG that there’s mileage in this idea and make it happen? Oh, I hope so. Because while the modular approach to CSS is beautiful and elegant and progressive… I’m afraid that we can’t use it to inspire junior developers.

Also: I don’t want this joke to forever remain among the top results when searching for CSS4

Sandwichware: Machines Talking to Machines About Humans

A recent observation by Phil Gyford reminded me of a recurring thought I’ve had. He wrote:

While being driven around England it struck me that humans are currently like the filling in a sandwich between one slice of machine — the satnav — and another — the car. Before the invention of sandwiches the vehicle was simply a slice of machine with a human topping. But now it’s a sandwich, and the two machine slices are slowly squeezing out the human filling and will eventually be stuck directly together with nothing but a thin layer of API butter. Then the human will be a superfluous thing, perhaps a little gherkin on the side of the plate.

While we were driving I was reading the directions from a mapping app on my phone, with the sound off, checking the upcoming turns, and giving verbal directions to Mary, the driver. I was an extra layer of human garnish — perhaps some chutney or a sliced tomato — between the satnav slice and the driver filling.

What Phil’s describing is probably familiar to you: the experience of one or more humans acting as the go-between to allow two machines to communicate. If you’ve ever re-typed a document that was visible on another screen, read somebody a password over the phone, given directions from a digital map, or used a pendrive to carry files between computers that weren’t talking to one another properly, then you’ve done it: you’ve been the soft wet meaty middleware that bridged two already semi-automated (but not quite automated enough) systems.

Galaxy Quest: Tawny Madison says "Gosh, I'm doing it. I'm repeating the damn computer."
Sigourney Weaver as Gwen DeMarco as Tawny Madison realised what she was doing back in 1999. Should I be alarmed that a science fiction spoof is a better indicator of the future than the science fiction it parodies?

This generally happens because of the lack of a common API (a communications protocol) between two systems. If your phone and your car could just talk it out then the car would know where to go all by itself! Or, until we get self-driving cars, it could at least provide the directions in a way that was appropriately-accessible to the driver: heads-up display, context-relative directions, or whatever.

It also sometimes happens when the computer-to-human interface isn’t good enough; for example I’ve often offered to navigate for a driver (and used my phone for the purpose) because I can add a layer of common sense. There’s no need for me to tell my buddy to take the second exit from every roundabout in Milton Keynes (did you know that the town has 930 of them?) – I can just tell them that I’ll let them know when they have to change road and trust that they’ll just keep going straight ahead until then.

Finally, we also sometimes find ourselves acting as a go-between to filter and improve information flow when the computers don’t have enough information to do better by themselves. I’ll use the fact that I can see the road conditions and the lane markings and the proposed route ahead to tell a driver to get into the right lane with an appropriate amount of warning. Or if the driver says “I can see signs to our destination now, I’ll just keep following them,” I can shut up unless something goes awry. Your in-car SatNav can’t do that because it can’t see and interpret the road ahead of you… at least not yet!

Oxbotica Driven self-driving car in Oxford.
I was certainly glad that this prototype self-driving car could “see” me when it overtook my bike the other day.

But here’s my thought: claims of an upcoming AI winter aside, it feels to me like we’re making faster progress in technologies related to human-computer interaction – voice and natural-language interfaces, popularised by virtual assistants like Siri and Alexa and by chatbots – than we are in technologies related to universal computer interoperability. Voice-controlled computers are hip and exciting and attract a lot of investment, but interoperable systems are hampered by two major things. The first is business interests: for the longest while, for example, you couldn’t use Amazon Prime Video on a Google Chromecast because the two companies couldn’t play nice. The second is a lack of interest by manufacturers in developing open standards: every smart home appliance manufacturer wants you to use their app, so your smart speaker manufacturer has to implement code to talk to each and every one of them, and when they stop supporting one… well, suddenly your thermostat jumps permanently from smart mode to dumb mode.

A thing that annoys me is that, from a technical perspective, making an open standard should be a much easier task than making an AI that can understand what a human is asking for, or drive a car safely, or whatever else we’re using them for this week. That’s not to say that technical standards aren’t difficult to get right – they absolutely are! – but we’ve been practising doing it for many, many decades! The very existence of the Internet over which you’ve been delivered this article is proof that computer interoperability is a solvable problem. For anybody who thinks that the interoperability brought about by the Internet was inevitable, or didn’t take lots of hard work, I direct you to Darius Kazemi’s re-reading of the early standards discussions, which I first plugged a year ago; the important thing is that people were working on it. That’s something we’re not really seeing in the Internet of Things space.

XKCD 927: Standards
Engineers: “Standards are good. Let’s have lots of them.”
Everybody else: “…?”

On our current trajectory, it’s absolutely possible that our virtual assistants will become perfectly “human” communicators long before we can reach agreement about how they should communicate with one another. If that happens, they’ll probably fall back on English-language voice communication as their lingua franca, and it’s not unbelievable that ten to twenty years from now, the following series of events might occur:

  1. You want to go to your friends’ house, so you say out loud “Alexa, drive me to Bob’s house in five minutes.” Alexa responds “I’m on it; I’ll let you know more in a few minutes.”
  2. Alexa doesn’t know where Bob’s house is, but it knows it can get it from your netbook. It opens a voice channel over your wireless network (so you don’t have to “hear” it) and says “Hey Google, it’s Alexa [and here’s my credentials]; can you give me the address that [your name] means when they say ‘Bob’s house’?” And your netbook responds by reading out the address details, which Alexa then understands.
  3. Alexa doesn’t know where your self-driving car is right now and whether anybody’s using it, but it has a voice control system and a cellular network connection, so Alexa phones up your car and says: “Hey SmartCar, it’s Alexa [and here’s my credentials]; where are you and when were you last used?”. The car replies “I’m on the driveway, I’m fully-charged, and I was last used three hours ago by [your name].” So Alexa says “Okay, boot up, turn on climate control, and prepare to make a journey to [Bob’s address].” In this future world, most voice communication over telephones is done by robots: your virtual assistant calls your doctor’s virtual assistant to make you an appointment, and you and your doctor just get events in your calendars, for example, because nobody manages to come up with a universal API for medical appointments.
  4. Alexa responds “Okay, your SmartCar is ready to take you to Bob’s house.” And you have no idea about the conversations that your robots have been having behind your back.

I’m not saying that this is a desirable state of affairs. I’m not even convinced that it’s likely. But it’s certainly possible if IoT development keeps focussing on shiny friendly conversational interfaces at the expense of practical, powerful technical standards. Our already topsy-turvy technologies might get weirder before they get saner.

But if English does become the “universal API” for robot-to-robot communication, despite all engineering common sense, I suggest that we call it “sandwichware”.
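For the sake of contrast, here’s a minimal and entirely hypothetical sketch, in Python, of what step 2 of that scenario could look like if the two assistants shared a machine-readable API rather than shouting English at each other over a voice channel. The endpoint, payload fields and authentication scheme are all invented for illustration – no such API exists – but that’s rather the point: the hard part isn’t the code, it’s getting anybody to agree on it.

    import requests  # third-party HTTP client: pip install requests

    # HYPOTHETICAL: neither this endpoint nor this payload schema exists
    # anywhere. It's only here to show how small the "real" interoperability
    # problem is compared with teaching two assistants to parse each other's
    # synthesised speech.
    ASSISTANT_ENDPOINT = "https://assistant.example.invalid/v1/contacts/lookup"

    def lookup_address(contact_name: str, auth_token: str) -> str:
        """Ask another (imaginary) assistant for a contact's postal address."""
        response = requests.post(
            ASSISTANT_ENDPOINT,
            json={"query": contact_name, "fields": ["postal_address"]},
            headers={"Authorization": f"Bearer {auth_token}"},
            timeout=5,
        )
        response.raise_for_status()
        return response.json()["postal_address"]

    # The voice-channel fallback imagined above, by contrast, is just this
    # string, synthesised as audio at one end and speech-recognised at the other:
    VOICE_FALLBACK = "Hey Google, it's Alexa; what address does 'Bob's house' mean?"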


Evolving Computer Words: “GIF”

This is part of a series of posts on computer terminology whose popular meaning – determined by surveying my friends – has significantly diverged from its original/technical one. Read more evolving words…

The language we use is always changing, like how the word “cute” was originally a truncation of the word “acute”, which you’d use to describe somebody who was sharp-witted, as in “don’t get cute with me”. Nowadays, we use it when describing adorable things, like the subject of this GIF:

[Animated GIF] Puppy flumps onto a human.
Cute, but not acute.
But hang on a minute: that’s another word that’s changed meaning: GIF. Want to see how?

GIF

What people think it means

File format (or the files themselves) designed for animations and transparency. Or: any animation without sound.

What it originally meant

File format designed for efficient colour images. Animation was secondary; transparency was an afterthought.

The Past

Back in the 1980s, cyberspace was in its infancy. Sir Tim hadn’t yet dreamed up the Web, the Internet wasn’t something that most people could connect to, and bulletin board systems (BBSes) – dial-up services, often local or regional, sometimes connected to one another in one of a variety of ways – dominated the scene. Larger services like CompuServe acted a little like huge BBSes but with dial-up nodes in multiple countries, helping to bridge the international gaps and provide a lower learning curve than the smaller boards (albeit for a hefty monthly fee in addition to the cost of the calls). These services would later go on to double as, and eventually become exclusively, Internet Service Providers, but for the time being they were a force unto themselves.

CompuServe ad circa 1983
My favourite bit of this 1983 magazine ad for CompuServe is the fact that it claims a trademark on the word “email”. They didn’t try very hard to cling on to that claim – unlike the controversial patent claims (Unisys’s, over the LZW compression that GIF uses) that would later dog the format…

In 1987, CompuServe were about to start rolling out colour graphics as a new feature, but needed a new graphics format to support it. Their engineer Steve Wilhite had the idea for a bitmap image format backed by LZW compression and called it GIF, for Graphics Interchange Format. Each image could be composed of multiple frames, each having up to 256 distinct colours (hence the common mistaken belief that a GIF can only have 256 colours). The nature of the palette system and compression algorithm made GIF a particularly efficient format for (still) images with solid contiguous blocks of colour, like logos and diagrams, but it generally underperformed against cosine-transform-based algorithms like JPEG/JFIF for images with gradients (like most photos).
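If you want to see that per-frame 256-colour limit for yourself, here’s a quick sketch using the Pillow imaging library (the filenames are placeholders): converting a truecolour photo down to a GIF forces it through a palette of at most 256 entries.

    from PIL import Image  # Pillow imaging library: pip install Pillow

    # Placeholder filenames - substitute your own.
    photo = Image.open("photo.jpg")        # truecolour source (potentially millions of colours)
    paletted = photo.quantize(colors=256)  # reduce to a single palette of up to 256 entries
    paletted.save("photo.gif")             # a GIF frame can't reference more colours than that

    print(paletted.mode)                   # "P" - palette-indexed, not RGB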

GIF with more than 256 colours.
This animated GIF (of course) shows how it’s possible to have more than 256 colours in a GIF by separating it into multiple non-temporal frames.

GIF would go on to become most famous for two things, neither of which it was capable of upon its initial release: binary transparency (having “see through” bits, which made it an excellent choice for use on Web pages with background images or non-static background colours; these would become popular in the mid-1990s) and animation. Animation involves adding a series of frames which overlay one another in sequence: extensions to the format in 1989 allowed the creator to specify the duration of each frame, making the feature useful (prior to this, they would be displayed as fast as they could be downloaded and interpreted!). In 1995, Netscape added a custom extension to GIF to allow them to loop (either a specified number of times or indefinitely) and this proved so popular that virtually all other software followed suit, but it’s worth noting that “looping” GIFs have never been part of the official standard!

Hex editor view of a GIF file's metadata section, showing Netscape headers.
Open almost any animated GIF file in a hex editor and you’ll see the word NETSCAPE2.0; evidence of Netscape’s role in making animated GIFs what they are today.
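If you’d rather not open a hex editor yourself, here’s a small sketch (again using Pillow; the filename is a placeholder) that writes a three-frame looping GIF and then checks that the NETSCAPE2.0 application-extension block really has ended up in the file:

    from PIL import Image  # Pillow imaging library: pip install Pillow

    # Three solid-grey frames, saved as a looping animated GIF.
    frames = [Image.new("L", (64, 64), shade) for shade in (10, 120, 230)]
    frames[0].save(
        "loop.gif",
        save_all=True,            # write every frame, not just the first
        append_images=frames[1:],
        duration=200,             # milliseconds per frame (the 1989 extension)
        loop=0,                   # 0 = loop forever (Netscape's 1995 extension)
    )

    # The looping behaviour lives in an application extension block labelled
    # "NETSCAPE2.0", just as the hex editor screenshot shows.
    with open("loop.gif", "rb") as f:
        print(b"NETSCAPE2.0" in f.read())  # True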

Compatibility was an issue. For a period during the mid-nineties it was quite possible that among the visitors to your website there would be a mixture of:

  1. people who wouldn’t see your GIFs at all, owing to browser, bandwidth, preference, or accessibility limitations,
  2. people who would only see the first frame of your animated GIFs, because their browser didn’t support animation,
  3. people who would see your animation play once, because their browser didn’t support looping, and
  4. people who would see your GIFs as you intended, fully looping.

This made it hard to depend upon GIFs without carefully considering their use. But people still did, and they just stuck a “Netscape Now!” button on their pages to warn people, as if that made up for it. All of this has happened before, etc.

In any case: as better, newer standards like PNG came to dominate the Web’s need for lossless static (optionally transparent) image transmission, the only thing GIFs remained good for was animation. Standards like APNG/MNG failed to get off the ground, and so GIFs remained the dominant animated-image standard. As Internet connections became faster and faster in the 2000s, they experienced a resurgence in popularity. The Web didn’t yet have the <video> element and so embedding videos on pages required a mixture of at least two of <object>, <embed>, Flash, and black magic… but animated GIFs just worked and soon appeared everywhere.

Magic.
How animation online really works.

The Future

Nowadays, when people talk about GIFs, they often don’t actually mean GIFs! If you see a GIF on Giphy or WhatsApp, you’re probably actually seeing an MPEG-4 video file with no audio track! Now that Web video is widely-supported, service providers know that they can save on bandwidth by delivering you actual videos even when you expect a GIF. More than ever before, GIF has become a byword for short, often-looping Internet animations without sound… even though that’s got little to do with the underlying file format that the name implies.

What's a web page? Something ducks walk on?
What’s a web page? What’s anything?

Verdict: We still can’t agree on whether to pronounce it with a soft-G (“jif”), as Wilhite intended, or with a hard-G, as any sane person would, but it seems that GIFs are here to stay in name even if not in form. And that’s okay. I guess.


Proposal to allow specifying a text snippet in a URL fragment

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

To enable users to easily navigate to specific content in a web page, we propose adding support for specifying a text snippet in the URL. When navigating to such a URL, the browser will find the first instance of the text snippet in the page and bring it into view.

Web standards currently specify support for scrolling to anchor elements with name attributes, as well as DOM elements with ids, when navigating to a fragment. While named anchors and elements with ids enable scrolling to limited specific parts of web pages, not all documents make use of these elements, and not all parts of pages are addressable by named anchors or elements with ids.

Current Status

This feature is currently implemented as an experimental feature in Chrome 74.0.3706.0 and newer. It is not yet shipped to users by default. Users who wish to experiment with it can use chrome://flags#enable-text-fragment-anchor. The implementation is incomplete and doesn’t necessarily match the specification in this document.

tl;dr

Allow specifying text to scroll and highlight in the URL:

https://example.com##targetText=prefix-,startText,endText,-suffix

Using this syntax

##targetText=[prefix-,]textStart[,textEnd][,-suffix]

              context  |-------match-----|  context

(Square brackets indicate an optional parameter)

This is a feature that I’ve wished that the Web had on many, many occasions. I’m sure you’ve needed it before, too: you’ve wanted to give somebody the URL of (or link to) a particular part of a page but there’s been no appropriately-placed anchor to latch on to. Being able to select part of the text on the page and just copy that after a ## in the address bar would be so much simpler.
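Under the draft syntax quoted above – and bearing in mind that it’s experimental and could change or vanish entirely before it reaches a standard – building such a link is just a matter of percent-encoding your snippet and appending it after the double hash. A rough sketch:

    from urllib.parse import quote

    def target_text_url(page_url: str, snippet: str) -> str:
        """Build a link using the draft ##targetText= fragment syntax.

        Illustrative only: this follows the experimental proposal quoted
        above and isn't a standardised syntax.
        """
        # Commas and dashes are meaningful to the syntax, so encode everything.
        return f"{page_url}##targetText={quote(snippet, safe='')}"

    print(target_text_url("https://example.com/article", "soft wet meaty middleware"))
    # https://example.com/article##targetText=soft%20wet%20meaty%20middleware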

Chrome’s experimental fragment text link targeting
Naturally, I tried this experimental feature out on this very web page; it worked pretty nicely!

Chrome’s implementation is somewhat conservative, requiring a prefix of ##targetText= (this minimises the risk of collision with other applications which store/pass data via hashes), but it’s still pretty full-featured, with support for prefixes and suffixes to the text to-be-selected. I quite like it, but of course it needs running down the standards track before it can be relied upon as anything other than a progressive enhancement.

I do wonder, though, whether this will be met with resistance by ad/subscription-supported content creators as a new example of the deep linking they seem to hate so much.

(with thanks to Jeremy Keith for sharing this)


Why do book spines have the titles printed the way they do?

Have you noticed how the titles printed on the spines of your books are all, for the most part, oriented the same way? That’s not a coincidence.

Illustration of a chained library
If you can’t see the spines of your books at all, perhaps you’re in a library and it’s the 17th century. Wait: how are you on the Internet?

ISO 6357 defines the standard positioning of titles on the spines of printed books (it’s also codified as British Standard BS6738). If you assume that your book is stood “upright”, the question is one of which way you tilt your head to read the title printed on the spine. If you tilt your head to the right, that’s a descending title (as you read left-to-right, your gaze moves down, towards the surface on which the book stands). If you tilt your head to the left, that’s an ascending title. If you don’t need to tilt your head in either direction, that’s a transverse title.

ISO 6357:1985 page illustrating different standard spine title alignments.
Not every page in ISO 6357:1985 is as exciting as this one.

The standard goes on to dictate that descending titles are to be favoured: this places the title in a readable orientation when the book lies flat on a surface with its cover face-up. Grab the nearest book to you right now and you’ll probably find that it has a descending title.

Books on a shelf.
This eclectic shelf includes a transverse title (the Holy Bible), a semi-transverse title (The Book of English Magic) and a rare ascending title (A First Dictionary) among a multitude of descending titles.

But if a book is lying on a surface, I can usually read its cover anyway. It’s only when a book is in a stack that I can’t, and stacks are usually short enough that it’s easy to lift one or more books from the top to see what lies beneath. What really matters when considering the orientation of a spine title is, to my mind, how it appears when the book is shelved.

It feels to me like this standard’s got things backwards. If a shelf of anglophone books is organised into any kind of order (e.g. alphabetically) then it’ll usually be from left to right. If I’m reading the titles from left to right, and the spines are printed descending, then – from the perspective of my eyes – I’m reading from bottom to top: i.e. backwards!

It’s possible that this is one of those things that I overthink.


RFC-20

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

The choice of this encoding has made ASCII-compatible standards the language that computers use to communicate to this day.

Even casual internet users have probably encountered a URL with “%20” in it where there logically ought to be a space character. If we look at this RFC we see this:

   Column/Row  Symbol      Name

   2/0         SP          Space (Normally Non-Printing)

Hey would you look at that! Column 2, row 0 (2,0; 20!) is what stands for “space”. When you see that “%20”, it’s because of this RFC, which exists because of some bureaucratic decisions made in the 1950s and 1960s.

Darius Kazemi is reading a single RFC every day throughout 2019 and writing up his understanding as to the content and importance of each. It’s good reading if you’re “into” RFCs and it’s probably pretty interesting if you’re just a casual Internet historian.
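If you want to convince yourself of the column/row arithmetic, here’s a tiny sketch: column 2, row 0 of the ASCII chart is 2 × 16 + 0 = 32, which is 0x20 in hexadecimal, and that’s exactly the byte value that percent-encoding spells out for a space.

    from urllib.parse import quote, unquote

    column, row = 2, 0
    code = column * 16 + row      # ASCII is laid out in sixteen-row columns
    print(hex(code))              # 0x20
    print(repr(chr(code)))        # ' ' - the space character
    print(quote(" "))             # %20 - the percent-encoded form you see in URLs
    print(unquote("%20") == " ")  # True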