Evolving Computer Words: “GIF”

This is part of a series of posts on computer terminology whose popular meaning – determined by surveying my friends – has significantly diverged from its original/technical one. Read more evolving words…

The language we use is always changing, like how the word “cute” was originally a truncation of the word “acute”, which you’d use to describe somebody who was sharp-witted, as in “don’t get cute with me”. Nowadays, we use it when describing adorable things, like the subject of this GIF:

[Animated GIF] Puppy flumps onto a human.
Cute, but not acute.
But hang on a minute: that’s another word that’s changed meaning: GIF. Want to see how?

GIF

What people think it means

File format (or the files themselves) designed for animations and transparency. Or: any animation without sound.

What it originally meant

File format designed for efficient colour images. Animation was secondary; transparency was an afterthought.

The Past

Back in the 1980s cyberspace was in its infancy. Sir Tim hadn’t yet dreamed up the Web, and the Internet wasn’t something that most people could connect to, and bulletin board systems (BBSes) – dial-up services, often local or regional, sometimes connected to one another in one of a variety of ways – dominated the scene. Larger services like CompuServe acted a little like huge BBSes but with dial-up nodes in multiple countries, helping to bridge the international gaps and provide a lower learning curve than the smaller boards (albeit for a hefty monthly fee in addition to the costs of the calls). These services would later go on to double as, and eventually become exclusively, Internet Service Providers, but for the time being they were a force unto themselves.

CompuServe ad circa 1983
My favourite bit of this 1983 magazine ad for CompuServe is the fact that it claims a trademark on the word “email”. They didn’t try very hard to cling on to that claim, unlike the controversial patent – Unisys’s, on the LZW compression used by GIF – that they’d later become entangled in…

In 1987, CompuServe were about to start rolling out colour graphics as a new feature, but needed a new graphics format to support it. Their engineer Steve Wilhite had the idea for a bitmap image format backed by LZW compression and called it GIF, for Graphics Interchange Format. Each image could be composed of multiple frames, each having up to 256 distinct colours (hence the common mistaken belief that a GIF can only have 256 colours). The nature of the palette system and compression algorithm made GIF a particularly efficient format for (still) images with solid contiguous blocks of colour, like logos and diagrams, but it generally underperformed against cosine-transform-based algorithms like JPEG/JFIF for images with gradients (like most photos).

GIF with more than 256 colours.
This animated GIF (of course) shows how it’s possible to have more than 256 colours in a GIF by separating it into multiple non-temporal frames.

GIF would go on to become most famous for two things, neither of which it was capable of upon its initial release: binary transparency (having “see through” bits, which made it an excellent choice for use on Web pages with background images or non-static background colours; these would become popular in the mid-1990s) and animation. Animation involves adding a series of frames which overlay one another in sequence: extensions to the format in 1989 allowed the creator to specify the duration of each frame, making the feature useful (prior to this, they would be displayed as fast as they could be downloaded and interpreted!). In 1995, Netscape added a custom extension to GIF to allow them to loop (either a specified number of times or indefinitely) and this proved so popular that virtually all other software followed suit, but it’s worth noting that “looping” GIFs have never been part of the official standard!

Hex editor view of a GIF file's metadata section, showing Netscape headers.
Open almost any animated GIF file in a hex editor and you’ll see the word NETSCAPE2.0; evidence of Netscape’s role in making animated GIFs what they are today.
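
If you fancy checking that for yourself without a hex editor, here’s a rough Python sketch. It doesn’t fully parse the GIF; it just hunts for the Netscape application extension and, assuming the usual layout of that block, reports the repeat count:

    import struct
    import sys

    # Rough sketch: find the NETSCAPE2.0 application extension in a GIF and read
    # its repeat count. Assumes the usual sub-block layout that follows the
    # identifier (03 01 <count, little-endian> 00); a count of 0 means "loop
    # forever", and no block at all means the animation plays just once.
    def gif_loop_count(path):
        data = open(path, "rb").read()
        marker = data.find(b"NETSCAPE2.0")
        if marker == -1:
            return None                            # no Netscape extension at all
        sub_block = data[marker + 11:marker + 16]  # bytes following the 11-byte identifier
        if len(sub_block) >= 4 and sub_block[0] == 0x03:
            (count,) = struct.unpack("<H", sub_block[2:4])
            return count                           # 0 = loop forever
        return None

    if __name__ == "__main__":
        print(gif_loop_count(sys.argv[1]))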

Compatibility was an issue. For a period during the mid-nineties it was quite possible that among the visitors to your website there would be a mixture of:

  1. people who wouldn’t see your GIFs at all, owing to browser, bandwidth, preference, or accessibility limitations,
  2. people who would only see the first frame of your animated GIFs, because their browser didn’t support animation,
  3. people who would see your animation play once, because their browser didn’t support looping, and
  4. people who would see your GIFs as you intended, fully looping.

This made it hard to depend upon GIFs without carefully considering their use. But people still did, and they just stuck a Netscape Now button on to warn people, as if that made up for it. All of this has happened before, etc.

In any case: as better, newer standards like PNG came to dominate the Web’s need for lossless static (optionally transparent) image transmission, the only thing GIFs remained good for was animation. Standards like APNG/MNG failed to get off the ground, and so GIFs remained the dominant animated-image standard. As Internet connections became faster and faster in the 2000s, animated GIFs experienced a resurgence in popularity. The Web didn’t yet have the <video> element and so embedding videos on pages required a mixture of at least two of <object>, <embed>, Flash, and black magic… but animated GIFs just worked and soon appeared everywhere.

Magic.
How animation online really works.

The Future

Nowadays, when people talk about GIFs, they often don’t actually mean GIFs! If you see a GIF on Giphy or WhatsApp, you’re probably actually seeing an MPEG-4 video file with no audio track! Now that Web video is widely-supported, service providers know that they can save on bandwidth by delivering you actual videos even when you expect a GIF. More than ever before, GIF has become a byword for short, often-looping Internet animations without sound… even though that’s got little to do with the underlying file format that the name implies.

What's a web page? Something ducks walk on?
What’s a web page? What’s anything?

Verdict: We still can’t agree on whether to pronounce it with a soft-G (“jif”), as Wilhite intended, or with a hard-G, as any sane person would, but it seems that GIFs are here to stay in name even if not in form. And that’s okay. I guess.


Evolving Computer Words: “Broadband”

This is part of a series of posts on computer terminology whose popular meaning – determined by surveying my friends – has significantly diverged from its original/technical one. Read more evolving words…

Until the 17th century, to “fathom” something was to embrace it. Nowadays, it’s more likely to refer to your understanding of something in depth. The migration came via the similarly-named imperial unit of measurement, which was originally defined as the span of a man’s outstretched arms, so you can understand how we got from one to the other. But you know what I can’t fathom? Broadband.

Woman hugging a dalmatian. Photo by Daria Shevtsova from Pexels.
I can’t fathom dalmatians. But this woman can.

Broadband Internet access has become almost ubiquitous over the last decade and a half, but ask people to define “broadband” and they have a very specific idea about what it means. It’s not the technical definition, and this re-invention of the word can cause problems.

Broadband

What people think it means

High-speed, always-on Internet access.

What it originally meant

Communications channel capable of multiple different traffic types simultaneously.

The Past

Throughout the 19th century, optical (semaphore) telegraph networks gave way to the new-fangled electrical telegraph, which not only worked regardless of the weather but resulted in significantly faster transmission. “Faster” here means two distinct things: latency – how long it takes a message to reach its destination – and bandwidth – how much information can be transmitted at once. If you’re having difficulty understanding the difference, consider this: a man on a horse might be faster than a telegraph if the size of the message is big enough, because a backpack full of scrolls has greater bandwidth than a Morse code pedal, but the latency of an electrical wire beats land transport every time. Or as Andrew S. Tanenbaum famously put it: “Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.”
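
To put some numbers on that distinction, here’s a quick back-of-envelope sketch in Python. The payload, journey time, and line speed are entirely invented for the sake of the illustration:

    # Illustrative numbers only: a courier carrying 1 TB of tapes on a 3-hour
    # ride, versus a 10 Mbit/s line with 50 ms latency.
    TAPE_PAYLOAD_BITS = 1e12 * 8       # 1 TB in the saddlebags
    COURIER_JOURNEY_S = 3 * 60 * 60    # three hours on horseback
    LINK_RATE_BPS = 10e6               # a 10 Mbit/s wire
    LINK_LATENCY_S = 0.05              # 50 ms for the first bit to arrive

    courier_bandwidth = TAPE_PAYLOAD_BITS / COURIER_JOURNEY_S
    wire_transfer_time = LINK_LATENCY_S + TAPE_PAYLOAD_BITS / LINK_RATE_BPS

    print(f"Courier: ~{courier_bandwidth / 1e6:.0f} Mbit/s on average, but a 3-hour wait for the first bit")
    print(f"Wire: first bit after 50 ms, but ~{wire_transfer_time / 3600:.0f} hours for the whole load")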

Telephone-to-heliograph conversion circa 1912.
There were transitional periods. This man, photographed in 1912, is relaying a telephone message into a heliograph. I’m not sure what message he’s transmitting, but I’m guessing it ends with a hashtag.

Telegraph companies were keen to be able to increase their bandwidth – that is, to get more messages on the wire – and this was achieved by multiplexing. The simplest approach, time-division multiplexing, involves messages (or parts of messages) “taking turns”, and doesn’t actually increase bandwidth at all – although it does improve the perception of speed by giving recipients the start of their messages early on. A variety of other multiplexing techniques were (and continue to be) explored, but the one that’s most-interesting to us right now was called acoustic telegraphy: today, we’d call it frequency-division multiplexing.

What if, asked folks-you’ll-have-heard-of like Thomas Edison and Alexander Graham Bell, we were to send telegraph messages down the line at different frequencies? Some beeps and bips would be high tones, and some would be low tones, and a machine at the receiving end could separate them out again (so long as you chose your frequencies carefully, to avoid harmonic distortion). As might be clear from the names I dropped earlier, this approach – sending sound down a telegraph wire – ultimately led to the invention of the telephone. Hurrah, I’m sure they all immediately called one another to say, our efforts to create a higher-bandwidth medium for telegrams have accidentally resulted in a lower-bandwidth (but more-convenient!) way for people to communicate. Job’s a good ‘un.

Electro-acoustic telegraph "tuning fork".
If this part of Edison’s 1878 patent looks like a tuning fork, that’s not a coincidence. These early multiplexers made distinct humming sounds as they operated, owing to the movement of the synchronised forks within.

Most electronic communications systems that have ever existed have been narrowband: they’ve been capable of only a single kind of transmission at a time. Even if you’re multiplexing a dozen different frequencies to carry a dozen different telegraph messages at once, you’re still only transmitting telegraph messages. For the most part, that’s fine: we’re pretty clever and we can find workarounds when we need them. For example, when we started wanting to be able to send data to one another (because computers are cool now) over telephone wires (which are conveniently everywhere), we did so by teaching our computers to make sounds and understand one another’s sounds. If you’re old enough to have heard a fax machine call a landline or, better yet, used a dial-up modem, you know what I’m talking about.

As the Internet became more and more critical to business and home life, and the limitations (of bandwidth and convenience) of dial-up access became increasingly hard to ignore, a better solution was needed. Bringing broadband to Internet access was necessary, but the technologies involved weren’t revolutionary: they were just the result of the application of a little imagination.

Dawson can't use the Internet because someone's on the phone.
I’ve felt your pain, Dawson. I’ve felt your pain.

We’d seen this kind of imagination before. Consider teletext, for example (for those of you too young to remember teletext, it was a standard for browsing pages of text and simple graphics using a 70s-90s analogue television), which is – strictly speaking – a broadband technology. Teletext works by embedding pages of digital data, encoded in an analogue stream, in the otherwise-“wasted” space in-between frames of broadcast video. When you told your television to show you a particular page, either by entering its three-digit number or by following one of four colour-coded hyperlinks, your television would wait until the page you were looking for came around again in the broadcast stream, decode it, and show it to you.

Teletext was, fundamentally, broadband. In addition to carrying television pictures and audio, the same radio wave was being used to transmit text: not pictures of text, but encoded characters. Analogue subtitles (which used basically the same technology): also broadband. Broadband doesn’t have to mean “Internet access”, and indeed for much of its history, it hasn’t.

Ceefax news article from 29 October 1988, about a cancelled Soviet shuttle launch.
My family started getting our news via broadband in about 1985. Not broadband Internet, but broadband nonetheless.

Here in the UK, ISDN (from 1988!) and later ADSL would be the first widespread technologies to provide broadband data connections over the copper wires simultaneously used to carry telephone calls. ADSL does this in basically the same way as Edison and Bell’s acoustic telegraphy: a portion of the available frequencies (usually the first 4kHz, where voice lives) is reserved for telephone calls, followed by a no-man’s-land band, followed by two frequency bands of different sizes (hence the asymmetry: the A in ADSL) for up- and downstream data. This, at last, allowed true “broadband Internet”.

But was it fast? Well, relative to dial-up, certainly… but the essential nature of broadband technologies is that they share the bandwidth with other services. A connection that doesn’t have to share will always have more bandwidth, all other things being equal! Leased lines, despite technically being a narrowband technology, necessarily outperform broadband connections having the same total bandwidth because they don’t have to share it with other services. And don’t forget that not all speed is created equal: satellite Internet access is a narrowband technology with excellent bandwidth… but sometimes-problematic latency issues!

ADSL microfilter
Did you have one of these tucked behind your noughties router? This box filtered out the data from the telephone frequencies, helping to ensure that you could neither hear the pops and clicks of your ADSL connection nor interfere with it by shouting.

Equating the word “broadband” with speed is based on a consumer-centric misunderstanding about what broadband is, because it’s necessarily true that if your home “broadband” weren’t configured to be able to support old-fashioned telephone calls, it’d be (a) (slightly) faster, and (b) not-broadband.

The Future

But does the word that people use to refer to their high-speed Internet connection matter? More than you’d think: various countries around the world have begun to make legal definitions of the word “broadband” based not on the technical meaning but on the populist one, and it’s becoming a source of friction. In the USA, the FCC variously defines broadband as having a minimum download speed of 10Mbps or 25Mbps, among other characteristics (they seem to use the former when protecting consumer rights and the latter when reporting on penetration, and you can read into that what you will). In the UK, Ofcom’s regulations differentiate between “decent” (yes, that’s really the word they use) and “superfast” broadband at 10Mbps and 24Mbps download speeds, respectively, while the Scottish and Welsh governments as well as the EU say it must be 30Mbps to be “superfast broadband”.

Faster, Harder, Scooter music video.
At full-tilt, going from 10Mbps to 24Mbps means taking only 4 seconds, rather than 11 seconds, to download the music video to Faster! Harder! Scooter!

I’m all in favour of regulation that protects consumers and makes it easier for them to compare products. It’s a little messy that definitions vary so widely on what different speeds mean, but that’s not the biggest problem. I don’t even mind that these agencies have all given themselves very little breathing room for the future: where do you go after “superfast”? Ultrafast (actually, that’s exactly where we go)? Megafast? Ludicrous speed?

What I mind is that a useful term – one that tells you whether or not a connection is shared with other services – is being redefined to hinge on a completely independent characteristic of that connection. It’d have been simple for the FCC, for example, to have instead defined e.g. “full-speed broadband” as providing a particular bandwidth.

Verdict: It’s not a big deal; I should just chill out. I’m probably going to have to throw in the towel anyway on this one and join the masses in calling all high-speed Internet connections “broadband” – and denying the word to slower and non-Internet connections, regardless of how they’re set up.


Evolving Computer Words: “Virus”

This is part of a series of posts on computer terminology whose popular meaning – determined by surveying my friends – has significantly diverged from its original/technical one. Read more evolving words…

A few hundred years ago, the words “awesome” and “awful” were synonyms. From their roots, you can see why: they mean “tending to or causing awe” and “full of or characterised by awe”, respectively. Nowadays, though, they’re opposites, and it’s pretty awesome to see how our language continues to evolve. You know what’s awful, though? Computer viruses. Right?

Man staring intently at laptop. Image courtesy Oladimeji Ajegbile, via Pexels.
“Oh no! A virus has stolen all my selfies and uploaded them to a stock photos site!”

You know what I mean by a virus, right? A malicious computer program bent on causing destruction, spying on your online activity, encrypting your files and ransoming them back to you, showing you unwanted ads, etc… but hang on: that’s not right at all…

Virus

What people think it means

Malicious or unwanted computer software designed to cause trouble/commit crimes.

What it originally meant

Computer software that hides its code inside programs and, when they’re run, copies itself into other programs.

The Past

Only a hundred and thirty years ago it was still widely believed that “bad air” was the principal cause of disease. The idea that tiny germs could be the cause of infection was only just beginning to take hold. It was in this environment that the excellent scientist Ernest Hankin travelled around India studying outbreaks of disease and promoting germ theory by demonstrating that boiling water prevented cholera by killing the (newly-discovered) Vibrio cholerae bacterium. But his most-important discovery was that water from a certain part of the Ganges seemed to be naturally inhospitable to Vibrio cholerae… and that boiling this water removed this superpower, allowing it once again to culture the bacterium.

Hankin correctly theorised that there was something in that water that preyed upon Vibrio cholerae; something too small to see with a microscope. In doing so, he was probably the first person to identify what we now call a bacteriophage: the most common kind of virus. Bacteriophages were briefly seen as exciting for their medical potential. But then in the 1940s antibiotics, which were seen as far more-convenient, began to be manufactured in bulk, and we stopped seriously looking at “phage therapy” (interestingly, phages are seeing a bit of a resurgence as antibiotic resistance becomes increasingly problematic).

Electron microscope image of a bacteriophage alongside an illustration of the same.
It took until the development of the electron microscope in the mid-20th century before we’d actually “see” a virus.

But the important discovery kicked-off by the early observations of Hankin and others was that viruses exist. Later, researchers would discover how these viruses work1: they inject their genetic material into cells, and this injected “code” supplants the unfortunate cell’s usual processes. The cell is “reprogrammed” – sometimes after a dormant period – to churn out more of the virus, becoming a “virus factory”.

Let’s switch to computer science. Legendary mathematician John von Neumann, fresh from showing off his expertise in calculating how shaped charges should be used to build the first atomic bombs, invented the new field of cellular automata. Cellular automata are computationally-logical, independent entities that exhibit complex behaviour through their interactions, but if you’ve come across them before now it’s probably because you played Conway’s Game of Life, which made the concept popular decades after their invention. Von Neumann was very interested in how ideas from biology could be applied to computer science, and is credited with being the first person to come up with the idea of a self-replicating computer program which would write-out its own instructions to other parts of memory to be executed later: the concept of the first computer virus.

Glider factory breeder in Conway's Game of Life
This is a glider factory… factory. I remember the first time I saw this pattern, in the 1980s, and it sank in for me that cellular automata must logically be capable of any arbitrary level of complexity. I never built a factory-factory-factory, but I’ll bet that others have.

Retroactively-written lists of early computer viruses often identify 1971’s Creeper as the first computer virus: it was a program which, when run, moved (later copied) itself to another computer on the network and showed the message “I’m the creeper: catch me if you can”. It was swiftly followed by a similar program, Reaper, which replicated in a similar way but instead of displaying a message attempted to delete any copies of Creeper that it found. However, Creeper and Reaper weren’t described as viruses at the time and would be more-accurately termed worms nowadays: self-replicating network programs that don’t inject their code into other programs. An interesting thing to note about them, though, is that – contrary to popular conception of a “virus” – neither intended to cause any harm: Creeper‘s entire payload was a relatively-harmless message, and Reaper actually tried to do good by removing presumed-unwanted software.

Another early example that appears in so-called “virus timelines” came in 1975. ANIMAL presented itself as a twenty-questions-style guessing game. But while the user played, it would try to copy itself into another user’s directory, spreading itself (we didn’t really do directory permissions back then). Again, this wasn’t really a “virus” but would be better termed a trojan: a program which pretends to be something that it’s not.

Replica Trojan horse.
“Malware? Me? No siree… nothing here but this big executable horse.”

It took until 1983 before Fred Cohen gave us a modern definition of a computer virus, one which – ignoring usage by laypeople – stands to this day:

A program which can ‘infect’ other programs by modifying them to include a possibly evolved copy of itself… every program that gets infected may also act as a virus and thus the infection grows.

This definition helps distinguish between merely self-replicating programs like those seen before and a new, theoretical class of programs that would modify host programs such that – typically in addition to the host programs’ normal behaviour – further programs would be similarly modified. Not content to leave this purely theoretical, Cohen wrote the first “true” computer virus to demonstrate his work (it was never released into the wild); he also managed to prove that there can be no such thing as perfect virus detection.

(Quick side-note: I’m sure we’re all on the same page about the evolution of language here, but for the love of god don’t say viri. Certainly don’t say virii. The correct plural is clearly viruses. The Latin root virus is a mass noun and so has no plural, unlike e.g. fungus/fungi, so its adoption as a count noun in English represents the creation of a new word which should therefore, without a precedent to the contrary, favour English pluralisation rules. A parallel would be bonus, which shares virus’s linguistic path, word ending, and countability-in-Latin: you wouldn’t say “there were end-of-year boni for everybody in my department”, would you? No. So don’t say viri either.)

(Inaccurate) slide describing viruses as programs that damage computers or files.
No, no, no, no, no. The only wholly-accurate part of this definition is the word “program”.

Viruses came into their own as computers became standardised and commonplace, as communication between them (whether by removable media or by network/dial-up connections) became routine, and as Cohen’s theoretical concepts became very much real. In 1986, the Virdem method brought infectious viruses to the DOS platform, opening up virus writers’ access to much of the rapidly growing business and home computer markets.

The Virdem method has two parts: (a) appending the viral code to the end of the program to be infected, and (b) injecting early into the program a call to the appended code. This exploits the typical layout of most DOS executable files and ensures that the viral code runs first, as an infected program loads, so the virus can spread rapidly through a system. The appearance of this method at a time when hard drives were uncommon – and so many programs would be run from floppy disks, which could easily be passed around between users – enabled this kind of virus to spread rapidly.

For the most part, early viruses were not malicious. They usually only caused harm as a side-effect (as we’ve already seen, some – like Reaper – were intended to be not just benign but benevolent). For example, programs might run slower if they’re also busy adding viral code to other programs, or a badly-implemented virus might even cause software to crash. But it didn’t take long before viruses started to be used for malicious purposes – pranks, adware, spyware, data ransom, etc. – as well as to carry political messages or to conduct cyberwarfare.

XKCD 1180: Virus Venn Diagram
XKCD already explained all of this in far fewer words and a diagram.

The Future

Nowadays, though, viruses are becoming less-common. Wait, what?

Yup, you heard me right: new viruses aren’t being produced at remotely the same kind of rate as they were even in the 1990s. And it’s not that they’re easier for security software to catch and quarantine; if anything, they’re less-detectable as more and more different types of file are nominally “executable” on a typical computer, and widespread access to powerful cryptography has made it easier than ever for a virus to hide itself in the increasingly-sprawling binaries that litter modern computers.

"Security" button
Soo… I click this and all the viruses go away, right? Why didn’t we do this sooner?

The single biggest reason that virus writing is on the decline is, in my opinion, that writing something as complex as a virus is no longer a necessary step to illicitly getting your program onto other people’s computers2! Nowadays, it’s far easier to write a trojan (e.g. a fake Flash update, dodgy spam attachment, browser toolbar, or a viral free game) and trick people into running it… or else to write a worm that exploits some weakness in an open network interface. Or, in a recent twist, to just add your code to a popular library and let overworked software engineers include it in their projects for you. Modern operating systems make it easy to have your malware run every time they boot and it’ll quickly get lost amongst the noise of all the other (hopefully-legitimate) programs running alongside it.

In short: there’s simply no need to have your code hide itself inside somebody else’s compiled program any more. Users will run your software anyway, and you often don’t even have to work very hard to trick them into doing so.

Verdict: Let’s promote use of the word “malware” instead of “virus” for popular use. It’s more technically-accurate in the vast majority of cases, and it’s actually a more-useful term too.

Footnotes

1 Actually, not all viruses work this way. (Biological) viruses are, it turns out, really really complicated and we’re only just beginning to understand them. Computer viruses, though, we’ve got a solid understanding of.

2 There are other reasons, such as the increase in use of cryptographically-signed binaries, protected memory space/”execute bits”, and so on, but the trend away from traditional viruses and towards trojans for delivery of malicious payloads began long before these features became commonplace.


Note #15043

My hardware engineering is a little rusty, @ComputerHistory, but wouldn’t signals propagate along this copper cable at “only” somewhere between 0.64c and 0.95c, not 1c as you claim?

Exhibit of early transatlantic telegraph cable with message implying that it enabled "speed of light" communications.


Note #14082

A hundred zucks

In the remote chance that @Facebook‘s #LibraCoin [Wikipedia] takes off, I suggest that the appropriate slang term for the currency shall be zucks.

As in: “I’ll bet you a hundred zucks that this new #cryptocurrency will be barely more-successful than Dogecoin, and far less-cute.”

remysharp comments on “Bringing back the Web of 1990”

This article is a repost promoting content originally published elsewhere. See more things Dan's reposted.

Hi @avapoet, I’m the author of the JavaScript for the WorldWideWeb project, and I did read your thread on the user-agent missing and I thought I’d land the fix ;-)

The original WorldWideWeb browser that we based our work on was 0.12 with screenshots from 0.16. Both browsers supported HTTP 0.9 which didn’t send headers. Obviously unintentional that I send the `request` user-agent, so I spent some painful hours trying to get my emulator running NeXT with a networked connection _and_ the WorldWideWeb version 1.0 – which _did_ use HTTP 1.0 and would send a User-Agent, so I could copy it accurately into the emulator code base.

So now metafilter.com renders in the emulator, and the User Agent sent is: CERN-NextStep-WorldWideWeb.app/1.1 libwww/2.07

Thanks again :)

I blogged about the reimplementation of WorldWideWeb by a hackathon team at CERN, and posted a commentary to MetaFilter, too. After I did, some others observed that it wasn’t capable of showing MetaFilter pages – which was obviously going to be the first thing that anybody did with it, and something I ought to have checked first. In any case, I later checked out the source code and did some debugging, finding and proposing a fix. It feels cool to be able to say “I improved upon some code written at CERN,” even if it’s only by a technicality.

This comment on the MetaFilter thread, which I only just noticed, is by Remy Sharp, who was part of the team that reimplemented WorldWideWeb as part of that hackathon (his blog posts about the experience: 1, 2, 3, 4, 5), and acknowledges my contribution. Squee!

Enigma, the Bombe, and Typex

This article is a repost promoting content originally published elsewhere. See more things Dan's reposted.

How to guides

How to encrypt/decrypt with Enigma

We’ll start with a step-by-step guide to decrypting a known message. You can see the result of these steps in CyberChef here. Let’s say that our message is as follows:

XTSYN WAEUG EZALY NRQIM AMLZX MFUOD AWXLY LZCUZ QOQBQ JLCPK NDDRW F

And that we’ve been told that a German service Enigma is in use with the following settings:

Rotors III, II, and IV, reflector B, ring settings (Ringstellung in German) KNG, plugboard (Steckerbrett) AH CO DE GZ IJ KM LQ NY PS TW, and finally the rotors are set to OPM.

Enigma settings are generally given left-to-right. Therefore, you should ensure the 3-rotor Enigma is selected in the first dropdown menu, and then use the dropdown menus to put rotor III in the 1st rotor slot, II in the 2nd, and IV in the 3rd, and pick B in the reflector slot. In the ring setting and initial value boxes for the 1st rotor, put K and O respectively, N and P in the 2nd, and G and M in the 3rd. Copy the plugboard settings AH CO DE GZ IJ KM LQ NY PS TW into the plugboard box. Finally, paste the message into the input window.

The output window will now read as follows:

HELLO CYBER CHEFU SERST HISIS ATEST MESSA GEFOR THEDO CUMEN TATIO N

The Enigma machine doesn’t support any special characters, so there’s no support for spaces, and by default unsupported characters are removed and output is put into the traditional five-character groups. (You can turn this off by disabling “strict input”.) In some messages you may see X used to represent space.
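
As an aside, the traditional grouping is easy to reproduce yourself. The Python sketch below is our own illustration (not CyberChef’s code): it strips anything outside A-Z and re-flows the result into blocks of five:

    def five_groups(text):
        # Keep only the letters A-Z (Enigma supports nothing else), then group in fives.
        letters = [c for c in text.upper() if "A" <= c <= "Z"]
        return " ".join("".join(letters[i:i + 5]) for i in range(0, len(letters), 5))

    print(five_groups("Hello, CyberChef users!"))
    # HELLO CYBER CHEFU SERS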

Encrypting with Enigma is exactly the same as decrypting – if you copy the decrypted message back into the input box with the same recipe, you’ll get the original ciphertext back.

Plugboard, rotor and reflector specifications

The plugboard exchanges pairs of letters, and is specified as a space-separated list of those pairs. For example, with the plugboard AB CD, A will be exchanged for B and vice versa, C for D, and so forth. Letters that aren’t specified are not exchanged, but you can also specify, for example, AA to note that A is not exchanged. A letter cannot be exchanged more than once. In standard late-war German military operating practice, ten pairs were used.
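
For illustration, a plugboard specified this way can be represented as a simple symmetric mapping. The sketch below is our reading of the rules above (including the “no letter exchanged more than once” restriction), not CyberChef’s own parser:

    def parse_plugboard(spec):
        # Start with every letter mapping to itself; each pair swaps two letters.
        mapping = {chr(c): chr(c) for c in range(ord("A"), ord("Z") + 1)}
        seen = set()
        for pair in spec.split():
            a, b = pair[0], pair[1]
            if a in seen or b in seen:
                raise ValueError("letter exchanged more than once: " + pair)
            seen.update({a, b})
            mapping[a], mapping[b] = b, a   # an "AA"-style entry just maps A to itself
        return mapping

    board = parse_plugboard("AH CO DE GZ IJ KM LQ NY PS TW")
    assert board["A"] == "H" and board["H"] == "A"   # exchanges are symmetric
    assert board["B"] == "B"                         # unlisted letters pass through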

You can enter your own components, rather than using the standard ones. A rotor is an arbitrary mapping between letters – the rotor specification used here is the letters the rotor maps A through Z to, so for example with the rotor ESOVPZJAYQUIRHXLNFTGKDCMWB, A maps to E, B to S, and so forth. Each letter must appear exactly once. Additionally, rotors have a defined step point (the point or points in the rotor’s rotation at which the neighbouring rotor is stepped) – these are specified using a < followed by the letters at which the step happens.
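
Again purely as an illustration of the format (not CyberChef’s parser), the specification splits neatly into the wiring and the step points; the “<J” step point below is an arbitrary example we’ve added for demonstration:

    def parse_rotor(spec):
        # 26 letters of wiring, optionally followed by "<" and the step letters.
        wiring_part, _, step_part = spec.partition("<")
        if sorted(wiring_part) != [chr(c) for c in range(ord("A"), ord("Z") + 1)]:
            raise ValueError("each letter must appear exactly once in the wiring")
        wiring = {chr(ord("A") + i): letter for i, letter in enumerate(wiring_part)}
        return wiring, set(step_part)

    wiring, steps = parse_rotor("ESOVPZJAYQUIRHXLNFTGKDCMWB<J")
    assert wiring["A"] == "E" and wiring["B"] == "S"   # A maps to E, B to S, ...
    assert steps == {"J"}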

Reflectors, like the plugboard, exchange pairs of letters, so they are entered the same way. However, letters cannot map to themselves.

How to encrypt/decrypt with Typex

The Typex machine is very similar to Enigma. There are a few important differences from a user perspective:

  • Five rotors are used.
  • Rotor wiring cores can be inserted into the rotors backwards.
  • The input plugboard (on models which had one) is more complicated, allowing arbitrary letter mappings, which means it functions like, and is entered like, a rotor.
  • There was an additional plugboard which allowed rewiring of the reflector: this is supported by simply editing the specified reflector.

Like Enigma, Typex only supports enciphering/deciphering the letters A-Z. However, the keyboard was marked up with a standardised way of representing numbers and symbols using only the letters. You can enable emulation of these keyboard modes in the operation configuration. Note that this needs to know whether the message is being encrypted or decrypted.

How to attack Enigma using the Bombe

Let’s take the message from the first example, and try and decrypt it without knowing the settings in advance. Here’s the message again:

XTSYN WAEUG EZALY NRQIM AMLZX MFUOD AWXLY LZCUZ QOQBQ JLCPK NDDRW F

Let’s assume to start with that we know the rotors used were III, II, and IV, and reflector B, but that we know no other settings. Put the ciphertext in the input window and the Bombe operation in your recipe, and choose the correct rotors and reflector. We need one additional piece of information to attack the message: a “crib”. This is a section of known plaintext for the message. If we know something about what the message is likely to contain, we can guess possible cribs.

We can also eliminate some cribs by using the property that Enigma cannot encipher a letter as itself. For example, let’s say our first guess for a crib is that the message begins with “Hello world”. If we enter HELLO WORLD into the crib box, it will inform us that the crib is invalid, as the W in HELLO WORLD corresponds to a W in the ciphertext. (Note that spaces in the input and crib are ignored – they’re included here for readability.) You can see this in CyberChef here.
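
That elimination rule is easy to automate. The Python sketch below is our own illustration (using the example message and cribs from this guide, not CyberChef’s code): it checks which alignments of a crib are even possible, on the principle that no letter may line up with itself:

    def crib_positions(ciphertext, crib):
        # Return every offset at which the crib could sit: i.e. where no crib
        # letter coincides with the ciphertext letter in the same position.
        ct = ciphertext.replace(" ", "")
        cr = crib.replace(" ", "")
        return [offset for offset in range(len(ct) - len(cr) + 1)
                if all(c != p for c, p in zip(ct[offset:], cr))]

    ct = "XTSYN WAEUG EZALY NRQIM AMLZX MFUOD AWXLY LZCUZ QOQBQ JLCPK NDDRW F"
    print(0 in crib_positions(ct, "HELLO WORLD"))       # False: the W clashes
    print(0 in crib_positions(ct, "HELLO CYBER CHEF"))  # True: no letter coincides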

Let’s try “Hello CyberChef” as a crib instead. If we enter HELLO CYBER CHEF, the operation will run and we’ll be presented with some information about the run, followed by a list of stops. You can see this here. Here you’ll notice that it says Bombe run on menu with 0 loops (2+ desirable)., and there are a large number of stops listed. The menu is built from the crib you’ve entered, and is a web linking ciphertext and plaintext letters. (If you’re maths inclined, this is a graph where letters – plain or ciphertext – are nodes and states of the Enigma machine are edges.) The machine performs better on menus which have loops in them – a letter maps to another to another and eventually returns to the first – and additionally on longer menus. However, menus that are too long risk failing because the Bombe doesn’t simulate the middle rotor stepping, and the longer the menu the more likely this is to have happened. Getting a good menu is a mixture of art and luck, and you may have to try a number of possible cribs before you get one that will produce useful results.
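
To make the “menu as a graph” idea concrete, here’s a small Python sketch (ours, not CyberChef’s) that builds the menu from a crib placed at the start of the message and counts its independent loops as edges minus vertices plus connected components:

    def menu_loops(ciphertext, crib):
        ct = ciphertext.replace(" ", "")
        cr = crib.replace(" ", "")
        edges = list(zip(cr, ct))               # one edge per crib position
        nodes = {letter for edge in edges for letter in edge}

        # Union-find to count connected components of the menu graph.
        parent = {n: n for n in nodes}
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        for a, b in edges:
            parent[find(a)] = find(b)
        components = len({find(n) for n in nodes})

        return len(edges) - len(nodes) + components

    ct = "XTSYN WAEUG EZALY NRQIM AMLZX MFUOD AWXLY LZCUZ QOQBQ JLCPK NDDRW F"
    print(menu_loops(ct, "HELLO CYBER CHEF"))   # 0 loops
    print(menu_loops(ct, "HELLO CYBER CHEFU"))  # 1 loop: U-Y-A-E-U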

Bombe menu diagram

In this case, if we extend our crib by a single character to HELLO CYBER CHEFU, we get a loop in the menu (that U maps to a Y in the ciphertext, the Y in the second cipher block maps to A, the A in the third ciphertext block maps to E, and the E in the second crib block maps back to U). We immediately get a manageable number of results. You can see this here. Each result gives a set of rotor initial values and a set of identified plugboard wirings. Extending the crib further to HELLO CYBER CHEFU SER produces a single result, and it has also recovered eight of the ten plugboard wires and identified four of the six letters which are not wired. You can see this here.

We now have two things left to do:

  1. Recover the remaining plugboard settings.
  2. Recover the ring settings.

This will need to be done manually.

Set up an Enigma operation with these settings. Leave the ring positions set to A for the moment, so from top to bottom we have rotor III at initial value E, rotor II at C, and rotor IV at G, reflector B, and plugboard DE AH BB CO FF GZ LQ NY PS RR TW UU.

You can see this here. You will immediately notice that the output is not the same as the decryption preview from the Bombe operation! Only the first three characters – HEL – decrypt correctly. This is because the middle rotor stepping was ignored by the Bombe. You can correct this by adjusting the ring position and initial value on the right-hand rotor in sync. They are currently A and G respectively. Advance both by one to B and H, and you’ll find that now only the first two characters decrypt correctly.

Keep trying settings until most of the message is legible. You won’t be able to get the whole message correct, but for example at F and L, which you can see here, our message now looks like:

HELLO CYBER CHEFU SERTC HVSJS QTEST KESSA GEFOR THEDO VUKEB TKMZM T

At this point we can recover the remaining plugboard settings. The only letters which are not known in the plugboard are J K V X M I, of which two will be unconnected and two pairs connected. By inspecting the ciphertext and partially decrypted plaintext and trying pairs, we find that connecting IJ and KM results, as you can see here, in:

HELLO CYBER CHEFU SERST HISIS ATEST MESSA GEFOR THEDO CUMEO TMKZK T

This is looking pretty good. We can now fine tune our ring settings. Adjusting the right-hand rotor to G and M gives, as you can see here,

HELLO CYBER CHEFU SERST HISIS ATEST MESSA GEFOR THEDO CUMEN WMKZK T

which is the best we can get with only adjustments to the first rotor. You now need to adjust the second rotor. Here, you’ll find that anything from D and F to Z and B gives the correct decryption, for example here. It’s not possible to determine the exact original settings from only this message. In practice, for the real Enigma and real Bombe, this step was achieved via methods that exploited the Enigma network operating procedures, but this is beyond the scope of this document.

What if I don’t know the rotors?

You’ll need the “Multiple Bombe” operation for this. You can define a set of rotors to choose from – the standard WW2 German military Enigma configurations are provided or you can define your own – and it’ll run the Bombe against every possible combination. This will take up to a few hours for an attack against every possible configuration of the four-rotor Naval Enigma! You should run a single Bombe first to make sure your menu is good before attempting a multi-Bombe run.

You can see an example of using the Multiple Bombe operation to attack the above example message without knowing the rotor order in advance here.

What if I get far too many stops?

Use a longer or different crib. Try to find one that produces loops in the menu.

What if I get no stops, or only incorrect stops?

Are you sure your crib is correct? Try alternative cribs.

What if I know my crib is right, but I still don’t get any stops?

The middle rotor has probably stepped during the encipherment of your crib. Try a shorter or different crib.

How things work

How Enigma works

We won’t go into the full history of Enigma and all its variants here, but as a brief overview of how the machine works:

Enigma uses a series of letter->letter conversions to produce ciphertext from plaintext. It is symmetric, such that the same series of operations on the ciphertext recovers the original plaintext.

The bulk of the conversions are implemented in “rotors”, which are just an arbitrary mapping from the letters A-Z to the same letters in a different order. Additionally, to enforce the symmetry, a reflector is used, which is a symmetric paired mapping of letters (that is, if a given reflector maps X to Y, the converse is also true). These are combined such that a letter is mapped through three different rotors, the reflector, and then back through the same three rotors in reverse.

To avoid Enigma being a simple Caesar cipher, the rotors rotate (or “step”) between enciphering letters, changing the effective mappings. The right rotor steps on every letter, and additionally defines a letter (or later, letters) at which the adjacent (middle) rotor will be stepped. Likewise, the middle rotor defines a point at which the left rotor steps. (A mechanical issue known as the double-stepping anomaly means that the middle rotor actually steps twice when the left hand rotor steps.)

The German military Enigma adds a plugboard, which is a configurable pair mapping of letters (similar to the reflector, but not requiring that every letter is exchanged) applied before the first rotor (and thus also after passing through all the rotors and the reflector).
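
To tie the pieces above together, here’s a deliberately toy simulation in Python. The rotor wirings, notch letters, and plugboard pairs are invented for readability – they are not the historical Enigma components – and ring settings (described next) and the Naval fourth rotor are ignored; but the signal path (plugboard, three rotors right-to-left, reflector, back again, plugboard) and the stepping rules, including the double-step, follow the description above:

    A = ord("A")
    def idx(ch): return ord(ch) - A
    def to_chr(i): return chr(i % 26 + A)

    class Rotor:
        def __init__(self, wiring, notch):
            self.wiring = [idx(c) for c in wiring]       # forward (right-to-left) mapping
            self.inverse = [0] * 26
            for i, w in enumerate(self.wiring):
                self.inverse[w] = i                      # backward (left-to-right) mapping
            self.notch = idx(notch)                      # window letter at which the neighbour steps
            self.pos = 0                                 # rotor position; ring setting ignored

        def at_notch(self): return self.pos == self.notch
        def step(self): self.pos = (self.pos + 1) % 26
        def forward(self, c): return (self.wiring[(c + self.pos) % 26] - self.pos) % 26
        def backward(self, c): return (self.inverse[(c + self.pos) % 26] - self.pos) % 26

    class ToyEnigma:
        def __init__(self, left, middle, right, reflector, plug_pairs):
            self.left, self.middle, self.right = left, middle, right
            self.reflector = [idx(c) for c in reflector]   # symmetric pairing, no fixed points
            self.plug = list(range(26))
            for a, b in plug_pairs:                        # plugboard swaps pairs of letters
                self.plug[idx(a)], self.plug[idx(b)] = idx(b), idx(a)

        def _step(self):
            # The right rotor steps on every key press; the middle rotor steps when
            # the right one is at its notch, and also (the "double step") whenever
            # it is itself at its notch, carrying the left rotor with it.
            if self.middle.at_notch():
                self.middle.step(); self.left.step()
            elif self.right.at_notch():
                self.middle.step()
            self.right.step()

        def press(self, letter):
            self._step()
            c = self.plug[idx(letter)]
            c = self.right.forward(c)
            c = self.middle.forward(c)
            c = self.left.forward(c)
            # The reflector is a fixed pairing with no letter mapped to itself,
            # which is why the machine never enciphers a letter as itself.
            c = self.reflector[c]
            c = self.left.backward(c)
            c = self.middle.backward(c)
            c = self.right.backward(c)
            return to_chr(self.plug[c])

    def make_machine():
        # Toy wirings: shifted alphabets with arbitrary notch letters.
        return ToyEnigma(Rotor("DEFGHIJKLMNOPQRSTUVWXYZABC", "Q"),
                         Rotor("HIJKLMNOPQRSTUVWXYZABCDEFG", "E"),
                         Rotor("NOPQRSTUVWXYZABCDEFGHIJKLM", "V"),
                         "BADCFEHGJILKNMPORQTSVUXWZY",   # pairs A<->B, C<->D, ...
                         [("A", "H"), ("C", "O")])

    ciphertext = "".join(make_machine().press(c) for c in "HELLOWORLD")
    # Symmetry: the same settings turn the ciphertext back into the plaintext.
    assert "".join(make_machine().press(c) for c in ciphertext) == "HELLOWORLD"
    print(ciphertext)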

It also adds a ring setting, which allows the stepping point to be adjusted.

Later in the war, the Naval Enigma added a fourth rotor. This rotor does not step during operation. (The fourth rotor is thinner than the others, and fits alongside a thin reflector, meaning this rotor is not interchangeable with the others on a real Enigma.)

There were a number of other variants and additions to Enigma which are not currently supported here, as well as different Enigma networks using the same basic hardware but different rotors (which are supported by supplying your own rotor configurations).

How Typex works

Typex is a clone of Enigma, with a few changes implemented to improve security. It uses five rotors rather than three, and the rightmost two are static. Each rotor has more stepping points. Additionally, the rotor design is slightly different: the wiring for each rotor is in a removable core, which sits in a rotor housing that has the ring setting and stepping notches. This means each rotor has the same stepping points, and the rotor cores can be inserted backwards, effectively doubling the number of rotor choices.

Later models (from the Mark 22, which is the variant we simulate here) added two plugboards: an input plugboard, which allowed arbitrary letter mappings (rather than just pair switches) and thus functioned similarly to a configurable extra static rotor, and a reflector plugboard, which allowed rewiring the reflector.

How the Bombe works

The Bombe is a mechanism for efficiently testing and discarding possible rotor positions, given some ciphertext and known plaintext. It exploits the symmetry of Enigma and the reciprocal (pairwise) nature of the plugboard to do this regardless of the plugboard settings. Effectively, the machine makes a series of guesses about the rotor positions and plugboard settings and for each guess it checks to see if there are any contradictions (e.g. if it finds that, with its guessed settings, the letter A would need to be connected to both B and C on the plugboard, that’s impossible, and these settings cannot be right). This is implemented via careful connection of electrical wires through a group of simulated Enigma machines.

A full explanation of the Bombe’s operation is beyond the scope of this document – you can read the source code, and the authors also recommend Graham Ellsbury’s Bombe explanation, which is very clearly diagrammed.

Implementation in CyberChef

Enigma/Typex

Enigma and Typex were implemented from documentation of their functionality.

Enigma rotor and reflector settings are from GCHQ’s documentation of known Enigma wirings. We currently simulate all basic versions of the German Service Enigma; most other versions should be possible by manually entering the rotor wirings. There are a few models of Enigma, or attachments for the Service Enigma, which we don’t currently simulate. The operation was tested against some of GCHQ’s working examples of Enigma machines. Output should be letter-for-letter identical to a real German Service Enigma. Note that some Enigma models used numbered rather than lettered rotors – we’ve chosen to stick with the easier-to-use lettered rotors.

There were a number of different Typex versions over the years. We implement the Mark 22, which is backwards compatible with some (but not completely with all, as some early variants supported case sensitivity) older Typex models. GCHQ also has a partially working Mark 22 Typex. This was used to test the plugboards and mechanics of the machine. Typex rotor settings were changed regularly, and none have ever been published, so a test against real rotors was not possible. An example set of rotors have been randomly generated for use in the Typex operation. Some additional information on the internal functionality was provided by the Bombe Rebuild Project.

The Bombe

The Bombe was likewise implemented on the basis of documentation of the attack and the machine. The Bombe Rebuild Project at the National Museum of Computing answered a number of technical questions about the machine and its operating procedures, and helped test our results against their working hardware Bombe, for which the authors would like to extend our thanks.

Constructing menus from cribs in a manner that most efficiently used the Bombe hardware was another difficult step of operating the real Bombes. We have chosen to generate the menu automatically from the provided crib, ignore some hardware constraints of the real Bombe (e.g. making best use of the number of available Enigmas in the Bombe hardware; we simply simulate as many as are necessary), and accept that occasionally the menu selected automatically may not always be the optimal choice. This should be rare, and we felt that manual menu creation would be hard to build an interface for, and would add extra barriers to users experimenting with the Bombe.

The output of the real Bombe is optimised for manual verification using the checking machine, and additionally has some quirks (the rotor wirings are rotated by, depending on the rotor, between one and three steps compared to the Enigma rotors). Therefore, the output given is the ring position, and a correction depending on the rotor needs to be applied to the initial value, setting it to W for rotor V, X for rotor IV, and Y for all other rotors. We felt that this would require too much explanation in CyberChef, so the output of CyberChef’s Bombe operation is the initial value for each rotor, with the ring positions set to A, required to decrypt the ciphertext starting at the beginning of the crib. The actual stops are the same. This would not have caused problems at Bletchley Park, as operators working with the Bombe would never have dealt with a real or simulated Enigma, and vice versa.

By default the checking machine is run automatically and stops which fail are silently discarded. This can be disabled in the operation configuration, which will cause it to output all stops from the actual Bombe hardware instead. (In this case you only get one stecker pair, rather than the set identified by the checking machine.)

Optimisation

A three-rotor Bombe run (which tests 17,576 rotor positions and takes about 15-20 minutes on original Turing Bombe hardware) completes in about a fifth of a second in our tests. A four-rotor Bombe run takes about 5 seconds to try all 456,976 states. This also took about 20 minutes on the four-rotor US Navy Bombe (which rotates about 30 times faster than the Turing Bombe!). CyberChef operations run single-threaded in browser JavaScript.

We have tried to remain fairly faithful to the implementation of the real Bombe, rather than a from-scratch implementation of the underlying attack. There is one small deviation from “correct” behaviour: the real Bombe spins the slow rotor on a real Enigma fastest. We instead spin the fast rotor on an Enigma fastest. This means that all the other rotors in the entire Bombe are in the same state for the 26 steps of the fast rotor and then step forward, which means we can compute the 13 possible routes through the lower two/three rotors and reflector (symmetry means there are only 13 routes) once every 26 ticks and then save them. This does not affect where the machine stops, but it does affect the order in which those stops are generated.

The fast rotors repeat each others’ states: in the 26 steps of the fast rotor between steps of the middle rotor, each of the scramblers in the complete Bombe will occupy each state once. This means we can once again store each state when we hit them and reuse them when the other scramblers rotate through the same states.

Note also that it is not necessary to complete the energisation of all wires: as soon as 26 wires in the test register are lit, the state is invalid and processing can be aborted.

The above simplifications reduce the runtime of the simulation by an order of magnitude.

If you have a large attack to run on a multiprocessor system – for example, the complete M4 Naval Enigma, which features 1344 possible choices of rotor and reflector configuration, each of which takes about 5 seconds – you can open multiple CyberChef tabs and have each run a subset of the work. For example, on a system with four or more processors, open four tabs with identical Multiple Bombe recipes, and set each tab to a different combination of 4th rotor and reflector (as there are two options for each). Leave the full set of eight primary rotors in each tab. This should complete the entire run in about half an hour on a sufficiently powerful system.

To celebrate their centenary, GCHQ have open-sourced very-faithful reimplementations of Enigma, Typex, and Bombe that you can run in your browser. That’s pretty cool, and a really interesting experimental toy for budding cryptographers and cryptanalysts!

Logitech MX Master 2S

I’m a big believer in the idea that the hardware I lay my hands on all day, every day, needs to be the best for its purpose. On my primary desktop, I type on a Das Keyboard 4 Professional (with Cherry MX brown switches) because it looks, feels, and sounds spectacular. I use the Mac edition of the keyboard because it, coupled with a few tweaks, gives me the best combination of features and compatibility across all of the Windows, MacOS, and Linux (and occasionally other operating systems) I control with it. These things matter.

F13, Lower Brightness, and Raise Brightness keys on Dan's keyboard
I don’t know what you think these buttons do, but if you’re using my keyboard, you’re probably wrong. Also, they need a clean. I swear they don’t look this grimy when I’m not doing macro-photography.

I also care about the mouse I use. Mice are, for the most part, for the Web and for gaming and not for use in most other applications (that’s what keyboard shortcuts are for!) but nonetheless I spend plenty of time holding one and so I want the tool that feels right to me. That’s why I was delighted when, in replacing my four year-old Logitech MX1000 in 2010 with my first Logitech Performance MX, I felt able to declare it the best mouse in the world. My Performance MX lived for about four years, too – that seems to be how long a mouse can stand the kind of use that I give it – before it started to fail and I opted to replace it with an identical make and model. I’d found “my” mouse, and I was sticking with it. It’s a great shape (if you’ve got larger hands), is full of features including highly-configurable buttons, vertical and horizontal scrolling (or whatever you want to map them to), and a cool “flywheel” mouse wheel that can be locked to regular operation or unlocked for controlled high-speed scrolling at the touch of a button: with practice, you can even use it as a speed control by gently depressing the switch like it was a brake pedal. Couple all of that with incredible accuracy on virtually any surface, long battery life, and charging “while you use” and you’ve a recipe for success, in my mind.

My second Performance MX stopped properly charging its battery this week, and it turns out that they don’t make them any more, so I bought its successor, the Logitech MX Master 2S.

(New) Logitech MX Master 2S and (old) Logitech Performance MX
On the left, the (new) Logitech MX Master 2S. On the right, my (old) Logitech Performance MX.

The MX Master 2S is… different… from its predecessor. Mostly in good ways, sometimes less-good. Here’s the important differences:

  • Matte coating: only the buttons are made of smooth plastic; the body of the mouse is now a slightly coarser plastic: you’ll see in the photo above how much less light it reflects. It feels like it would dissipate heat less-well.
  • Horizontal wheel replaces rocker wheel: instead of the Performance MX’s “rocker” scroll wheel that can be pushed sideways for horizontal scroll, the MX Master 2S adds a dedicated horizontal scroll (or whatever you reconfigure it to) wheel in the thumb well. This is a welcome change: the rocker wheel in both my Performance MXes became less-effective over time and in older mice could even “jam on”, blocking the middle-click function. This seems like a far more-logical design.
  • New back/forward button shape: to accommodate the horizontal wheel, the “back” and “forward” buttons in the thumb well have been made smaller and pushed closer together. This is the single biggest failing of the MX Master 2S: it’s clearly a mouse designed for larger hands, and yet these new buttons are slightly, but noticeably, harder to accurately trigger with a large thumb! It’s tolerable, but slightly annoying.
  • Bluetooth support: one of my biggest gripes about the Performance MX was its dependence on Unifying, Logitech’s proprietary wireless protocol. The MX Master 2S supports Unifying but also supports Bluetooth, giving you the best of both worlds.
  • Digital flywheel: the most-noticeable change when using the mouse is the new flywheel and braking mechanism, which is comparable to the change in contemporary cars from a mechanical to a digital handbrake. The flywheel “lock” switch is now digital, turning on or off the brake in a single stroke and so depriving you of the satisfaction of using it to gradually “slow down” a long spin-scroll through an enormous log or source code file. But in exchange comes an awesome feature called SmartShift, which dynamically turns on or off the brake (y’know, like an automatic handbrake!) depending on the speed with which you throw the wheel. That’s clever and intuitive and “just works” far better than I’d have imagined: I can choose to scroll slowly or quickly, with or without the traditional ratchet “clicks” of a wheel mouse, with nothing more than the way I flick my finger (and all fully-configurable, of course). And I’ve still got the button to manually “toggle” the brake if I need it. It took some getting used to, but this change is actually really cool! (I’m yet to get used to the sound of the digital brake kicking in automatically, but that’s true of my car too).
  • Basic KVM/multi-computing capability: with a button on the underside to toggle between different paired Unifying/Bluetooth transceivers and software support for seamless edge-of-desktop multi-computer operation, Logitech are clearly trying to target folks who, like me, routinely run multiple computers simultaneously from a single keyboard and mouse. But it’s a pointless addition in my case: I’ve been quite happy using Synergy, which does the job better, for the last 7+ years. Still, it’s a harmless “bonus” feature and it might be of value to others, I suppose.

All in all, the MX Master 2S isn’t such an innovative leap forward over the Performance MX as the Performance MX was over the MX1000, but it’s still great that this spectacular series of heavyweight workhorse feature-rich mice continues to innovate and, for the most part, improve upon the formula. This mouse isn’t cheap, and it isn’t for everybody, but if you’re a big-handed power user with a need to fine-tune all your hands-on hardware to get it just right, it’s definitely worth a look.


Debugging WorldWideWeb

Earlier this week, I mentioned the exciting hackathon that produced a moderately-faithful reimagining of the world’s first Web browser. I was sufficiently excited about it that I not only blogged here but I also posted about it to MetaFilter. Of course, the very first thing that everybody there did was try to load MetaFilter in it, which… didn’t work.

MetaFilter failing to load on the reimagined WorldWideWeb.
500? Really?

People were quick to point this out and assume that it was something to do with the modernity of MetaFilter:

honestly, the disheartening thing is that many metafilter pages don’t seem to work. Oh, the modern web.

Some even went so far as to speculate that the reason related to MetaFilter’s use of CSS and JS:

CSS and JS. They do things. Important things.

This is, of course, complete baloney, and it’s easy to prove to oneself. Firstly, simply using the View Source tool in your browser on a MetaFilter page reveals source code that’s quite comprehensible, even human-readable, without going anywhere near any CSS or JavaScript.

MetaFilter in Lynx: perfectly usable browsing experience
As late as the early 2000s I’d occasionally use Lynx for serious browsing, but any time I’ve used it since it’s been by necessity.

Secondly, it’s pretty simple to try browsing MetaFilter without CSS or JavaScript enabled! I tried in two ways: first, by using Lynx, a text-based browser that’s never supported either of those technologies. I also tried by using Firefox but with them disabled (honestly, I slightly miss when the Web used to look like this):

MetaFilter in Firefox (with CSS and JS disabled)
It only took me three clicks to disable stylesheets and JavaScript in my copy of Firefox… but I’ll be the first to admit that I don’t keep my browser configured like “normal people” probably do.

And thirdly: the error code being returned by the simulated WorldWideWeb browser is an HTTP 500. Even if you don’t know your HTTP codes (I mean, what kind of weirdo would take the time to memorise them all anyway <ahem>), it’s worth learning this: the first digit of an HTTP response code tells you what happened:

  • 1xx means “everything’s fine, keep going”;
  • 2xx means “everything’s fine and we’re done”;
  • 3xx means “try over there”;
  • 4xx means “you did something wrong” (the infamous 404, for example, means you asked for a page that doesn’t exist);
  • 5xx means “the server did something wrong”.

Simple! The fact that the error code begins with a 5 strongly implies that the problem isn’t in the (client-side) reimplementation of WorldWideWeb: if this had been a CSS/JS problem, I’d expect to see a blank page, scrambled content, “filler” content, or incomplete content.
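If you’d like that rule of thumb in code form, here’s a minimal sketch in Node-flavoured JavaScript (chosen to match the proxy code we’ll look at in a moment); the function is entirely mine, written for illustration rather than taken from anything real:

// Classify an HTTP status code by its first digit.
function describeStatus(code) {
  switch (Math.floor(code / 100)) {
    case 1: return "everything's fine, keep going";
    case 2: return "everything's fine and we're done";
    case 3: return "try over there";
    case 4: return "you did something wrong";
    case 5: return "the server did something wrong";
    default: return "that's not a status code I recognise";
  }
}

console.log(describeStatus(500)); // "the server did something wrong"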

So I found myself wondering what the real problem was. This is, of course, where my geek flag becomes most-visible: what we’re talking about, let’s not forget, is a fringe problem in an incomplete simulation of an ancient computer program that nobody uses. Odds are incredibly good that nobody on Earth cares about this except, right now, for me.

Dan's proposed "Geek Flag"
I searched for a “Geek Flag” and didn’t like anything I saw, so I came up with this one based on… well, if you recognise what it’s based on, good for you, you’re certainly allowed to fly it. If not… well, you can too: there’s no geek-gatekeeping here.

Luckily, I spotted Jeremy’s note that the source code for the WorldWideWeb simulator was now available, so I downloaded a copy to take a look. Here’s what’s happening:

  1. The (simulated) copy of WorldWideWeb is asked to open a document by reference, e.g. “https://www.metafilter.com/”.
  2. To work around same-origin policy restrictions, the request is sent to an API which acts as a proxy server.
  3. The API makes a request using the Node package “request” with this line of code: request(url, (error, response, body) => { ... }). When the first parameter to request is a (string) URL, the module uses its default settings for all of the other options, which means that it doesn’t set the User-Agent header (an optional part of a Web request where the computer making the request identifies the software that’s asking).
  4. MetaFilter, for some reason, blocks requests whose User-Agent isn’t set. This is weird! And nonstandard: while web browsers should – in RFC2119 terms – set their User-Agent: header, web servers shouldn’t require that they do so. MetaFilter returns a 403 and a message to say “Forbidden”; usually a message you only see if you’re trying to access a resource that requires session authentication and you haven’t logged-in yet.
  5. The API is programmed to handle response codes 200 (okay!) and 404 (not found), but if it gets anything else back it’s supposed to throw a 400 (bad request). Except there’s a bug: when trying to throw a 400, it requires that an error message has been set by the request module, and if there hasn’t been… it instead throws a 500 with the message “Internal Server Fangle” and no clue what actually went wrong. So MetaFilter’s 403 should get translated by the proxy into a 400, but because a 403 doesn’t come with an error message the 400 can’t be produced and it gets translated instead into the 500 that you eventually see. What a knock-on effect!
Illustration showing conversation between simulated WorldWideWeb and MetaFilter via an API that ultimately sends requests without a User-Agent, gets a 403 in response, and can't handle the 403 and so returns a confusing 500.
If you’re having difficulty visualising the process, this diagram might help you to continue your struggle with that visualisation.
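To make that chain of events a little more concrete, here’s a simplified sketch of roughly what the proxy’s handling amounts to. It’s my paraphrase of the behaviour described above, not the simulator’s actual source, and sendBodyToBrowser/sendError are hypothetical stand-ins for its real response plumbing:

const request = require('request');

// Hypothetical stand-ins for the API's real response plumbing:
const sendBodyToBrowser = (body) => console.log(body.slice(0, 80));
const sendError = (code, message) => console.log(code, message);

const url = 'https://www.metafilter.com/';

request(url, (error, response, body) => {
  if (response && response.statusCode === 200) {
    sendBodyToBrowser(body);          // handled: okay!
  } else if (response && response.statusCode === 404) {
    sendError(404, 'Not Found');      // handled: not found
  } else if (error && error.message) {
    sendError(400, error.message);    // "anything else" is meant to land here...
  } else {
    // ...but a 403 isn't an error as far as the request module is concerned,
    // so there's no error message to pass along and we fall through to a
    // confusing 500 instead.
    sendError(500, 'Internal Server Fangle');
  }
});

Run against MetaFilter, the 403 arrives as a perfectly ordinary response rather than an error, so it’s the final branch that fires.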

The fix is simple: change the line:

request(url, (error, response, body) => { ... })

to:

request({ url: url, headers: { 'User-Agent': 'WorldWideWeb' } }, (error, response, body) => { ... })

This then sets a User-Agent header and makes servers that require one, such as MetaFilter, respond appropriately. I don’t know whether WorldWideWeb originally set a User-Agent header (CERN’s source file archive seems to be missing the relevant C sources so I can’t check) but I suspect that it did, so this change actually improves the fidelity of the emulation as a bonus. A better fix would also add appropriate handling of other HTTP response codes, but that’s a story for another day, I guess.
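For what it’s worth, that better fix might look something like the sketch below, which relays other status codes rather than collapsing them; it reuses the hypothetical helpers from my earlier sketch and is a guess at an approach, not the simulator’s code:

request({ url: url, headers: { 'User-Agent': 'WorldWideWeb' } }, (error, response, body) => {
  if (error) {
    return sendError(502, error.message);   // network-level failure: a "bad gateway" seems fairer than a 500
  }
  if (response.statusCode >= 200 && response.statusCode < 300) {
    return sendBodyToBrowser(body);         // any 2xx: pass the document along
  }
  // Relay everything else (403, 404, 500, whatever) faithfully instead of
  // translating it into a misleading 400 or 500 of the proxy's own.
  return sendError(response.statusCode, response.statusMessage);
});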

I know the hackathon’s over, but I wonder if they’re taking pull requests…


WorldWideWeb, 30 years on

This month, a collection of some of my favourite geeks got invited to CERN in Geneva to participate in a week-long hackathon with the aim of reimplementing WorldWideWeb – the first web browser, circa 1990-1994 – as a web application. I’m super jealous, but I’m also really pleased with what they managed to produce.

DanQ.me as displayed by the reimagined WorldWideWeb browser circa 1990
With the exception of a few character entity quirks, this site remains perfectly usable in the simulated WorldWideWeb browser. Clearly I wasn’t the only person to try this vanity-check…

This represents a huge leap forward from their last similar project, which aimed to recreate the line mode browser: the first web browser that didn’t require a NeXT computer to run it, and so itself a leap forward in mainstream appeal. In some ways, you might expect reimplementing WorldWideWeb to be easier, because its functionality is more-similar to that of a modern browser, but there were doubtless some challenges too: this early browser predated the concept of the DOM and so there are distinct processing differences that must be considered to get a truly authentic experience.

Geeks hacking on WorldWideWeb reborn
It’s just like any other hackathon, if you ignore the enormous particle collider underneath it.

Among their outputs, the team also produced a cool timeline of the Web, which – thanks to some careful authorship – is as legible in WorldWideWeb as it is in a modern browser (if, admittedly, a little less pretty).

WorldWideWeb screenshot by Sir Tim Berners-Lee
When Sir Tim took this screenshot, he could never have predicted the way the Web would change, technically, over the next 25-30 years. But I’m almost more-interested in how it’s stayed the same.

In an age of increasing Single Page Applications and API-driven sites and “apps”, it’s nice to be reminded that if you develop right for the Web, your content will be visible (sort-of; I’m aware that there are some liberties taken here in memory and processing limitations, protocols and negotiation) on machines 30 years old, and that gives me hope that adherence to the same solid standards gives us a chance of writing pages today that look just as good in 30 years’ time. Compare that to a proprietary technology like Flash whose heyday 15 years ago is overshadowed by its imminent death (not to mention Java applets or ActiveX <shudders>), iOS apps which stopped working when the operating system went 64-bit, and websites which only work in specific browsers (traditionally Internet Explorer, though as I’ve complained before we’re getting more and more Chrome-only sites).

The Web is a success story in open standards, natural and by-design progressive enhancement, and the future-proof archivability of human-readable code. Long live the Web.

Update 24 February 2019: After I submitted news of the browser to MetaFilter, I (and others) spotted a bug. So I came up with a fix…


The Route of a Text Message

This article is a repost promoting content originally published elsewhere. See more things Dan's reposted.

With each tap, a small electrical current passes from the screen to her hand. Because electricity flows easily through human bodies, sensors on the phone register a change in voltage wherever her thumb presses against the screen. But the world is messy, and the phone senses random fluctuations in voltage across the rest of the screen, too, so an algorithm determines the biggest, thumbiest-looking voltage fluctuations and assumes that’s where she intended to press.

Figure 0. Capacitive touch.

So she starts tap-tap-tapping on the keyboard, one letter at a time.

I-spacebar-l-o-v-e-spacebar-y-o-u.

I’ve long been a fan of “full story” examinations of how technology works. This one looks at the sending and receipt of an SMS text message from concept through touchscreen, encoding and transmission, decoding and display. It’s good to be reminded that whatever technology you build, even a “basic” Arduino project, a “simple” website or a “throwaway” mobile app, you’re standing on the shoulders of giants. Your work sits atop decades or more of infrastructure, standards, electronics and research.

Sometimes it feels pretty fragile. But mostly it feels like magic.

Additional Processors

This article is a repost promoting content originally published elsewhere. See more things Dan's reposted.

Computerphile at its best, here tackling the topic of additional (supplementary) processors, like FPUs, GPUs, sound processors, etc., to which CPUs outsource some of their work under specific circumstances. Even speaking as somebody who’s upgraded a 386/SX to a 386/DX through the addition of a “math co-processor” (an FPU) and seeing the benefit in applications for which floating point arithmetic was a major part (e.g. some early 3D games), I didn’t really think about what was really happening until I saw this video. There’s always more to learn, fellow geeks!

Edge may be becoming Chromium-powered, and that’s terrible

Microsoft engineers have been spotted committing code to Chromium, the open-source project that underpins Google Chrome and many other web browsers. This, among other things, has led to speculation that Microsoft’s browser, Edge, might be about to switch from its current rendering engine (EdgeHTML) to Blink (Chromium’s). This is bad news.

This page in Microsoft Edge
This post, as it would appear if you were looking at it in Edge. Which you might be, I suppose.

The younger generation of web developers are likely to hail this as good news: one fewer engine to develop for and test in, they’re all already using Chrome or something similar (and certainly not Edge) for development and debugging anyway, etc. The problem, perhaps, is that they’re too young to remember the First Browser War and its aftermath. Let me summarise:

  1. Once upon a time – let’s call it the mid-1990s – there were several web browsers: Netscape Navigator, Internet Explorer, Opera, etc. They all used different rendering engines and so development was sometimes a bit of a pain, but only if you wanted to use the latest, most cutting-edge features: if you were happy with the standard, established features of the Web then your site would work anywhere, as has always been the case.
    Best viewed with... any damn browser
  2. Then everybody started using just one browser: following some shady dealings and monopoly abuse, 90%+ of Web users started using just one web browser, Internet Explorer. By the time anybody took notice, their rivals had been economically crippled beyond any reasonable chance of recovery, but the worst had yet to come…
    Best viewed with Internet Explorer
  3. Developers started targeting only that one browser: instead of making websites, developers started making “Internet Explorer sites” which were only tested in that one browser or, worse yet, only worked at all in that browser, actively undermining the Web’s position as an open platform. As the grip of the monopoly grew tighter, technological innovation was centred around this single platform, leading to decade-long knock-on effects.
  4. The Web ceased to grow new features: from the release of Internet Explorer 6 there were no significant developments in the technology of the Web for many years. The lack of competition pushed us into a period of stagnation. A decade and a half later, we’re only just (finally) finishing shaking off this unpleasant bit of our history.
    "Netscape sux"

History looks set to repeat itself. Substitute Chrome in place of Internet Explorer and update the references to other web browsers and the steps above could be our future history, too. Right now, we’re somewhere in or around step #2 – Chrome is the dominant browser – and we’re starting to see the beginnings of step #3: more and more “Chrome only” sites. More-alarmingly this time around, Google’s position in providing many major Web services allows them to “push” even harder for this kind of change, even just subtly: if you make the switch from Chrome to e.g. Firefox (and you absolutely should) you might find that YouTube runs slower for you because YouTube’s (Google) engineers favour Google’s web browser.

Chrome is becoming the new Internet Explorer 6, and that’s a huge problem. Rachel Nabors wrote in her excellent article The Ecological Impact of Browser Diversity:

So these are the three browser engines we have: WebKit/Blink, Gecko, and EdgeHTML. We are unlikely to get any brand new bloodlines in the foreseeable future. This is it.

If we lose one of those browser engines, we lose its lineage, every permutation of that engine that would follow, and the unique takes on the Web it could allow for.

And it’s not likely to be replaced.

The Circle of Browsers, by Rachel Nabors

Imagine a planet populated only by hummingbirds, dolphins, and horses. Say all the dolphins died out. In the far, far future, hummingbirds or horses could evolve into something that could swim in the ocean like a dolphin. Indeed, ichthyosaurs in the era of dinosaurs looked much like dolphins. But that creature would be very different from a true dolphin: even ichthyosaurs never developed echolocation. We would wait a very long time (possibly forever) for a bloodline to evolve the traits we already have present in other bloodlines today. So, why is it ok to stand by or even encourage the extinction of one of these valuable, unique lineages?

We have already lost one.

We used to have four major rendering engines, but Opera halted development of its own rendering engine Presto before adopting Blink.

Three left. Spend them wisely.

As much as I don’t like having to work around the quirks in all of the different browsers I test in, daily, it’s way preferable to a return to the dark days of the Web circa most of the first decade of this century. Please help keep browsers diverse: nobody wants to start seeing this shit –

Best viewed with Google Chrome

Update: this is now confirmed. A sad day for the Web.


SHE BON : Sensing the Sensual

This article is a repost promoting content originally published elsewhere. See more things Dan's reposted.

I don’t know if this is genius or insanity, but either way it’s pretty remarkable. Lost my shit at “pop girl”. Lost it again at “I can poke the nipple”.

It takes a true engineer to think to themselves, “Hey, I wonder if I could use technology to tell me when I’m feeling aroused? That sounds like a useful project.” I love it.