Netscape’s Untold Webstories

I mentioned yesterday that during Bloganuary I’d put non-Bloganuary-prompt post ideas onto the backburner, and considered extending my daily streak by posting them in February. Here’s part of my attempt to do that:

Let’s take a trip into the Web of yesteryear, with thanks to our friends at the Internet Archive’s WayBack Machine.

The page we’re interested in used to live at http://www.netscape.com/comprod/columns/webstories/index.html, and promised to be a showcase for best practice in Web development. Back in October 1996, it looked like this:

Screenshot from Netscape Columns: Web Site Stories: a Coming Soon page which says "The series is scheduled to debut in November."

The page is a placeholder for Netscape Webstories (or Web Site Stories, in some places). It’s part of a digital magazine called Netscape Columns which published pieces written by Marc Andreeson, Jim Barksdale, and other bigwigs in the hugely-influential pre-AOL-acquisition Netscape Communications.

This new series would showcase best practice in designing and building Web sites1, giving a voice to the technical folks best-placed to speak on that topic. That sounds cool!

Those white boxes above and below the paragraph of text aren’t missing images, by the way: they’re horizontal rules, using the little-known size attribute to specify a thickness of <hr size=4>!2

Certainly you’re excited by this new column and you’ll come back in November 1996, right?

Screenshot from Netscape Columns: Web Site Stories: a Coming Soon page which says "The series is scheduled to begin in January."

Oh. The launch has been delayed, I guess. Now it’s coming in January.

The <hr>s look better now their size has been reduced, though, so clearly somebody’s paying attention to the page. But let’s take a moment and look at that page title. If you grew up writing web pages in the modern web, you might anticipate that it’s coded something like this:

<h2 style="font-variant: small-caps; text-align: center;">Coming Soon</h2>

There’s plenty of other ways to get that same effect. Perhaps you prefer font-feature-settings: 'smcp' in your chosen font; that’s perfectly valid. Maybe you’d use margin: 0 auto or something to centre it: I won’t judge.

But no, that’s not how this works. The actual code for that page title is:

<center>
  <h2>
    <font size="+3">C</font>OMING
    <font size="+3">S</font>OON
  </h2>
</center>

Back when this page was authored, we didn’t have CSS3. The only styling elements were woven right in amongst the semantic elements of a page4. It was simple to understand and easy to learn… but it was a total mess5.

Anyway, let’s come back in January 1997 and see what this feature looks like when it’s up-and-running.

Screenshot from Netscape Columns: Web Site Stories: a Coming Soon page which says "The series is scheduled to begin in the spring."

Nope, now it’s pushed back to “the spring”.

Under Construction pages were all the rage back in the nineties. Everybody had one (or several), usually adorned with one or more of about a thousand different animated GIFs for that purpose.6

Rotating animated "under construction" banner.

Building “in public” was an act of commitment, a statement of intent, and an act of acceptance of the incompleteness of a digital garden. They’re sort-of coming back into fashion in the interpersonal Web, with the “garden and stream” metaphor7 taking root. This isn’t anything new, of course – Mark Bernstein touched on the concepts in 1998 – but it’s not something that I can ever see returning to the “serious” modern corporate Web: but if you’ve seen a genuine, non-ironic “under construction” page published to a non-root page of a company’s website within the last decade, please let me know!

Under construction banner with an animated yellow-and-black tape banner between two "men at work" signs.

RSS doesn’t exist yet (although here’s a fun fact: the very first version of RSS came out of Netscape!). We’re just going to have to bookmark the page and check back later in the year, I guess…

Screenshot from Netscape Columns: Web Site Stories: a Coming Soon page identical to the previous version but with a search box ("To search the Netscape Columns, type a word or phrase here:") beneath.

Okay, so February clearly isn’t Spring, but they’ve updated the page… to add a search form.

It’s a genuine <form> tag, too, not one of those old-fashioned <isindex> tags you’d still sometimes find even as late as 1997. Interestingly, it specifies enctype="application/x-www-form-urlencoded". Today’s developers probably don’t think about the enctype attribute except when they’re doing a form that handles file uploads and they know they need to switch it to enctype="multipart/form-data", (or their framework does this automatically for them!).

But these aren’t the only options, and some older browsers at this time still defaulted to enctype="text/plain".  So long as you’re using a POST and not GET method, the distinction is mostly academic, but if your backend CGI program anticipates that special characters will come-in encoded, back then you’d be wise to specify that you wanted URL-encoding or you might get a nasty surprise when somebody turns up using LMB or something equally-exotic.

Anyway, let’s come back in June. The content must surely be up by now:

Screenshot from Netscape Columns: Web Site Stories: a Coming Soon page which says "The series is scheduled to begin in August."

Oh come on! Now we’re waiting until August?

At least the page isn’t abandoned. Somebody’s coming back and editing it from time to time to let us know about the still-ongoing series of delays. And that’s not a trivial task: this isn’t a CMS. They’re probably editing the .html file itself in their favourite text editor, then putting the appropriate file:// address into their copy of Netscape Navigator (and maybe other browsers) to test it, then uploading the file – probably using FTP – to the webserver… all the while thanking their lucky stars that they’ve only got the one page they need to change.

We didn’t have scripting languages like PHP yet, you see8. We didn’t really have static site generators. Most servers didn’t implement server-side includes. So if you had to make a change to every page on a site, for example editing the main navigation menu, you’d probably have to open and edit dozens or even hundreds of pages. Little wonder that framesets caught on, despite their (many) faults, with their ability to render your navigation separately from your page content.

Okay, let’s come back in August I guess:

Screenshot from Netscape Columns: Web Site Stories: a Coming Soon page which says "The series is scheduled to begin in the spring." Again.

Now we’re told that we’re to come back… in the Spring again? That could mean Spring 1998, I suppose… or it could just be that somebody accidentally re-uploaded an old copy of the page.

Hey: the footer’s gone too? This is clearly a partial re-upload: somebody realised they were accidentally overwriting the page with the previous-but-one version, hit “cancel” in their FTP client (or yanked the cable out of the wall), and assumed that they’d successfully stopped the upload before any damage was done.

They had not.

Screenshot of a Windows 95 dialog box, asking "Are you sure you want to delete index.html?" The cursor hovers over the "Yes" button.

I didn’t mention that top menu, did I? It looks like it’s a series of links, styled to look like flat buttons, right? But you know that’s not possible because you can’t rely on having the right fonts available: plus you’d have to do some <table> trickery to lay it out, at which point you’d struggle to ensure that the menu was the same width as the banner above it. So how did they do it?

The menu is what’s known as a client-side imagemap. Here’s what the code looks like:

<a href="/comprod/columns/images/nav.map">
  <img src="/comprod/columns/images/websitestories_ban.gif" width=468 height=32 border=0 usemap="#maintopmap" ismap>
</a><map name="mainmap">
  <area coords="0,1,92,24" href="/comprod/columns/mainthing/index.html">
  <area coords="94,1,187,24" href="/comprod/columns/techvision/index.html">
  <area coords="189,1,278,24" href="/comprod/columns/webstories/index.html">
  <area coords="280,1,373,24" href="/comprod/columns/intranet/index.html">
  <area coords="375,1,467,24" href="/comprod/columns/newsgroup/index.html">
</map>

The image (which specifies border=0 because back then the default behaviour for graphical browser was to put a thick border around images within hyperlinks) says usemap="#maintopmap" to cross-reference the <map> below it, which defines rectangular areas on the image and where they link to, if you click them! This ingenious and popular approach meant that you could transmit a single image – saving on HTTP round-trips, which were relatively time-consuming before widespread adoption of HTTP/1.1‘s persistent connections – along with a little metadata to indicate which pixels linked to which pages.

The ismap attribute is provided as a fallback for browsers that didn’t yet support client-side image maps but did support server-side image maps: there were a few! When you put ismap on an image within a hyperlink, then when the image is clicked on the href has appended to it a query parameter of the form ?123,456, where those digits refer to the horizontal and vertical coordinates, from the top-left, of the pixel that was clicked on! These could then be decoded by the webserver via a .map file or handled by a CGI program. Server-side image maps were sometimes used where client-side maps were undesirable, e.g. when you want to record the actual coordinates picked in a spot-the-ball competition or where you don’t want to reveal in advance which hotspot leads to what destination, but mostly they were just used as a fallback.9

Both client-side and server-side image maps still function in every modern web browser, but I’ve not seen them used in the wild for a long time, not least because they’re hard (maybe impossible?) to make accessible and they can’t cope with images being resized, but also because nowadays if you really wanted to make an navigation “image” you’d probably cut it into a series of smaller images and make each its own link.

Anyway, let’s come back in October 1997 and see if they’ve fixed their now-incomplete page:

Screenshot from Netscape Columns: Web Site Stories: the Coming Soon page is now laid out in two columns, but the expected launch date has been removed.

Oh, they have! From the look of things, they’ve re-written the page from scratch, replacing the version that got scrambled by that other employee. They’ve swapped out the banner and menu for a new design, replaced the footer, and now the content’s laid out in a pair of columns.

There’s still no reliable CSS, so you’re not looking at columns: (no implementations until 2014) nor at display: flex (2010) here. What you’re looking at is… a fixed-width <table> with a single row and three columns! Yes: three – the middle column is only 10 pixels wide and provides the “gap” between the two columns of text.10

This wasn’t Netscape’s only option, though. Did you ever hear of the <multicol> tag? It was the closest thing the early Web had to a semantically-sound, progressively-enhanced multi-column layout! The author of this page could have written this:

<multicol cols=2 gutter=10 width=301>
  <p>
    Want to create the best possible web site? Join us as we explore the newest
    technologies, discover the coolest tricks, and learn the best secrets for
    designing, building, and maintaining successful web sites.
  </p>
  <p>
    Members of the Netscape web site team, recognized designers, and technical
    experts will share their insights and experiences in Web Site Stories. 
  </p>
</multicol>

That would have given them the exact same effect, but with less code and it would have degraded gracefully. Browsers ignore tags they don’t understand, so a browser without support for <multicol> would have simply rendered the two paragraphs one after the other. Genius!

So why didn’t they? Probably because <multicol> only ever worked in Netscape Navigator.

Introduced in 1996 for version 3.0, this feature was absolutely characteristic of the First Browser War. The two “superpowers”, Netscape and Microsoft, both engaged in unilateral changes to the HTML specification, adding new features and launching them without announcement in order to try to get the upper hand over the other. Both sides would often refuse to implement one-another’s new tags unless they were forced to by widespread adoption by page authors, instead promoting their own competing mechanisms11.

Between adding this new language feature to their browser and writing this page, Netscape’s market share had fallen from around 80% to around 55%, and most of their losses were picked up by IE. Using <multicol> would have made their page look worse in Microsoft’s hot up-and-coming browser, which wouldn’t have helped them persuade more people to download a copy of Navigator and certainly wouldn’t be a good image on a soon-to-launch (any day now!) page about best-practice on the Web! So Netscape’s authors opted for the dominant, cross-platform solution on this page12.

Anyway, let’s fast-forward a bit and see this project finally leave its “under construction” phase and launch!

Screenshot showing the homepage of Netscape Columns from 15 February 1998; the first recorded copy NOT to have a header link to the Webstories / Web Site Stories page.

Oh. It’s gone.

Sometime between October 1997 and February 1998 the long promised “Web Site Stories” section of Netscape Columns quietly disappeared from the website. Presumably, it never published a single article, instead remaining a perpetual “Coming Soon” page right up until the day it was deleted.

I’m not sure if there’s a better metaphor for Netscape’s general demeanour in 1998 – the year in which they finally ceased to be the dominant market leader in web browsers – than the quiet deletion of a page about how Netscape customers are making the best of the Web. This page might not have been important, or significant, or even completed, but its disappearance may represent Netscape’s refocus on trying to stay relevant in the face of existential threat.

Of course, Microsoft won the First Browser War. They did so by pouring a fortune’s worth of developer effort into staying technologically one-step ahead, refusing to adopt standards proposed by their rival, and their unprecedented decision to give away their browser for free13.

Footnotes

1 Yes, we used to write “Web sites” as two words. We also used to consistently capitalise the words Web and Internet. Some of us still do so.

2 In case it’s not clear, this blog post is going to be as much about little-known and archaic Web design techniques as it is about Netscape’s website.

3 This is a white lie. CSS was first proposed almost at the same time as the Web! Microsoft Internet Explorer was first to deliver a partial implementation of the initial standard, late in 1996, but Netscape dragged their heels, perhaps in part because they’d originally backed a competing standard called JavaScript Style Sheets (JSSS). JSSS had a lot going for it: if it had enjoyed widespread adoption, for example, we’d have had the equivalent of CSS variables a full twenty years earlier! In any case, back in 1996 you definitely wouldn’t want to rely on CSS support.

4 Wondering where the text and link colours come from? <body bgcolor="#ffffff" text="#000000" link="#0000ff" vlink="#ff0000" alink="#ff0000">. Yes really, that’s where we used to put our colours.

5 Personally, I really loved the aesthetic Netscape touted when using Times New Roman (or whatever serif font was available on your computer: webfonts weren’t a thing yet) with temporary tweaks to font sizes, and I copied it in some of my own sites. If you look back at my 2018 blog post celebrating two decades of blogging, where I’ve got a screenshot of my blog as it looked circa 1999, you’ll see that I used exactly this technique for the ordinal suffixes on my post dates! On the same post, you’ll see that I somewhat replicated the “feel” of it again in my 2011 design, this time using a stylesheet.

6 There’s a whole section of Cameron’s World dedicated to “under construction” banners, and that’s a beautiful thing!

7 The idea of “garden and stream” is that you publish early and often, refining as you go, in your garden, which can act as an extension of whatever notetaking system you use already, but publish mostly “finished” content to your (chronological) stream. I see an increasing number of IndieWeb bloggers going down this route, but I’m not convinced that it’s for me.

8 Another white lie. PHP was released way back in 1995 and even the very first version supported something a lot like server-side includes, using the syntax <!--include /file/name.html-->. But it was a little computationally-intensive to run willy-nilly.

9 Server-side imagemaps are enjoying a bit of a renaissance on .onion services, whose visitors often keep JavaScript disabled, to make image-based CAPTCHAs. Simply show the visitor an image and describe the bit you want them to click on, e.g. “the blue pentagon with one side missing”, then compare the coordinates of the pixel they click on to the knowledge of the right answer. Highly-inaccessible, of course, but innovative from a purely-technical perspective.

10 Nowadays, use of tables for layout – or, indeed, for anything other than tabular data – is very-much frowned upon: it’s often bad for accessibility and responsive design. But back before we had the features granted to us by the modern Web, it was literally the only way to get content to appear side-by-side on a page, and designers got incredibly creative about how they misused tables to lay out content, especially as browsers became more-sophisticated and began to support cells that spanned multiple rows or columns, tables “nested” within one another, and background images.

11 It was a horrible time to be a web developer: having to make hacky workarounds in order to make use of the latest features but still support the widest array of browsers. But I’d still take that over the horrors of rendering engine monoculture!

12 Or maybe they didn’t even think about it and just copy-pasted from somewhere else on their site. I’m speculating.

13 This turned out to be the master-stroke: not only did it partially-extricate Microsoft from their agreement with Spyglass Inc., who licensed their browser engine to Microsoft in exchange for a percentage of sales value, but once Microsoft started bundling Internet Explorer with Windows it meant that virtually every computer came with their browser factory-installed! This strategy kept Microsoft on top until Firefox and Google Chrome kicked-off the Second Browser War in the early 2010s. But that’s another story.

× × × × × × × × × × ×

[Bloganuary] Uninvention

This post is part of my attempt at Bloganuary 2024. Today’s prompt is:

If you could un-invent something, what would it be?

Fucking cryptocurrency.

Industrial sprawl at sunset: countless tall chimneys belch smoke alongside crisscrossing power lines. In the smoke, the outline of "physical" Bitcoins can be seen.
To preempt the inevitable “well actually”: yes, I’m fully aware that there exist cryptocurrencies that have minimal environmental impact. I concede that those cryptocurrencies might only have all the other problems. Stop talking to me about how great you think Ripple is.

I remember when Bitcoin first appeared. A currency based on a ledger recorded in a shared blockchain sounded pretty cool from a technological standpoint, and so – as a technology enthusiast – I experimented with it.

I recall that I bought a couple of Bitcoin; I think they were about 50 pence each? It seemed like a “toy” currency; nothing that would ever attract any mainstream attention. After all: why would it? It’s less-anonymous than cash. It’s less-convenient than cards. It’s (even) less-widely-accepted than cheques. It somehow manages to be somehow slower than everything. And crucially, without any government backing it can’t be used to settle a debt or pay your taxes1. The technology was interesting to me, but it had no real-world application.

Screengrab from BEEF Series 1, Episode 1, showing a held mobile phone showing a Bitcoin wallet's value crashing by 87%
When a conventional currency does something like this, we call it a catastrophe. When a cryptocurrency does it, we call it a Thursday.

Imagine my surprise when people started investing in the cryptocurrency. Began accepting it in payment for things. I know a tulip economy when I see one, I figured, so I got rid of my “toy” Bitcoins when the price hit around £750 each2. Sure, it’d have been “smarter” to wait until it hit £45,000 each, but I genuinely thought the bubble was going to burst and, besides, I’d never wanted to get into that game to begin with: I was just playing about with an interesting bit of technology when suddenly half the world began talking about it.

The world taking cryptocurrencies seriously was the worst thing that ever happened to them. When they were just a toy, nobody “invested” in them. Nobody built planet-destroying mining rigs to compete to produce more of them. Nobody used them as a vehicle to make ransomware feasible or set up elaborate Ponzi schemes or get-rich-quick scams off the back of them.

(Fake) cryptolocker screenshot that implies that DanQ.me has encrypted your files and will only decrypt them if you send 1 EGX (Emma GoldCoin).
DanQ.me has encrypted all your files. As Emma GoldCoin is the only cryptocurrency I can get behind, I demand you send me 1 EGX to unlock them. (No, don’t go and check; I promise they’re encrypted! Just take my word on it!)

And yeah, with few exceptions (of which Emma GoldCoin is the best), cryptocurrencies not only provide a vehicle for scammers, do nothing to combat inequality (and potentially make it worse by tying it to the digital divide), and destroy the planet… but they generally don’t even achieve the promises they make of anonymous, decentralised, stable, utilitarian currencies.

I’m not going to deep-dive into everything that’s wrong with cryptocurrencies3 (and I’m not going near NFTs, but rest assured they’re even stupider). There’s plenty of more-eloquent people online who can explain it to you if you need to; start at Web3IsGoingGreat.com if you like.

So yeah, if we could just uninvent cryptocurrencies, or at least uninvent whatever it is the masses think they see in them, then that’d be just great, thanks.

Footnotes

1 Being legal tender and being useful to pay your taxes are the magic beans that make fiat currencies worth something.

2 Sometimes, people mistake me for somebody with any level of interest in cryptocurrency “investment”. After I’m done correcting their misapprehension, I enjoy pointing out that I made a 150,000% return-on-investment on cryptocurrencies and I still recommend against anybody getting involved in them.

3 If I can pick out just one pet hate, though, that trumps all the others: it’s the “cryptobros” who call cryptocurrencies “crypto”, as if that wasn’t a prefix that already had a plethora of better-established uses, all of which are undermined by the co-opting of their name. It’s somehow even worse than the idiots who shorten Wikipedia to “wiki”.

× ×

Length Extension Attack Demonstration (Video)

This post is also available as an article. So if you'd rather read a conventional blog post of this content, you can!

This is a video version of my blog post, Length Extension Attack. In it, I talk through the theory of length extension attacks and demonstrate an SHA-1 length extension attack against an (imaginary) website.

The video can also be found on:

Length Extension Attack Demonstration

This post is also available as a video. If you'd prefer to watch/listen to me talk about this topic, give it a look.

Prefer to watch/listen than read? There’s a vloggy/video version of this post in which I explain all the key concepts and demonstrate an SHA-1 length extension attack against an imaginary site.

I understood the concept of a length traversal attack and when/how I needed to mitigate them for a long time before I truly understood why they worked. It took until work provided me an opportunity to play with one in practice (plus reading Ron Bowes’ excellent article on the subject) before I really grokked it.

Would you like to learn? I’ve put together a practical demo that you can try for yourself!

Screenshot of vulnerable site with legitimate "download" link hovered.
For the demonstration, I’ve built a skeletal stock photography site whose download links are protected by a hash of the link parameters, salted using a secret string stored securely on the server. Maybe they let authorised people hotlink the images or something.

You can check out the code and run it using the instructions in the repository if you’d like to play along.

Using hashes as message signatures

The site “Images R Us” will let you download images you’ve purchased, but not ones you haven’t. Links to the images are protected by a SHA-1 hash1, generated as follows:

Diagram showing SHA1 being fed an unknown secret key and the URL params "download=free" and outputting a hash as a "download key".
The nature of hashing algorithms like SHA-1 mean that even a small modification to the inputs, e.g. changing one character in the word “free”, results in a completely different output hash which can be detected as invalid.

When a “download” link is generated for a legitimate user, the algorithm produces a hash which is appended to the link. When the download link is clicked, the same process is followed and the calculated hash compared to the provided hash. If they differ, the input must have been tampered with and the request is rejected.

Without knowing the secret key – stored only on the server – it’s not possible for an attacker to generate a valid hash for URL parameters of the attacker’s choice. Or is it?

Changing download=free to download=valuable invalidates the hash, and the request is denied.

Actually, it is possible for an attacker to manipulate the parameters. To understand how, you must first understand a little about how SHA-1 and its siblings actually work:

SHA-1‘s inner workings

  1. The message to be hashed (SECRET_KEY + URL_PARAMS) is cut into blocks of a fixed size.2
  2. The final block is padded to bring it up to the full size.3
  3. A series of operations are applied to the first block: the inputs to those operations are (a) the contents of the block itself, including any padding, and (b) an initialisation vector defined by the algorithm.4
  4. The same series of operations are applied to each subsequent block, but the inputs are (a) the contents of the block itself, as before, and (b) the output of the previous block. Each block is hashed, and the hash forms part of the input for the next.
  5. The output of running the operations on the final block is the output of the algorithm, i.e. the hash.
Diagram showing message cut into blocks, the last block padded, and then each block being fed into a function along with the output of the function for the previous block. The first function, not having a previous block, receives the IV as its secondary input. The final function outputs the hash.
SHA-1 operates on a single block at a time, but the output of processing each block acts as part of the input of the one that comes after it. Like a daisy chain, but with cryptography.

In SHA-1, blocks are 512 bits long and the padding is a 1, followed by as many 0s as is necessary, leaving 64 bits at the end in which to specify how many bits of the block were actually data.

Padding the final block

Looking at the final block in a given message, it’s apparent that there are two pieces of data that could produce exactly the same output for a given function:

  1. The original data, (which gets padded by the algorithm to make it 64 bytes), and
  2. A modified version of the data, which has be modified by padding it in advance with the same bytes the algorithm would; this must then be followed by an additional block
Illustration showing two blocks: one short and padded, one pre-padded with the same characters, receiving the same IV and producing the same output.
A “short” block with automatically-added padding produces the same output as a full-size block which has been pre-populated with the same data as the padding would add.5
In the case where we insert our own “fake” padding data, we can provide more message data after the padding and predict the overall hash. We can do this because we the output of the first block will be the same as the final, valid hash we already saw. That known value becomes one of the two inputs into the function for the block that follows it (the contents of that block will be the other input). Without knowing exactly what’s contained in the message – we don’t know the “secret key” used to salt it – we’re still able to add some padding to the end of the message, followed by any data we like, and generate a valid hash.

Therefore, if we can manipulate the input of the message, and we know the length of the message, we can append to it. Bear that in mind as we move on to the other half of what makes this attack possible.

Parameter overrides

“Images R Us” is implemented in PHP. In common with most server-side scripting languages, when PHP sees a HTTP query string full of key/value pairs, if a key is repeated then it overrides any earlier iterations of the same key.

Illustration showing variables in a query string: "?one=foo&two=bar&one=baz". When parsed by PHP, the second value of "one" ("baz") only is retained.
Many online sources say that this “last variable matters” behaviour is a fundamental part of HTTP, but it’s not: you can disprove is by examining $_SERVER['QUERY_STRING'] in PHP, where you’ll find the entire query string. You could even implement your own query string handler that instead makes the first instance of each key the canonical one, if you really wanted.6
It’d be tempting to simply override the download=free parameter in the query string at “Images R Us”, e.g. making it download=free&download=valuable! But we can’t: not without breaking the hash, which is calculated based on the entire query string (minus the &key=... bit).

But with our new knowledge about appending to the input for SHA-1 first a padding string, then an extra block containing our payload (the variable we want to override and its new value), and then calculating a hash for this new block using the known output of the old final block as the IV… we’ve got everything we need to put the attack together.

Putting it all together

We have a legitimate link with the query string download=free&key=ee1cce71179386ecd1f3784144c55bc5d763afcc. This tells us that somewhere on the server, this is what’s happening:

Generation of the legitimate hash for the (unknown) secret key a string download=free, with algorithmic padding shown.
I’ve drawn the secret key actual-size (and reflected this in the length at the bottom). In reality, you might not know this, and some trial-and-error might be necessary.7
If we pre-pad the string download=free with some special characters to replicate the padding that would otherwise be added to this final8 block, we can add a second block containing an overriding value of download, specifically &download=valuable. The first value of download=, which will be the word free followed by a stack of garbage padding characters, will be discarded.

And we can calculate the hash for this new block, and therefore the entire string, by using the known output from the previous block, like this:

The previous diagram, but with the padding character manually-added and a second block containing "&download=valuable". The hash is calculated using the known output from the first block as the IV to the function run over the new block, producing a new hash value.
The URL will, of course, be pretty hideous with all of those special characters – which will require percent-encoding – on the end of the word ‘free’.

Doing it for real

Of course, you’re not going to want to do all this by hand! But an understanding of why it works is important to being able to execute it properly. In the wild, exploitable implementations are rarely as tidy as this, and a solid comprehension of exactly what’s happening behind the scenes is far more-valuable than simply knowing which tool to run and what options to pass.

That said: you’ll want to find a tool you can run and know what options to pass to it! There are plenty of choices, but I’ve bundled one called hash_extender into my example, which will do the job pretty nicely:

$ docker exec hash_extender hash_extender \
    --format=sha1 \
    --data="download=free" \
    --secret=16 \
    --signature=ee1cce71179386ecd1f3784144c55bc5d763afcc \
    --append="&download=valuable" \
    --out-data-format=html
Type: sha1
Secret length: 16
New signature: 7b315dfdbebc98ebe696a5f62430070a1651631b
New string: download%3dfree%80%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%e8%26download%3dvaluable

I’m telling hash_extender:

  1. which algorithm to use (sha1), which can usually be derived from the hash length,
  2. the existing data (download=free), so it can determine the length,
  3. the length of the secret (16 bytes), which I’ve guessed but could brute-force,
  4. the existing, valid signature (ee1cce71179386ecd1f3784144c55bc5d763afcc),
  5. the data I’d like to append to the string (&download=valuable), and
  6. the format I’d like the output in: I find html the most-useful generally, but it’s got some encoding quirks that you need to be aware of!

hash_extender outputs the new signature, which we can put into the key=... parameter, and the new string that replaces download=free, including the necessary padding to push into the next block and your new payload that follows.

Unfortunately it does over-encode a little: it’s encoded all the& and = (as %26 and %3d respectively), which isn’t what we wanted, so you need to convert them back. But eventually you end up with the URL: http://localhost:8818/?download=free%80%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%e8&download=valuable&key=7b315dfdbebc98ebe696a5f62430070a1651631b.

Browser at the resulting URL, showing the "valuable" image (a pile of money).
Disclaimer: the image you get when you successfully exploit the test site might not actually be valuable.

And that’s how you can manipulate a hash-protected string without access to its salt (in some circumstances).

Mitigating the attack

The correct way to fix the problem is by using a HMAC in place of a simple hash signature. Instead of calling sha1( SECRET_KEY . urldecode( $params ) ), the code should call hash_hmac( 'sha1', urldecode( $params ), SECRET_KEY ). HMACs are theoretically-immune to length extension attacks, so long as the output of the hash function used is functionally-random9.

Ideally, it should also use hash_equals( $validDownloadKey, $_GET['key'] ) rather than ===, to mitigate the possibility of a timing attack. But that’s another story.

Footnotes

1 This attack isn’t SHA1-specific: it works just as well on many other popular hashing algorithms too.

2 SHA-1‘s blocks are 64 bytes long; other algorithms vary.

3 For SHA-1, the padding bits consist of a 1 followed by 0s, except the final 8-bytes are a big-endian number representing the length of the message.

4 SHA-1‘s IV is 67452301 EFCDAB89 98BADCFE 10325476 C3D2E1F0, which you’ll observe is little-endian counting from 0 to F, then back from F to 0, then alternating between counting from 3 to 0 and C to F. It’s considered good practice when developing a new cryptographic system to ensure that the hard-coded cryptographic primitives are simple, logical, independently-discoverable numbers like simple sequences and well-known mathematical constants. This helps to prove that the inventor isn’t “hiding” something in there, e.g. a mathematical weakness that depends on a specific primitive for which they alone (they hope!) have pre-calculated an exploit. If that sounds paranoid, it’s worth knowing that there’s plenty of evidence that various spy agencies have deliberately done this, at various points: consider the widespread exposure of the BULLRUN programme and its likely influence on Dual EC DRBG.

5 The padding characters I’ve used aren’t accurate, just representative. But there’s the right number of them!

6 You shouldn’t do this: you’ll cause yourself many headaches in the long run. But you could.

7 It’s also not always obvious which inputs are included in hash generation and how they’re manipulated: if you’re actually using this technique adversarily, be prepared to do a little experimentation.

8 In this example, the hash operates over a single block, but the exact same principle applies regardless of the number of blocks.

9 Imagining the implementation of a nontrivial hashing algorithm, the predictability of whose output makes their HMAC vulnerable to a length extension attack, is left as an exercise for the reader.

× × × × × × × ×

My Default Apps at the End of 2023

Kev Quirk, Colin Walker, and other cool kids I follow online made it sound fun to share your “lifestack” as we approach the end of 2023.

So here’s mine: my digital “everyday carry” list of the tools and services I routinely use:

  • 📨 Mail Service: Proton Mail
  • 📮 Mail Client: Thunderbird (Desktop), Proton Mail App (Android), Proton Mail webmail (anywhere else)
  • 📝 Notes: Obsidian, Syncthing (for cross-device sync)
  • To-Do: Obsidian, physical notepad [not happy with this; want something more productive]
  • 📆 Calendar: Google Calendar (via Thunderbird on Desktop) [not happy with this; want something not-Google – still waiting on Proton Calendar getting good!]
  • 🙍🏻‍♂️ Contacts: Proton Mail
  • 📖 RSS Service: FreshRSS, selfhosted
  • 🗞️ RSS Client: FreshRSS (Desktop), FeedMe (Android)
  • ⌨️ Launcher: RayCast (MacOS), PowerToys Run (Windows)
  • ☁️ Cloud storage: ownCloud (selfhosted)
  • 🌅 Photo library: plain old directories! [would like: something selfhosted, mostly filesystem-driven, with Web interface]
  • 🌐 Web Browser: Firefox (everywhere)
  • 💬 Chat: Slack, WhatsApp, Signal, Telegram
  • 🔖 Bookmarks: Firefox (easy access), Wallabag (selfhosted, for long-term archiving)
  • 📚 Reading: dead tree format [my Kindle v2 died and I’m seeking a non-Amazon replacement; suggestions welcome], Calibre
  • 📜 Word Processing: Microsoft Word, Google Docs
  • 📈 Spreadsheets: Microsoft Excel, Google Sheets
  • 📊 Presentations: reveal.js
  • 🛒 Shopping Lists: pen and paper
  • 💰 Personal Finance: Google Sheets
  • 🎵 Music: YouTube Music [not entirely happy with it; considering replacement]
  • 🎤 Podcasts: FreshRSS; experimenting with Pocket Casts
  • 🔐 Password Management: KeePassXC, Syncthing (for cross-device sync)
  • 🤦‍♂️ Social Media: Mastodon, selfhosted
  • 🔎 Search: DuckDuckGo
  • 🧮 Code Editor: Sublime Text
  • ⌨️ KVM: Barrier
  • 🗺️ Navigation: OpenStreetMap, Google Maps, Talkietoaster (Garmin Montana)
  • 📍 Location Tracking: uLogger
  • 🔗 Blog: WordPress, selfhosted

Gemini and Spartan without a browser

A particular joy of the Gemini and Spartan protocols – and the Markdown-like syntax of Gemtext – is their simplicity.

Screenshot showing this blog post as viewed over the Gemini protocol in the Lagrange browser
The best way to explore Geminispace is with a browser like Lagrange browser, of course.

Even without a browser, you can usually use everyday command-line tools that you might have installed already to access relatively human-readable content.

Here are a few different command-line options that should show you a copy of this blog post (made available via CapsulePress, of course):

Gemini

Gemini communicates over a TLS-encrypted channel (like HTTPS), so we need a to use a tool that speaks the language. Luckily: unless you’re on Windows you’ve probably got one installed already1.

Using OpenSSL

This command takes the full gemini:// URL you’re looking for and the domain name it’s at. 1965 refers to the port number on which Gemini typically runs –

printf "gemini://danq.me/posts/gemini-without-a-browser\r\n" | \
  openssl s_client -ign_eof -connect danq.me:1965

Using GnuTLS

GnuTLS closes the connection when STDIN closes, so we use cat to keep it open. Note inclusion of --no-ca-verification to allow self-signed certificates (optionally add --tofu for trust-on-first-use support, per the spec).

{ printf "gemini://danq.me/posts/gemini-without-a-browser\r\n"; cat -; } | \
  gnutls-cli --no-ca-verification danq.me:1965

Using Ncat

Netcat reimplementation Ncat makes Gemini requests easy:

printf "gemini://danq.me/posts/gemini-without-a-browser\r\n" | \
  ncat --ssl danq.me 1965

Spartan

Spartan is a little like “Gemini without TLS“, but it sports an even-more-lightweight request format which makes it especially easy to fudge requests2.

Using Telnet

Note the use of cat to keep the connection open long enough to get a response, as we did for Gemini over GnuTLS.

{ printf "danq.me /posts/gemini-without-a-browser 0\r\n"; cat -; } | \
  telnet danq.me 300

Using cURL

cURL supports the telnet protocol too, which means that it can be easily coerced into talking Spartan:

printf "danq.me /posts/gemini-without-a-browser 0\r\n" | \
  curl telnet://danq.me:300

Using Ncat/Netcat

Because TLS support isn’t needed, this also works perfectly well with Netcat – just substitute nc/netcat or whatever your platform calls it in place of ncat:

printf "danq.me /posts/gemini-without-a-browser 0\r\n" | \
  ncat danq.me 300

I hope these examples are useful to somebody debugging their capsule, someday.

Footnotes

1 You can still install one on Windows, of course, it’s just less-likely that your operating system came with such a command-line tool built-in

2 Note that the domain and path are separated in a Spartan request and followed by the size of the request payload body: zero in all of my examples

×

Incredible Doom

I just finished reading Incredible Doom volumes 1 and 2, by Matthew Bogart and Jesse Holden, and man… that was a heartwarming and nostalgic tale!

Softcover bound copies of volumes 1 and 2 of Incredible Doom, on a wooden surface.
Conveniently just-over-A5 sized, each of the two volumes is light enough to read in bed without uncomfortably clonking yourself in the face.

Set in the early-to-mid-1990s world in which the BBS is still alive and kicking, and the Internet’s gaining traction but still lacks the “killer app” that will someday be the Web (which is still new and not widely-available), the story follows a handful of teenagers trying to find their place in the world. Meeting one another in the 90s explosion of cyberspace, they find online communities that provide connections that they’re unable to make out in meatspace.

A "Geek Code Block", printed in a dot-matrix style font, light-blue on black, reads: GU D-- -P+ C+L? U E M+ S-/+ N--- H-- F--(+) !G W++ T R? X?
I loved some of the contemporary nerdy references, like the fact that each chapter page sports the “Geek Code” of the character upon which that chapter focusses.1
So yeah: the whole thing feels like a trip back into the naivety of the online world of the last millenium, where small, disparate (and often local) communities flourished and early netiquette found its feet. Reading Incredible Doom provides the same kind of nostalgia as, say, an afternoon spent on textfiles.com. But it’s got more than that, too.
Partial scan from a page of Incredible Doom, showing a character typing about "needing a solution", with fragments of an IRC chat room visible in background panels.
The user interfaces of IRC, Pine, ASCII-art-laden BBS menus etc. are all produced with a good eye for accuracy, but don’t be fooled: this is a story about humans, not computers. My 9-year-old loved it too, and she’s never even heard of IRC (I hope!).

It touches on experiences of 90s cyberspace that, for many of us, were very definitely real. And while my online “scene” at around the time that the story is set might have been different from that of the protagonists, there’s enough of an overlap that it felt startlingly real and believable. The online world in which I – like the characters in the story – hung out… but which occupied a strange limbo-space: both anonymous and separate from the real world but also interpersonal and authentic; a frontier in which we were still working out the rules but within which we still found common bonds and ideals.

A humorous comic scene from Incredible Doom in which a male character wearing glasses walks with a female character he's recently met and is somewhat intimidated by, playing-out in his mind the possibility that she might be about to stab him. Or kiss him. Or kiss him THEN stab him.
Having had times in the 90s that I met up offline with relative strangers whom I first met online, I can confirm that… yeah, the fear is real!

Anyway, this is all a long-winded way of saying that Incredible Doom is a lot of fun and if it sounds like your cup of tea, you should read it.

Also: shortly after putting the second volume down, I ended up updating my Geek Code for the first time in… ooh, well over a decade. The standards have moved on a little (not entirely in a good way, I feel; also they’ve diverged somewhat), but here’s my attempt:

----- BEGIN GEEK CODE VERSION 6.0 -----
GCS^$/SS^/FS^>AT A++ B+:+:_:+:_ C-(--) D:+ CM+++ MW+++>++
ULD++ MC+ LRu+>++/js+/php+/sql+/bash/go/j/P/py-/!vb PGP++
G:Dan-Q E H+ PS++ PE++ TBG/FF+/RM+ RPG++ BK+>++ K!D/X+ R@ he/him!
----- END GEEK CODE VERSION 6.0 -----

Footnotes

1 I was amazed to discover that I could still remember most of my Geek Code syntax and only had to look up a few components to refresh my memory.

× × × ×

Reading Rolled Papyri

One of my favourite parts of my former role at the Bodleian Libraries was getting to work on exhibitions. Not just because it was varied and interesting work, but because it let me get up-close to remarkable artifacts that most people never even get the chance to see.

Miniature model of an exhibition space, constructed using painted blocks and laid-out on the floor of an exhibition space.
We also got to play dollhouse, laying out exhibitions in miniature.

A personal favourite of mine are the Herculaneum Papyri. These charred scrolls were part of a private library near Pompeii that was buried by the eruption of Mount Vesuvius in 79 CE. Rediscovered from 1752, these ~1,800 scrolls were distributed to academic institutions around the world, with the majority residing in Naples’ Biblioteca Nazionale Vittorio Emanuele III.

Under-construction exhibition including a highly-reflective suit worn by volcano field researchers.
The second time I was in an exhibition room with the Bodleian’s rolled-up Herculaneum Papyri was for an exhibition specifically about humanity’s relationship with volcanoes.

As you might expect of ancient scrolls that got buried, baked, and then left to rot, they’re pretty fragile. That didn’t stop Victorian era researchers trying a variety of techniques to gently unroll them and read what was inside.

Blackened fragments of an unrolled papyrus.
Unrolling the scrolls tends to go about as well as you’d anticipate. A few have been deciphered this way. Many others have been damaged or destroyed by unrolling efforts.

Like many others, what I love about the Herculaneum Papyri is the air of mystery. Each could be anything from a lost religious text to, I don’t know, somebody’s to-do list (“buy milk, arrange for annual service of chariot, don’t forget to renew volcano insurance…”).1

In recent years, we’ve tried “virtually unrolling” the scrolls using a variety of related technologies. And – slowly – we’re getting there.

X-ray tomography is amazing, but it’s hampered by the fact that the ink and paper have near-equivalent transparency to x-rays. Plus, all the other problems. But new techniques are helping to overcome them.

So imagine my delight when this week, for the first time ever, a complete word was extracted from one of the carbonised, still-rolled-up scrolls from Herculaneum. Something that would have seemed inconceivable to the historians who first discovered and catalogued the scrolls is now possible, thanks to their careful conservation over the years along with the steady advance of technology.

Computer-assisted photograph showing visible letters on a rolled scroll, with highlighting showing those that can be deciphered, forming a word.
The word appears to be “purple”: either πορφύ̣ρ̣ας̣ (a noun, similar to how we might say “pass the purple [pen]” or πορφυ̣ρ̣ᾶς̣: if we can decode more words around it then it which might become clear from the context.
Anyway, I thought that was exciting news so I wanted to share.

Footnotes

1 For more-serious academic speculation about the potential value of the scrolls, Richard Carrier’s got you covered.

× × × ×

Easy FoundryVTT Cloud Hosting

Foundry is a wonderful virtual tabletop tool well-suited to playing tabletop roleplaying games with your friends, no matter how far away they are. It compares very favourably to the market leader Roll20, once you get past some of the initial set-up challenges and a moderate learning curve.

Screenshot from FoundryVTT, showing a party of three adventurers crossing a rickety bridge through a blood-red swamp, facing off against a handful of fiends coming the other way. A popup item card for a "Dragon Slayer Longsword" is visible, along with two 20-sided dice.
The party of adventurers I’ve been DMing for since last summer use Foundry to simulate a tabletop (alongside a conventional video chat tool to let us see and hear one another).

You can run it on your own computer and let your friends “connect in” to it, so long as you’re able to reconfigure your router a little, but you’ll be limited by the speed of your home Internet connection and people won’t be able to drop in and e.g. tweak their character sheet except when you’ve specifically got the application running.

A generally better option is to host your Foundry server in the cloud. For most of its history, I’ve run mine on Fox, my NAS, but I’ve recently set one up on a more-conventional cloud virtual machine too. A couple of friends have asked me about how to set up their own, so here’s a quick guide:

Screenshot from Linode showing a server, "Foundry", running, with specs as described below.
I used Linode to spin up a server because I still had a stack of free credits following a recent project. The instructions will work on any cloud host where you can spin up a Debian 12 virtual machine, and can be adapted for other distributions of Linux.

You will need…

  • A Foundry license ($50 USD / £48 GBP, one-off payment1)
  • A domain name for which you control the DNS records; you’ll need to point a domain, like “danq.me” (or a subdomain of it, e.g. “vtt.danq.me”), at an IP address you’ll get later by creating an “A” record: your domain name registrar can probably help with this – I mostly use Gandi and, ignoring my frustration with recent changes to their email services, I think they’re great
  • An account with a cloud hosting provider: this example uses Linode but you can adapt for any of them
  • A basic level of comfort with the command-line

1. Spin up a server

Getting a virtual server is really easy nowadays.

Annotated screenshot showing a Linode provisioning form, with "Debian 12", the "London, UK", region, and "Dedicated 4GB" plan options selected.
Click, click, click, and you’ve got yourself a server.

You’ll need:

  • The operating system to be Debian 12 (or else you’ll need to adapt the instructions below)
  • The location to be somewhere convenient for your players: pick a server location that’s relatively-local to the majority of them to optimise for connection speeds
  • Approximately 2 CPUs and 4GB of RAM, per Foundry’s recommended server specifications
  • An absolute minimum of 1GB of storage space, I’d recommend plenty more: The Levellers’ campaign currently uses about 10GB for all of its various maps, art, videos, and game data, so give yourself some breathing room (space is pretty cheap) – I’ve gone with 80GB for this example, because that’s what comes as standard with the 2 CPU/4GB RAM server that Linode offer

Choose a root password when you set up your server. If you’re a confident SSH user, add your public key so you can log in easily (and then disable password authentication entirely!).

For laziness, this guide has you run Foundry as root on your new server. Ensure you understand the implications of this.2

2. Point your (sub)domain at it

DNS propogation can be pretty fast, but… sometimes it isn’t. So get this step underway before you need it.

Your newly-created server will have an IP address, and you’ll be told what it is. Put that IP address into an A-record for your domain.

Screenshot from Gandi showing adding an A record for vtt.danq.me -> 1.2.3.4.
The interface for adding a new DNS record in Gandi is pretty simple – record type, time to live, name, address – but it’s rarely more complicated that this with any registrar that provides DNS services.

3. Configure your server

In my examples, my domain name is vtt.danq.me and my server is at 1.2.3.4. Yours will be different!

Connect to your new server using SSH. Your host might even provide a web interface if you don’t have an SSH client installed: e.g. Linode’s “Launch LISH Console” button will do pretty-much exactly that for you. Log in as root using the password you chose when you set up the server (or your SSH private key, if that’s your preference). Then, run each of the commands below in order (the full script is available as a single file if you prefer).

3.1. Install prerequisites

You’ll need unzip (to decompress Foundry), nodejs (to run Foundry), ufw (a firewall, to prevent unexpected surprises), nginx (a webserver, to act as a reverse proxy to Foundry), certbot (to provide a free SSL certificate for Nginx), nvm (to install pm2) and pm2 (to keep Foundry running in the background). You can install them all like this:

apt update
apt upgrade
apt install -y unzip nodejs ufw nginx certbot nvm
npm install -g pm2

3.2. Enable firewall

By default, Foundry runs on port 30000. If we don’t configure it carefully, it can be accessed directly, which isn’t what we intend: we want connections to go through the webserver (over https, with http redirecting to https). So we configure our firewall to allow only these ports to be accessed. You’ll also want ssh enabled so we can remotely connect into the server, unless you’re exclusively using an emergency console like LISH for this purpose:

ufw allow ssh
ufw allow http
ufw allow https
ufw enable

3.3. Specify domain name

Putting the domain name we’re using into a variable for the remainder of the instructions saves us from typing it out again and again. Make sure you type your domain name (that you pointed to your server in step 2), not mine (vtt.danq.me):

DOMAIN=vtt.danq.me

3.4. Get an SSL certificate with automatic renewal

So long as the DNS change you made has propogated, this should Just Work. If it doesn’t, you might need to wait for a bit then try again.

certbot certonly --agree-tos --register-unsafely-without-email --rsa-key-size 4096 --webroot -w /var/www/html -d $DOMAIN

Don’t continue past this point until you’ve succeeded in getting the SSL certificate sorted.

The certificate will renew itself automatically, but you also need Nginx to restart itself whenever that happens. You can set that up like this:

printf "#!/bin/bash\nservice nginx restart\n" > /etc/letsencrypt/renewal-hooks/post/restart-nginx.sh
chmod +x /etc/letsencrypt/renewal-hooks/post/restart-nginx.sh

3.5. Configure Nginx to act as a reverse proxy for Foundry

You can, of course, manually write the Nginx configuration file: just remove the > /etc/nginx/sites-available/foundry from the end of the printf line to see the configuration it would write and then use/adapt to your satisfaction.

set +H
printf "server {\n listen 80;\n listen [::]:80;\n server_name $DOMAIN;\n\n # Redirect everything except /.well-known/* (used for ACME) to HTTPS\n root /var/www/html/;\n if (\$request_uri !~ \"^/.well-known/\") {\n return 301 https://\$host\$request_uri;\n }\n}\n\nserver {\n listen 443 ssl http2;\n listen [::]:443 ssl http2;\n server_name $DOMAIN;\n\n ssl_certificate /etc/letsencrypt/live/$DOMAIN/fullchain.pem;\n ssl_certificate_key /etc/letsencrypt/live/$DOMAIN/privkey.pem;\n\n client_max_body_size 300M;\n\n location / {\n # Set proxy headers\n proxy_set_header Host \$host;\n proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;\n proxy_set_header X-Forwarded-Proto \$scheme;\n\n # These are important to support WebSockets\n proxy_set_header Upgrade \$http_upgrade;\n proxy_set_header Connection \"Upgrade\";\n\n proxy_pass http://127.0.0.1:30000/;\n }\n}\n" > /etc/nginx/sites-available/foundry
ln -sf /etc/nginx/sites-available/foundry /etc/nginx/sites-enabled/foundry
service nginx restart

3.6. Install Foundry

3.6.1. Create a place for Foundry to live

mkdir {vtt,data}
cd vtt

3.6.2. Download and decompress it

Screenshot from FoundryVTT showing where to find the "Timed URL" download link button.
For this step, you’ll need to get a Timed URL from the Purchased Licenses page on your FoundryVTT account.

Substitute in your Timed URL in place of <url from website> (keep the quotation marks – " – though!):

wget -O foundryvtt.zip "<url from website>"
unzip foundryvtt.zip
rm foundryvtt.zip

3.6.3. Configure PM2 to run Foundry and keep it running

Now you’re finally ready to launch Foundry! We’ll use PM2 to get it to run automatically in the background and keep running:

pm2 start --name "Foundry" node -- resources/app/main.js --dataPath=/root/data

You can watch the logs for Foundry with PM2, too. It’s a good idea to take a quick peep at them to check it launched okay (press CTRL-C to exit):

pm2 logs 0

4. Start adventuring!

Screenshot showing FoundryVTT requesting a license key.
Point your web browser at your domain name (e.g. I might go to https://vtt.danq.me) and you should see Foundry’s first-load page, asking for your license key.

Provide your license key to get started, and then immediately change the default password: a new instance of Foundry has a blank default password, which means that anybody on Earth can administer your server: get that changed to something secure!

Now you’re running on Foundry!

Footnotes

1 Which currency you pay in, and therefore how much you pay, for a Foundry license depends on where in the world you are where your VPN endpoint says you are. You might like to plan accordingly.

2 Running Foundry as root is dangerous, and you should consider the risks for yourself. Adding a new user is relatively simple, but for a throwaway server used for a single game session and then destroyed, I wouldn’t bother. Specifically, the risk is that a vulnerability in Foundry, if exploited, could allow an attacker to reconfigure any part of your new server, e.g. to host content of their choice or to relay spam emails. Running as a non-root user means that an attacker who finds such a vulnerability can only trash your Foundry instance.

× × × × × ×

CapsulePress – Gemini / Spartan / Gopher to WordPress bridge

For a while now, this site has been partially mirrored via the Gemini1 and Gopher protocols.2 Earlier this year I presented hacky versions of the tools I’d used to acieve this (and made people feel nostalgic).

Now I’ve added support for Spartan3 too and, seeing as the implementations shared functionality, I’ve combined all three – Gemini, Spartan, and Gopher – into a single package: CapsulePress.

Diagram illustrating the behaviour of CapsulePress: a WordPress installation provides content, and CapsulePress makes that content available via gemini://, spartan://, and gopher:// URLs.

CapsulePress is a Gemini/Spartan/Gopher to WordPress bridge. It lets you use WordPress as a CMS for any or all of those three non-Web protocols in addition to the Web.

For example, that means that this post is available on all of:

Composite screenshot showing this blog post in, from top-left to bottom-right: (1) Firefox, via HTTPS, (2) Lagrange, via Gemini, (3) Lagrange, via Spartan, and (4) Lynx, via Gopher.

It’s also possible to write posts that selectively appear via different media: if I want to put something exclusively on my gemlog, I can, by assigning metadata that tells WordPress to suppress a post but still expose it to CapsulePress. Neat!

90s-style web banners in the style of Netscape ads, saying "Gemini now!", "Spartan now!", and "Gopher now!". Beneath then a wider banner ad promotes CapsulePress v0.1.
Using Gemini and friends in the 2020s make me feel like the dream of the Internet of the nineties and early-naughties is still alive. But with fewer banner ads.

I’ve open-sourced the whole thing under a super-permissive license, so if you want your own WordPress blog to “feed” your Gemlog… now you can. With a few caveats:

  • It’s hard to use. While not as hacky as the disparate piles of code it replaced, it’s still not the cleanest. To modify it you’ll need a basic comprehension of all three protocols, plus Ruby, SQL, and sysadmin skills.
  • It’s super opinionated. It’s very much geared towards my use case. It’s improved by the use of templates. but it’s still probably only suitable for this site for the time being, until you make changes.
  • It’s very-much unfinished. I’ve got a growing to-do list, which should be a good clue that it’s Not Finished. Maybe it never will but. But there’ll be changes yet to come.

Whether or not your WordPress blog makes the jump to Geminispace4, I hope you’ll came take a look at mine at one of the URLs linked above, and then continue to explore.

If you’re nostalgic for the interpersonal Internet – or just the idea of it, if you’re too young to remember it… you’ll find it there. (That Internet never actually went away, but it’s harder to find on today’s big Web than it is on lighter protocols.)

Footnotes

1 Also available via Gemini.

2 Also via Finger (but that’s another story).

3 Also available via Spartan.

4 Need a browser? I suggest Lagrange.

× × ×

AI is lying to you

I’ve resisted writing about the current trends in AI because, well, others are already doing it better.1 But I was inspired by Garrett‘s observation that – according to the Washington Post – the C4 dataset has tokenised his personal website.

Screenshot from the Washington Post, showing a search for 'danq.me' returning 300k tokens (0.0002% of the dataset). The site is ranked 61,724.
My website is in the dataset too, but with a massive 300,000 tokens. Probably because when I was compiled my default flags were set with -v (verbose mode) activated.

Much has been said about how ChatGPT and her friends will hallucinate and mislead. Let’s take an example.

Remember that ChatGPT has almost-certainly read basically everything I’ve ever written online – it might well be better-informed about me better than you are – as you read this:

Dan asks: "Tell me about the blogger who runs DanQ.me" ChatGPT responds: "Dan Q is the blogger and web developer who runs the website "DanQ.me". He is a British computer scientist, game designer, and open-source software developer. DanQ.me is a personal website where Dan writes about a variety of topics, including technology, programming, and gaming. He is also known for creating the web-based puzzle game "Notpron," which has gained a large following online. In addition to his work on "DanQ.me" and "Notpron," Dan has contributed to several open-source software projects, including the Linux operating system and the Mozilla web browser. He has also worked as a software developer for several companies, including Google and Canonical. Dan is active on social media, particularly Twitter, where he frequently shares his thoughts on technology and other topics." The screenshot is marked up with the bits about gaming and being active on social media marked "stretching/exaggerating", the bits about Notpron, Linux, Mozilla, and Canonical being "believable lies", and the bits about Twitter and Google identified as "complete bollocks"
Given that ChatGPT has all the information it needs to talk about me accurately, it comes up with a surprising amount of crap.

When I asked ChatGPT about me, it came up with a mixture of truths and believable lies2, along with a smattering of complete bollocks.

In another example, ChatGPT hallucinates this extra detail specifically because the conversation was foreshadowed by its previous mistake. At this point, it digs its heels in and commits to its claim, like the stubborn guy in the corner of the pub who doubles-down on his bullshit.

If you were to ask at the outset who wrote Notpron, ChatGPT would have gotten it right, but because it already mis-spoke, it’s now trapped itself in a lie, incapable of reconsidering what it said previously as having been anything but the truth:

Asked to elaborate on "Notpron", ChatGPT says - Dan Q's custom "Notpron" challenge can be found at the following URL: https://www.danq.me/notpron/. This challenge features unique levels and puzzles that were created by Dan Q himself, and it is known for being particularly difficult and challenging. It has gained a large following among "Notpron" enthusiasts and puzzle game fans. If you decide to take on the challenge, be prepared to think outside the box and use your problem-solving skills to their fullest!"
Notpron is great and all, but it was written by David Münnich, not me. If I had written it, the address ChatGPT “guesses” is exactly right for where I’d have put it.

Simon Willison says that we should call this behaviour “lying”. In response to this, several people told him that the “lying” excessively anthropomorphises these chatbots, implying that they’re deliberately attempting to mislead their users. Simon retorts:

I completely agree that anthropomorphism is bad: these models are fancy matrix arithmetic, not entities with intent and opinions.

But in this case, I think the visceral clarity of being able to say “ChatGPT will lie to you” is a worthwhile trade.

I agree with Simon. ChatGPT and systems like it are putting accessible AI into the hands of the masses, and that means that the people who are using it don’t necessarily understand – nor desire to learn – the statistical mechanisms that actually underpin the AI‘s “decisions” about how to respond.

Trying to explain how and why their new toy will get things horribly wrong is hard, and it takes a critical eye, time, and practice to begin to discover how to use these tools effectively and safely.3 It’s simpler just to say “Here’s a tool; by the way, it’s a really convincing liar and you can’t trust it even a little.”

Giving people tools that will lie to them. What an interesting time to be alive!

Footnotes

1 I’m tempted to blog about my experience of using Stable Diffusion and GPT-3 as assistants while DMing my regular Dungeons & Dragons game, but haven’t worked out exactly what I’m saying yet.

2 That ChatGPT lies won’t be a surprise to anybody who’s used the system nor anybody who understands the fundamentals of how it works, but as AIs get integrated into more and more things, we’re going to need to teach a level of technical literacy about what that means, just like we do should about, say, Wikipedia.

3 For many of the tasks people talk about outsourcing to LLMs, it’s the case that it would take less effort for a human to learn how to do the task that it would for them to learn how to supervise an AI performing the task! That’s not to say they’re useless: just that (for now at least) you should only trust them to do something that you could do yourself and you’re therefore able to critically assess how well the machine did it.

× × ×

Subscribing to Forward using FreshRSS’s XPath Scraping

As I’ve mentioned before, I’m a fan of Tailsteak‘s Forward comic. I’m not a fan of the author’s weird aversion to RSS, so I hacked a way around it first using an exploit in webcomic reader app Comic Chameleon (accidentally getting access to comics weeks in advance of their publication as a side-effect) and later by using my own tool RSSey.

But now I’m able to use my favourite feed reader FreshRSS to scrape websites directly – like I’ve done for The Far Side – I should switch to using this approach to subscribe to Forward, too:

Screenshot showing RSS feed items: recent Forward episodes including their numbers, titles, and publication dates.
The goal: date-ordered, numbered, titled episodes of Forward in my feed reader.

Here’s the settings I came up with –

  • Feed URL: http://forwardcomic.com/list.php
  • Type of feed source: HTML + XPath (Web scraping)
  • XPath for finding news items: //a[starts-with(@href,'archive.php')]
  • Item title: .
  • Item link (URL): ./@href
  • Item date: ./following-sibling::text()[1]
  • Custom date/time format: - Y.m.d
Annotated screenshot showing how each XPath directive maps to each part of the page. The item selector finds each hyperlink that begins with "archive.php" (notably missing the most-recent comic at any given time, which is found at index.php), and the date is found in the text node that immediately follows it, in a slightly-unusual variation on ISO8601.
The comic pages themselves do a great thing for accessibility by including a complete transcript of each. But the listing page, which is basically a series of <a>s separated by <br>s rather than a <ul> and <li>s, for example, leaves something to be desired (and makes it harder to scrape, too!).

I continue to love this “killer feature” of FreshRSS, but I’m beginning to see how it could go further – I wish I had the free time to contribute to its development!

I’d love to see a mechanism for exporting/importing feed configurations like this so that I could share them more-easily, for example. I’d also be delighted if I could expand on my XPath rules to load pages referenced by the results and get data from them, too, e.g. so I could use an image found by XPath on the “item link” page as the thumbnail image! These are things RSSey could do for me, but FreshRSS can’t… yet!

× ×

Bypassing Region Restrictions on radio.garden

I must be the last person on Earth to have heard about radio.garden (thanks Pepsilora!), a website that uses a “globe” interface to let you tune in to radio stations around the globe. But I’d only used it for a couple of minutes before I discovered that there are region restrictions in place. Here in the UK, and perhaps elsewhere, you can’t listen to stations in other countries without using a VPN or similar tool… which might introduce a different region’s restrictions!

How to bypass radio.garden region restrictions

So I threw together a quick workaround:

  1. Ensure you’ve got a userscript manager installed (I like Violentmonkey, but there are other choices).
  2. Install this userscript; it’s hacky – I threw it together in under half an hour – but it seems to work!
Screenshot showing radio.garden tuned into YouFM in Mons, Belgium. An additional player control interface appears below the original one.
My approach is super lazy and simply injects a second audio player – which ignores region restrictions – below the original.

How does this work and how did I develop it?

For those looking to get into userscripting, here’s a quick tutorial on what I did to develop this bypass.

First, I played around with radio.garden for a bit to get a feel for what it was doing. I guessed that it must be tuning into a streaming URL when you select a radio station, so I opened by browser’s debugger on the Network tab and looked at what happened when I clicked on a “working” radio station, and how that differed when I clicked on a “blocked” one:

Screenshot from Firefox's Network debugger, showing four requests to a "working" radio station (of which two are media feeds) and two to a "blocked" radio station.

When connecting to a station, a request is made for some JSON that contains station metadata. Then, for a working station, a request is made for an address like /api/ara/content/listen/[ID]/channel.mp3. For a blocked station, this request isn’t made.

I figured that the first thing I’d try would be to get the [ID] of a station that I’m not permitted to listen to and manually try the URL to see if it was actually blocked, or merely not-being-loaded. Looking at a working station, I first found the ID in the JSON response and I was about to extract it when I noticed that it also appeared in the request for the JSON: that’s pretty convenient!

 

Composite screenshot from Firefox's Network debugger showing a request for station metadata being serviced, followed by a request for the MP3 stream with the same ID.My hypothesis was that the “blocking” is entirely implemented in the front-end: that the JavaScript code that makes the pretty bits work is looking at the “country” data that’s returned and using that to decide whether or not to load the audio stream. That provides many different ways to bypass it, from manipulating the JavaScript to remove that functionality, to altering the JSON response so that every station appears to be in the user’s country, to writing some extra code that intercepts the request for the metadata and injects an extra audio player that doesn’t comply with the regional restrictions.

But first I needed to be sure that there wasn’t some actual e.g. IP-based blocking on the streams. To do this, first I took the /api/ara/content/listen/[ID]/channel.mp3 address of a known-working station and opened it in VLC using Media > Open Network Stream…. That worked. Then I did the same thing again, but substituted the [ID] part of the address with the ID of a “blocked” station. VLC happily started spouting French to me: the bypass would, in theory, work!

Next, I needed to get that to work from within the site itself. It’s implemented in React, which is a pig to inject code into because it uses horrible identifiers for DOM elements. But of course I knew that there’d be this tell-tale fetch request for the station metadata that I could tap into, so I used this technique to override the native fetch method and replace it with my own “wrapper” that logged the stream address for any radio station I clicked on. I tested the addresses this produced using my browser.

window.fetch = new Proxy(window.fetch, {
  apply: (target, that, args)=>{
    const tmp = target.apply(that, args);
    tmp.then(res=>{
      const matches = res.url.match(/\/api\/ara\/content\/channel\/(.*)/);
      if(matches){
        const stationId = matches[1];
        console.log(`http://radio.garden/api/ara/content/listen/${stationId}/channel.mp3`);
      }
    });
    return tmp;
  },
});

That all worked nicely, so all I needed to do now was to use those addresses rather than simply logging them. Rather that get into the weeds reverse-engineering the built-in player, I simply injected a new <audio> element after it and pointed it at the correct address, and applied a couple of CSS tweaks to make it fit in nicely.

The only problem was that on UK-based radio stations I’d now hear a slight echo, because the original player was still working. I could’ve come up with an elegant solution to this, I’m sure, but I went for a quick-and-dirty hack: I used res.json() to obtain the body of the metadata response… which meant that the actual code that requested it would no longer be able to get it (you can only decode the body of a fetch response once!). radio.garden’s own player treats this as an error and doesn’t play that radio station, but my new <audio> element still plays it perfectly well.

It’s not pretty, but it’s functional. You can read the finished source code on Github. I don’t anticipate that I’ll be maintaining this script so if it stops working you’ll have to fix it yourself, and I have no intention of “finishing” it by making it nicer or prettier. I just wanted to share in case you can learn anything from my approach.

× × ×

The tape library robot that served drinks

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

This is an IBM tape library robot. It’s designed to fetch, load, unload, and return tape media cartridges to the correct bay in large enterprise environments.

One fateful ‘workend’, I made one serve drinks.

It went back into prod on the Monday…

In a story reminiscient of those anecdotes about early computer science students competing to “race” hard drives across the lab by writing programs that moved the heads in a way that vibrated/walked the devices, @SecurityWriter shares a wonderful story about repurposing a backup tape management robot to act as a server (pun intended) of drinks.

×

Yesterday’s Internet Today! (Woo DM 2023)

The week before last I had the opportunity to deliver a “flash talk” of up to 4 minutes duration at a work meetup in Vienna, Austria. I opted to present a summary of what I’ve learned while adding support for Finger and Gopher protocols to the WordPress installation that powers DanQ.me (I also hinted at the fact that I already added Gemini and Spring ’83 support, and I’m looking at other protocols). If you’d like to see how it went, you can watch my flash talk here or on YouTube.

If you love the idea of working from wherever-you-are but ocassionally meeting your colleagues in person for fabulous in-person events with (now optional) flash talks like this, you might like to look at Automattic’s recruitment pages

The presentation is a shortened, Automattic-centric version of a talk I’ll be delivering tomorrow at Oxford Geek Nights #53; so if you’d like to see it in-person and talk protocols with me over a beer, you should come along! There’ll probably be blog posts to follow with a more-detailed look at the how-and-why of using WordPress as a CMS not only for the Web but for a variety of zany, clever, retro, and retro-inspired protocols down the line, so perhaps consider the video above a “teaser”, I guess?