This is a video version of my blog post, Length Extension Attack. In it, I talk through the theory of length extension attacks and demonstrate an SHA-1 length extension attack against an (imaginary) website.
The video can also be found on:
This post is also available as an article. So if you'd rather read a conventional blog post of this content, you can!
Prefer to watch/listen rather than read? There’s a vloggy/video version of this post in which I explain all the key concepts and demonstrate an SHA-1 length extension attack against an imaginary site.
I understood the concept of a length extension attack and when/how I needed to mitigate them for a long time before I truly understood why they worked. It took until work provided me an opportunity to play with one in practice (plus reading Ron Bowes’ excellent article on the subject) before I really grokked it.
Would you like to learn? I’ve put together a practical demo that you can try for yourself!
You can check out the code and run it using the instructions in the repository if you’d like to play along.
The site “Images R Us” will let you download images you’ve purchased, but not ones you haven’t. Links to the images are protected by a SHA-1 hash1, generated as follows:
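Here’s a minimal sketch of that scheme (in Python for brevity, though the demo site itself is written in PHP, and the secret value below is made up): it signs a set of URL parameters by hashing the secret key concatenated with the decoded query string, mirroring the sha1( SECRET_KEY . urldecode( $params ) ) construction described later, and checks incoming requests the same way.

import hashlib
from urllib.parse import unquote

SECRET_KEY = "0123456789abcdef"  # hypothetical: the real secret lives only on the server

def sign(params: str) -> str:
    # hash of the secret key concatenated with the (decoded) query string
    return hashlib.sha1((SECRET_KEY + unquote(params)).encode()).hexdigest()

def download_link(params: str) -> str:
    return f"/?{params}&key={sign(params)}"

def is_valid_request(params: str, key: str) -> bool:
    return sign(params) == key

print(download_link("download=free"))  # /?download=free&key=<40 hex characters>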
When a “download” link is generated for a legitimate user, the algorithm produces a hash which is appended to the link. When the download link is clicked, the same process is followed and the calculated hash compared to the provided hash. If they differ, the input must have been tampered with and the request is rejected.
Without knowing the secret key – stored only on the server – it’s not possible for an attacker to generate a valid hash for URL parameters of the attacker’s choice. Or is it?
Changing download=free to download=valuable invalidates the hash, and the request is denied.
Actually, it is possible for an attacker to manipulate the parameters. To understand how, you must first understand a little about how SHA-1 and its siblings actually work:
The message (SECRET_KEY + URL_PARAMS) is cut into blocks of a fixed size.2 In SHA-1, blocks are 512 bits long and the padding is a 1, followed by as many 0s as is necessary, leaving 64 bits at the end in which to specify how many bits of the block were actually data.
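To make that padding rule concrete, here’s a rough illustration of my own (not from the original post) of how SHA-1 pads a message out to a whole number of 64-byte (512-bit) blocks:

def sha1_padding(message_length_bytes: int) -> bytes:
    # 0x80, then zeroes, then the 64-bit big-endian length of the message in bits
    zeros = (55 - message_length_bytes) % 64
    return b"\x80" + b"\x00" * zeros + (message_length_bytes * 8).to_bytes(8, "big")

# A 16-byte secret plus "download=free" is 29 bytes, which pads out to exactly
# one 64-byte block:
print(len(b"x" * 29 + sha1_padding(29)))  # 64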
Looking at the final block in a given message, it’s apparent that there are two pieces of data that could produce exactly the same output for a given function:
Therefore, if we can manipulate the input of the message, and we know the length of the message, we can append to it. Bear that in mind as we move on to the other half of what makes this attack possible.
“Images R Us” is implemented in PHP. In common with most server-side scripting languages, when PHP sees a HTTP query string full of key/value pairs, if a key is repeated then it overrides any earlier iterations of the same key.
The earlier values aren’t lost, though: you can still see them in $_SERVER['QUERY_STRING'] in PHP, where you’ll find the entire query string. You could even implement your own query string handler that instead makes the first instance of each key the canonical one, if you really wanted.6
So all we’d need to do is add a second download parameter to the query string at “Images R Us”, e.g. making it download=free&download=valuable! But we can’t: not without breaking the hash, which is calculated based on the entire query string (minus the &key=... bit).
But with our new knowledge about appending to the input for SHA-1 (first a padding string, then an extra block containing our payload: the variable we want to override and its new value), and then calculating a hash for this new block using the known output of the old final block as the IV… we’ve got everything we need to put the attack together.
We have a legitimate link with the query string download=free&key=ee1cce71179386ecd1f3784144c55bc5d763afcc. This tells us that somewhere on the server, this is what’s happening:

By suffixing download=free with some special characters to replicate the padding that would otherwise be added to this final8 block, we can add a second block containing an overriding value of download, specifically &download=valuable. The first value of download=, which will be the word free followed by a stack of garbage padding characters, will be discarded.
And we can calculate the hash for this new block, and therefore the entire string, by using the known output from the previous block, like this:
Of course, you’re not going to want to do all this by hand! But an understanding of why it works is important to being able to execute it properly. In the wild, exploitable implementations are rarely as tidy as this, and a solid comprehension of exactly what’s happening behind the scenes is far more-valuable than simply knowing which tool to run and what options to pass.
That said: you’ll want to find a tool you can run and know what options to pass to it! There are plenty of choices, but I’ve bundled one called hash_extender into my example, which will do the job pretty nicely:
$ docker exec hash_extender hash_extender \
    --format=sha1 \
    --data="download=free" \
    --secret=16 \
    --signature=ee1cce71179386ecd1f3784144c55bc5d763afcc \
    --append="&download=valuable" \
    --out-data-format=html
Type: sha1
Secret length: 16
New signature: 7b315dfdbebc98ebe696a5f62430070a1651631b
New string: download%3dfree%80%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%e8%26download%3dvaluable
I’m telling hash_extender:

- the hash format I’m attacking (sha1), which can usually be derived from the hash length,
- the data I know was hashed (download=free), so it can determine the length,
- the length of the secret (16 bytes), which I’ve guessed but could brute-force,
- the original signature (ee1cce71179386ecd1f3784144c55bc5d763afcc),
- the data I’d like to append (&download=valuable), and
- the output format I’d like (html): the most-useful generally, but it’s got some encoding quirks that you need to be aware of!
hash_extender outputs the new signature, which we can put into the key=... parameter, and the new string that replaces download=free, including the necessary padding to push into the next block and your new payload that follows.
Unfortunately it does over-encode a little: it’s encoded all the & and = (as %26 and %3d respectively), which isn’t what we wanted, so you need to convert them back. But eventually you end up with the URL:
http://localhost:8818/?download=free%80%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%e8&download=valuable&key=7b315dfdbebc98ebe696a5f62430070a1651631b
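If you’d like to convince yourself why this forged URL will pass validation, here’s a short Python check of my own (with a made-up 16-byte secret, since the real one is unknown): it computes what the server computes for the extended query string, i.e. sha1(secret + original data + glue padding + appended data). The point of the attack is that hash_extender can predict this exact value from the original signature alone, without ever knowing the secret.

import hashlib

SECRET_KEY = b"0123456789abcdef"          # hypothetical 16-byte secret
original   = b"download=free"
appended   = b"&download=valuable"

# Glue padding for the 16 + 13 = 29-byte (secret + data) message: 0x80, zeroes,
# then the 64-bit big-endian bit-length (29 * 8 = 232), filling the 64-byte block.
msg_len = len(SECRET_KEY) + len(original)
glue = b"\x80" + b"\x00" * ((55 - msg_len) % 64) + (msg_len * 8).to_bytes(8, "big")

forged_query = original + glue + appended   # the decoded form of the "new string" above
print(hashlib.sha1(SECRET_KEY + forged_query).hexdigest())  # the signature the server will accept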
And that’s how you can manipulate a hash-protected string without access to its salt (in some circumstances).
The correct way to fix the problem is by using a HMAC in place of a simple hash signature. Instead of calling sha1( SECRET_KEY . urldecode( $params ) ), the code should call hash_hmac( 'sha1', urldecode( $params ), SECRET_KEY ). HMACs are theoretically-immune to length extension attacks, so long as the output of the hash function used is functionally-random9.

Ideally, it should also use hash_equals( $validDownloadKey, $_GET['key'] ) rather than ===, to mitigate the possibility of a timing attack. But that’s another story.
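For comparison, here’s the same fix sketched in Python (my own illustration; the demo site itself is PHP and would use the hash_hmac and hash_equals calls above, and the secret below is made up):

import hashlib, hmac
from urllib.parse import unquote

SECRET_KEY = b"0123456789abcdef"   # hypothetical server-side secret

def sign(params: str) -> str:
    # HMAC-SHA1 over the decoded query string, keyed with the secret
    return hmac.new(SECRET_KEY, unquote(params).encode(), hashlib.sha1).hexdigest()

def is_valid_request(params: str, key: str) -> bool:
    # constant-time comparison, like PHP's hash_equals()
    return hmac.compare_digest(sign(params), key)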
1 This attack isn’t SHA1-specific: it works just as well on many other popular hashing algorithms too.
2 SHA-1‘s blocks are 64 bytes long; other algorithms vary.
3 For SHA-1, the padding bits consist of a 1 followed by 0s, except the final 8-bytes are a big-endian number representing the length of the message.
4 SHA-1‘s IV is 67452301 EFCDAB89 98BADCFE 10325476 C3D2E1F0, which you’ll observe is little-endian counting from 0 to F, then back from F to 0, then alternating between counting from 3 to 0 and C to F. It’s considered good practice when developing a new cryptographic system to ensure that the hard-coded cryptographic primitives are simple, logical, independently-discoverable numbers like simple sequences and well-known mathematical constants. This helps to prove that the inventor isn’t “hiding” something in there, e.g. a mathematical weakness that depends on a specific primitive for which they alone (they hope!) have pre-calculated an exploit. If that sounds paranoid, it’s worth knowing that there’s plenty of evidence that various spy agencies have deliberately done this, at various points: consider the widespread exposure of the BULLRUN programme and its likely influence on Dual EC DRBG.
5 The padding characters I’ve used aren’t accurate, just representative. But there’s the right number of them!
6 You shouldn’t do this: you’ll cause yourself many headaches in the long run. But you could.
7 It’s also not always obvious which inputs are included in hash generation and how they’re manipulated: if you’re actually using this technique adversarially, be prepared to do a little experimentation.
8 In this example, the hash operates over a single block, but the exact same principle applies regardless of the number of blocks.
9 Imagining the implementation of a nontrivial hashing algorithm, the predictability of whose output makes its HMAC vulnerable to a length extension attack, is left as an exercise for the reader.
Kev Quirk, Colin Walker, and other cool kids I follow online made it sound fun to share your “lifestack” as we approach the end of 2023.
So here’s mine: my digital “everyday carry” list of the tools and services I routinely use:
A particular joy of the Gemini and Spartan protocols – and the Markdown-like syntax of Gemtext – is their simplicity.
Even without a browser, you can usually use everyday command-line tools that you might have installed already to access relatively human-readable content.
Here are a few different command-line options that should show you a copy of this blog post (made available via CapsulePress, of course):
Gemini communicates over a TLS-encrypted channel (like HTTPS), so we need to use a tool that speaks the language. Luckily: unless you’re on Windows you’ve probably got one installed already1.
This command takes the full gemini:// URL you’re looking for and the domain name it’s at. 1965 refers to the port number on which Gemini typically runs:
printf "gemini://danq.me/posts/gemini-without-a-browser\r\n" | \ openssl s_client -ign_eof -connect danq.me:1965
GnuTLS closes the connection when STDIN closes, so we use cat to keep it open. Note inclusion of --no-ca-verification to allow self-signed certificates (optionally add --tofu for trust-on-first-use support, per the spec).
{ printf "gemini://danq.me/posts/gemini-without-a-browser\r\n"; cat -; } | \ gnutls-cli --no-ca-verification danq.me:1965
Netcat reimplementation Ncat makes Gemini requests easy:
printf "gemini://danq.me/posts/gemini-without-a-browser\r\n" | \ ncat --ssl danq.me 1965
Spartan is a little like “Gemini without TLS“, but it sports an even-more-lightweight request format which makes it especially easy to fudge requests2.
Note the use of cat to keep the connection open long enough to get a response, as we did for Gemini over GnuTLS.
{ printf "danq.me /posts/gemini-without-a-browser 0\r\n"; cat -; } | \ telnet danq.me 300
cURL supports the telnet protocol too, which means that it can be easily coerced into talking Spartan:
printf "danq.me /posts/gemini-without-a-browser 0\r\n" | \ curl telnet://danq.me:300
Because TLS support isn’t needed, this also works perfectly well with Netcat – just substitute nc/netcat or whatever your platform calls it in place of ncat:
printf "danq.me /posts/gemini-without-a-browser 0\r\n" | \ ncat danq.me 300
I hope these examples are useful to somebody debugging their capsule, someday.
1 You can still install one on Windows, of course, it’s just less-likely that your operating system came with such a command-line tool built-in
2 Note that the domain and path are separated in a Spartan request and followed by the size of the request payload body: zero in all of my examples
I just finished reading Incredible Doom volumes 1 and 2, by Matthew Bogart and Jesse Holden, and man… that was a heartwarming and nostalgic tale!
Set in the early-to-mid-1990s world in which the BBS is still alive and kicking, and the Internet’s gaining traction but still lacks the “killer app” that will someday be the Web (which is still new and not widely-available), the story follows a handful of teenagers trying to find their place in the world. Meeting one another in the 90s explosion of cyberspace, they find online communities that provide connections that they’re unable to make out in meatspace.
It touches on experiences of 90s cyberspace that, for many of us, were very definitely real. And while my online “scene” at around the time that the story is set might have been different from that of the protagonists, there’s enough of an overlap that it felt startlingly real and believable. The online world in which I – like the characters in the story – hung out… but which occupied a strange limbo-space: both anonymous and separate from the real world but also interpersonal and authentic; a frontier in which we were still working out the rules but within which we still found common bonds and ideals.
Anyway, this is all a long-winded way of saying that Incredible Doom is a lot of fun and if it sounds like your cup of tea, you should read it.
Also: shortly after putting the second volume down, I ended up updating my Geek Code for the first time in… ooh, well over a decade. The standards have moved on a little (not entirely in a good way, I feel; also they’ve diverged somewhat), but here’s my attempt:
----- BEGIN GEEK CODE VERSION 6.0 -----
GCS^$/SS^/FS^>AT A++ B+:+:_:+:_ C-(--) D:+ CM+++ MW+++>++ ULD++ MC+ LRu+>++/js+/php+/sql+/bash/go/j/P/py-/!vb PGP++ G:Dan-Q E H+ PS++ PE++ TBG/FF+/RM+ RPG++ BK+>++ K!D/X+ R@ he/him!
----- END GEEK CODE VERSION 6.0 -----
1 I was amazed to discover that I could still remember most of my Geek Code syntax and only had to look up a few components to refresh my memory.
One of my favourite parts of my former role at the Bodleian Libraries was getting to work on exhibitions. Not just because it was varied and interesting work, but because it let me get up-close to remarkable artifacts that most people never even get the chance to see.
A personal favourite of mine are the Herculaneum Papyri. These charred scrolls were part of a private library near Pompeii that was buried by the eruption of Mount Vesuvius in 79 CE. Rediscovered from 1752 onwards, these ~1,800 scrolls were distributed to academic institutions around the world, with the majority residing in Naples’ Biblioteca Nazionale Vittorio Emanuele III.
As you might expect of ancient scrolls that got buried, baked, and then left to rot, they’re pretty fragile. That didn’t stop Victorian era researchers trying a variety of techniques to gently unroll them and read what was inside.
Like many others, what I love about the Herculaneum Papyri is the air of mystery. Each could be anything from a lost religious text to, I don’t know, somebody’s to-do list (“buy milk, arrange for annual service of chariot, don’t forget to renew volcano insurance…”).1
In recent years, we’ve tried “virtually unrolling” the scrolls using a variety of related technologies. And – slowly – we’re getting there.
So imagine my delight when this week, for the first time ever, a complete word was extracted from one of the carbonised, still-rolled-up scrolls from Herculaneum. Something that would have seemed inconceivable to the historians who first discovered and catalogued the scrolls is now possible, thanks to their careful conservation over the years along with the steady advance of technology.
1 For more-serious academic speculation about the potential value of the scrolls, Richard Carrier’s got you covered.
Foundry is a wonderful virtual tabletop tool well-suited to playing tabletop roleplaying games with your friends, no matter how far away they are. It compares very favourably to the market leader Roll20, once you get past some of the initial set-up challenges and a moderate learning curve.
You can run it on your own computer and let your friends “connect in” to it, so long as you’re able to reconfigure your router a little, but you’ll be limited by the speed of your home Internet connection and people won’t be able to drop in and e.g. tweak their character sheet except when you’ve specifically got the application running.
A generally better option is to host your Foundry server in the cloud. For most of its history, I’ve run mine on Fox, my NAS, but I’ve recently set one up on a more-conventional cloud virtual machine too. A couple of friends have asked me about how to set up their own, so here’s a quick guide:
Getting a virtual server is really easy nowadays.
You’ll need:
Choose a root password when you set up your server. If you’re a confident SSH user, add your public key so you can log in easily (and then disable password authentication entirely!).
For laziness, this guide has you run Foundry as root on your new server. Ensure you understand the implications of this.2
DNS propagation can be pretty fast, but… sometimes it isn’t. So get this step underway before you need it.
Your newly-created server will have an IP address, and you’ll be told what it is. Put that IP address into an A-record for your domain.
In my examples, my domain name is vtt.danq.me and my server is at 1.2.3.4. Yours will be different!
Connect to your new server using SSH. Your host might even provide a web interface if you don’t have an SSH client installed: e.g. Linode’s “Launch LISH Console” button will do pretty-much exactly that for you. Log in as root using the password you chose when you set up the server (or your SSH private key, if that’s your preference). Then, run each of the commands below in order (the full script is available as a single file if you prefer).
You’ll need unzip (to decompress Foundry), nodejs (to run Foundry), ufw (a firewall, to prevent unexpected surprises), nginx (a webserver, to act as a reverse proxy to Foundry), certbot (to provide a free SSL certificate for Nginx), nvm (to install pm2) and pm2 (to keep Foundry running in the background). You can install them all like this:
apt update
apt upgrade
apt install -y unzip nodejs ufw nginx certbot nvm
npm install -g pm2
By default, Foundry runs on port 30000. If we don’t configure it carefully, it can be accessed directly, which isn’t what we intend: we want connections to go through the webserver (over https, with http redirecting to https). So we configure our firewall to allow only these ports to be accessed. You’ll also want ssh enabled so we can remotely connect into the server, unless you’re exclusively using an emergency console like LISH for this purpose:
ufw allow ssh
ufw allow http
ufw allow https
ufw enable
Putting the domain name we’re using into a variable for the remainder of the instructions saves us from typing it out again and again. Make sure you type your domain name (that you pointed to your server in step 2), not mine (vtt.danq.me):
DOMAIN=vtt.danq.me
So long as the DNS change you made has propagated, this should Just Work. If it doesn’t, you might need to wait for a bit then try again.
certbot certonly --agree-tos --register-unsafely-without-email --rsa-key-size 4096 --webroot -w /var/www/html -d $DOMAIN
Don’t continue past this point until you’ve succeeded in getting the SSL certificate sorted.
The certificate will renew itself automatically, but you also need Nginx to restart itself whenever that happens. You can set that up like this:
printf "#!/bin/bash\nservice nginx restart\n" > /etc/letsencrypt/renewal-hooks/post/restart-nginx.sh chmod +x /etc/letsencrypt/renewal-hooks/post/restart-nginx.sh
You can, of course, manually write the Nginx configuration file: just remove the > /etc/nginx/sites-available/foundry from the end of the printf line to see the configuration it would write and then use/adapt to your satisfaction.
set +H printf "server {\n listen 80;\n listen [::]:80;\n server_name $DOMAIN;\n\n # Redirect everything except /.well-known/* (used for ACME) to HTTPS\n root /var/www/html/;\n if (\$request_uri !~ \"^/.well-known/\") {\n return 301 https://\$host\$request_uri;\n }\n}\n\nserver {\n listen 443 ssl http2;\n listen [::]:443 ssl http2;\n server_name $DOMAIN;\n\n ssl_certificate /etc/letsencrypt/live/$DOMAIN/fullchain.pem;\n ssl_certificate_key /etc/letsencrypt/live/$DOMAIN/privkey.pem;\n\n client_max_body_size 300M;\n\n location / {\n # Set proxy headers\n proxy_set_header Host \$host;\n proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;\n proxy_set_header X-Forwarded-Proto \$scheme;\n\n # These are important to support WebSockets\n proxy_set_header Upgrade \$http_upgrade;\n proxy_set_header Connection \"Upgrade\";\n\n proxy_pass http://127.0.0.1:30000/;\n }\n}\n" > /etc/nginx/sites-available/foundry ln -sf /etc/nginx/sites-available/foundry /etc/nginx/sites-enabled/foundry service nginx restart
mkdir {vtt,data}
cd vtt
Substitute in your Timed URL in place of <url from website> (keep the quotation marks – " – though!):
wget -O foundryvtt.zip "<url from website>"
unzip foundryvtt.zip
rm foundryvtt.zip
Now you’re finally ready to launch Foundry! We’ll use PM2 to get it to run automatically in the background and keep running:
pm2 start --name "Foundry" node -- resources/app/main.js --dataPath=/root/data
You can watch the logs for Foundry with PM2, too. It’s a good idea to take a quick peep at them to check it launched okay (press CTRL-C to exit):
pm2 logs 0
Now browse to your domain (e.g. https://vtt.danq.me) and you should see Foundry’s first-load page, asking for your license key.
Provide your license key to get started, and then immediately change the default password: a new instance of Foundry has a blank default password, which means that anybody on Earth can administer your server: get that changed to something secure!
Now you’re running on Foundry!
1 Which currency you pay in, and therefore how much you pay, for a Foundry license depends on where in the world you are (or, rather, where your VPN endpoint says you are). You might like to plan accordingly.
2 Running Foundry as root is dangerous, and you should consider the risks for yourself. Adding a new user is relatively simple, but for a throwaway server used for a single game session and then destroyed, I wouldn’t bother. Specifically, the risk is that a vulnerability in Foundry, if exploited, could allow an attacker to reconfigure any part of your new server, e.g. to host content of their choice or to relay spam emails. Running as a non-root user means that an attacker who finds such a vulnerability can only trash your Foundry instance.
For a while now, this site has been partially mirrored via the Gemini1 and Gopher protocols.2 Earlier this year I presented hacky versions of the tools I’d used to achieve this (and made people feel nostalgic).
Now I’ve added support for Spartan3 too and, seeing as the implementations shared functionality, I’ve combined all three – Gemini, Spartan, and Gopher – into a single package: CapsulePress.
CapsulePress is a Gemini/Spartan/Gopher to WordPress bridge. It lets you use WordPress as a CMS for any or all of those three non-Web protocols in addition to the Web.
For example, that means that this post is available on all of:
It’s also possible to write posts that selectively appear via different media: if I want to put something exclusively on my gemlog, I can, by assigning metadata that tells WordPress to suppress a post but still expose it to CapsulePress. Neat!
I’ve open-sourced the whole thing under a super-permissive license, so if you want your own WordPress blog to “feed” your Gemlog… now you can. With a few caveats:
Whether or not your WordPress blog makes the jump to Geminispace4, I hope you’ll come take a look at mine at one of the URLs linked above, and then continue to explore.
If you’re nostalgic for the interpersonal Internet – or just the idea of it, if you’re too young to remember it… you’ll find it there. (That Internet never actually went away, but it’s harder to find on today’s big Web than it is on lighter protocols.)
2 Also via Finger (but that’s another story).
I’ve resisted writing about the current trends in AI because, well, others are already doing it better.1 But I was inspired by Garrett‘s observation that – according to the Washington Post – the C4 dataset has tokenised his personal website.
Much has been said about how ChatGPT and her friends will hallucinate and mislead. Let’s take an example.
Remember that ChatGPT has almost-certainly read basically everything I’ve ever written online – it might well be better-informed about me than you are – as you read this:
When I asked ChatGPT about me, it came up with a mixture of truths and believable lies2, along with a smattering of complete bollocks.
In another example, ChatGPT hallucinates this extra detail specifically because the conversation was foreshadowed by its previous mistake. At this point, it digs its heels in and commits to its claim, like the stubborn guy in the corner of the pub who doubles-down on his bullshit.
If you were to ask at the outset who wrote Notpron, ChatGPT would have gotten it right, but because it already mis-spoke, it’s now trapped itself in a lie, incapable of reconsidering what it said previously as having been anything but the truth:
Simon Willison says that we should call this behaviour “lying”. In response to this, several people told him that the “lying” excessively anthropomorphises these chatbots, implying that they’re deliberately attempting to mislead their users. Simon retorts:
I completely agree that anthropomorphism is bad: these models are fancy matrix arithmetic, not entities with intent and opinions.
But in this case, I think the visceral clarity of being able to say “ChatGPT will lie to you” is a worthwhile trade.
I agree with Simon. ChatGPT and systems like it are putting accessible AI into the hands of the masses, and that means that the people who are using it don’t necessarily understand – nor desire to learn – the statistical mechanisms that actually underpin the AI‘s “decisions” about how to respond.
Trying to explain how and why their new toy will get things horribly wrong is hard, and it takes a critical eye, time, and practice to begin to discover how to use these tools effectively and safely.3 It’s simpler just to say “Here’s a tool; by the way, it’s a really convincing liar and you can’t trust it even a little.”
Giving people tools that will lie to them. What an interesting time to be alive!
1 I’m tempted to blog about my experience of using Stable Diffusion and GPT-3 as assistants while DMing my regular Dungeons & Dragons game, but haven’t worked out exactly what I’m saying yet.
2 That ChatGPT lies won’t be a surprise to anybody who’s used the system nor anybody who understands the fundamentals of how it works, but as AIs get integrated into more and more things, we’re going to need to teach a level of technical literacy about what that means, just like we do about, say, Wikipedia.
3 For many of the tasks people talk about outsourcing to LLMs, it’s the case that it would take less effort for a human to learn how to do the task than it would for them to learn how to supervise an AI performing the task! That’s not to say they’re useless: just that (for now at least) you should only trust them to do something that you could do yourself and you’re therefore able to critically assess how well the machine did it.
As I’ve mentioned before, I’m a fan of Tailsteak‘s Forward comic. I’m not a fan of the author’s weird aversion to RSS, so I hacked a way around it first using an exploit in webcomic reader app Comic Chameleon (accidentally getting access to comics weeks in advance of their publication as a side-effect) and later by using my own tool RSSey.
But now I’m able to use my favourite feed reader FreshRSS to scrape websites directly – like I’ve done for The Far Side – I should switch to using this approach to subscribe to Forward, too:
Here’s the settings I came up with:

- Feed URL: http://forwardcomic.com/list.php
- Type of feed source: HTML + XPath (Web scraping)
- XPath for finding news items: //a[starts-with(@href,'archive.php')]
- Item title: .
- Item link (URL): ./@href
- Item date: ./following-sibling::text()[1]
- Date/time format: - Y.m.d

The archive page isn’t the tidiest HTML: a stack of <a>s separated by <br>s rather than a <ul> and <li>s, for example, leaves something to be desired (and makes it harder to scrape, too!).
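If you want to sanity-check XPath expressions like these before trusting them to a feed reader, here’s a quick standalone test of my own (the snippet of HTML is made up to mimic the structure described above, and it needs the lxml package):

# pip install lxml
from lxml import html

sample = """
<body>
  <a href="archive.php?num=1001">Title One</a> - 2023.01.02<br>
  <a href="archive.php?num=1002">Title Two</a> - 2023.01.04<br>
</body>
"""

doc = html.fromstring(sample)
for item in doc.xpath("//a[starts-with(@href,'archive.php')]"):
    title = item.xpath(".")[0].text_content()
    link = item.xpath("./@href")[0]
    date = item.xpath("./following-sibling::text()[1]")[0].strip()
    print(title, link, date)   # e.g. "Title One archive.php?num=1001 - 2023.01.02"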
I continue to love this “killer feature” of FreshRSS, but I’m beginning to see how it could go further – I wish I had the free time to contribute to its development!
I’d love to see a mechanism for exporting/importing feed configurations like this so that I could share them more-easily, for example. I’d also be delighted if I could expand on my XPath rules to load pages referenced by the results and get data from them, too, e.g. so I could use an image found by XPath on the “item link” page as the thumbnail image! These are things RSSey could do for me, but FreshRSS can’t… yet!
I must be the last person on Earth to have heard about radio.garden (thanks Pepsilora!), a website that uses a “globe” interface to let you tune in to radio stations around the globe. But I’d only used it for a couple of minutes before I discovered that there are region restrictions in place. Here in the UK, and perhaps elsewhere, you can’t listen to stations in other countries without using a VPN or similar tool… which might introduce a different region’s restrictions!
So I threw together a quick workaround:
For those looking to get into userscripting, here’s a quick tutorial on what I did to develop this bypass.
First, I played around with radio.garden for a bit to get a feel for what it was doing. I guessed that it must be tuning into a streaming URL when you select a radio station, so I opened by browser’s debugger on the Network tab and looked at what happened when I clicked on a “working” radio station, and how that differed when I clicked on a “blocked” one:
When connecting to a station, a request is made for some JSON that contains station metadata. Then, for a working station, a request is made for an address like /api/ara/content/listen/[ID]/channel.mp3. For a blocked station, this request isn’t made.
I figured that the first thing I’d try would be to get the [ID] of a station that I’m not permitted to listen to and manually try the URL to see if it was actually blocked, or merely not-being-loaded. Looking at a working station, I first found the ID in the JSON response and I was about to extract it when I noticed that it also appeared in the request for the JSON: that’s pretty convenient!
My hypothesis was that the “blocking” is entirely implemented in the front-end: that the JavaScript code that makes the pretty bits work is looking at the “country” data that’s returned and using that to decide whether or not to load the audio stream. That provides many different ways to bypass it, from manipulating the JavaScript to remove that functionality, to altering the JSON response so that every station appears to be in the user’s country, to writing some extra code that intercepts the request for the metadata and injects an extra audio player that doesn’t comply with the regional restrictions.
But first I needed to be sure that there wasn’t some actual e.g. IP-based blocking on the streams. To do this, first I took the /api/ara/content/listen/[ID]/channel.mp3 address of a known-working station and opened it in VLC using Media > Open Network Stream…. That worked. Then I did the same thing again, but substituted the [ID] part of the address with the ID of a “blocked” station. VLC happily started spouting French to me: the bypass would, in theory, work!
Next, I needed to get that to work from within the site itself. It’s implemented in React, which is a pig to inject code into because it uses horrible identifiers for DOM elements. But of course I knew that there’d be this tell-tale fetch request for the station metadata that I could tap into, so I used this technique to override the native fetch method and replace it with my own “wrapper” that logged the stream address for any radio station I clicked on. I tested the addresses this produced using my browser.
window.fetch = new Proxy(window.fetch, {
  apply: (target, that, args) => {
    const tmp = target.apply(that, args);
    tmp.then(res => {
      const matches = res.url.match(/\/api\/ara\/content\/channel\/(.*)/);
      if (matches) {
        const stationId = matches[1];
        console.log(`http://radio.garden/api/ara/content/listen/${stationId}/channel.mp3`);
      }
    });
    return tmp;
  },
});
That all worked nicely, so all I needed to do now was to use those addresses rather than simply logging them. Rather than get into the weeds reverse-engineering the built-in player, I simply injected a new <audio> element after it and pointed it at the correct address, and applied a couple of CSS tweaks to make it fit in nicely.
The only problem was that on UK-based radio stations I’d now hear a slight echo, because the original player was still working. I could’ve come up with an elegant solution to this, I’m sure, but I went for a quick-and-dirty hack: I used res.json() to obtain the body of the metadata response… which meant that the actual code that requested it would no longer be able to get it (you can only decode the body of a fetch response once!). radio.garden’s own player treats this as an error and doesn’t play that radio station, but my new <audio> element still plays it perfectly well.
It’s not pretty, but it’s functional. You can read the finished source code on Github. I don’t anticipate that I’ll be maintaining this script so if it stops working you’ll have to fix it yourself, and I have no intention of “finishing” it by making it nicer or prettier. I just wanted to share in case you can learn anything from my approach.
This is a repost promoting content originally published elsewhere. See more things Dan's reposted.
This is an IBM tape library robot. It’s designed to fetch, load, unload, and return tape media cartridges to the correct bay in large enterprise environments.
One fateful ‘workend’, I made one serve drinks.
It went back into prod on the Monday…
…
In a story reminiscent of those anecdotes about early computer science students competing to “race” hard drives across the lab by writing programs that moved the heads in a way that vibrated/walked the devices, @SecurityWriter shares a wonderful story about repurposing a backup tape management robot to act as a server (pun intended) of drinks.
The week before last I had the opportunity to deliver a “flash talk” of up to 4 minutes duration at a work meetup in Vienna, Austria. I opted to present a summary of what I’ve learned while adding support for Finger and Gopher protocols to the WordPress installation that powers DanQ.me (I also hinted at the fact that I already added Gemini and Spring ’83 support, and I’m looking at other protocols). If you’d like to see how it went, you can watch my flash talk here or on YouTube.
If you love the idea of working from wherever-you-are but occasionally meeting your colleagues in person for fabulous in-person events with (now optional) flash talks like this, you might like to look at Automattic’s recruitment pages…
The presentation is a shortened, Automattic-centric version of a talk I’ll be delivering tomorrow at Oxford Geek Nights #53; so if you’d like to see it in-person and talk protocols with me over a beer, you should come along! There’ll probably be blog posts to follow with a more-detailed look at the how-and-why of using WordPress as a CMS not only for the Web but for a variety of zany, clever, retro, and retro-inspired protocols down the line, so perhaps consider the video above a “teaser”, I guess?
It all started when I saw no-ht.ml, Terence Eden‘s hilarious response to Salma Alam-Naylor‘s excellent HTML is all you need to make a website. The latter is an argument not only against the silly amount of JavaScript with which websites routinely burden their users, but even against depending on CSS. As a fan of CSS Naked Day and a firm believer in using JS only for progressive enhancement, I’m obviously in favour.
Terence’s site works by delivering a document with a claimed MIME type of text/html, but which contains only the (invalid) “HTML” code <!doctype UNICODE><meta charset="UTF-8"><plaintext> (to work around browsers’ wish to treat the page as HTML). This is followed by a block of UTF-8 plain text making use of spacing and emoji to illustrate and decorate the content. It’s frankly very silly, and I love it.1
I think it’s possible to go one step further, though, and create a web page with no code whatsoever. That is, one that you can read as if it were a regular web page, but where using View Source or e.g. downloading the page with curl will show you… nothing.
I present: The Page With No Code! (It’ll probably only work if you’re using Firefox, for reasons that will become apparent later.)
Once you’ve had a look for yourself and had a chance to form an opinion, here’s an explanation of the black magic that makes this atrocity possible:
The page is served with a Content-Type: text/html header but a completely empty body. Your browser interprets a completely-blank page as faulty and corrects it to a functionally-blank minimal HTML page: <html><head></head><body></body></html>.

The <body> and <html> elements can be styled with CSS; this includes the ability to add content ::before and ::after each element. If only we could load a stylesheet then content injection is possible.

We use a Link: HTTP header to deliver a CSS payload (this, unfortunately, only works in Firefox). To further obfuscate what’s happening and remove the need for a round-trip, this is encoded as a data: URI.
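Here’s a tiny sketch of the trick (my own, not the actual implementation behind the page): a Python web server that returns a zero-byte text/html body plus a Link: header pointing at a data: URI stylesheet, which injects the visible content via ::before. As noted above, only Firefox honours stylesheets delivered this way.

from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import quote

CSS = 'html::before{content:"Hello from a page with no code!"}'

class NoCodeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Link", f"<data:text/css,{quote(CSS)}>; rel=stylesheet")
        self.send_header("Content-Length", "0")
        self.end_headers()  # zero-byte body: the page itself contains no code at all

HTTPServer(("localhost", 8000), NoCodeHandler).serve_forever()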
This is one of the most disgusting things I’ve ever coded, and that’s saying a lot. I’m so proud of myself. You can view the code I used to generate this awful thing on Github.
My server-side implementation of this broke in 2023 after I upgraded Nginx; my new version doesn’t support the super-long Link: header needed to make this hack work, so I’ve updated the page to use the Link: to reference the CSS file rather than embed it via a data URI. It’s not as cool, but it at least means you can still see the page. Thanks to Thomas Bradshaw for pointing out the problem.
1 My first reaction was “why not just deliver something with Content-Type: text/plain; charset=utf-8 and dispense with the invalid code?”, but perhaps that’s just me overthinking the non-existent problem.
You don’t really see it any more, but: if you downloaded some media player software a couple of decades ago, it’d probably appear in a weird-shaped window, and I’ve never understood why.
Mostly, these designs are… pretty ugly. And for what? It’s also worth noting that this kind of design can be found in all kinds of applications, but it was in media players that it was almost ubiquitous.
You might think that they’re an overenthusiastic kind of skeuomorphic design: people trying to make these players look like their physical analogues. But hardware players were still pretty boxy-looking at this point, not least because of the limitations of their data storage1. By the time flash memory-based portable MP3 players became commonplace their design was copying software players, not the other way around.
So my best guess is that these players were trying to stand out as highly-visible. Like: they were things you’d want to occupy a disproportionate amount of desktop space. Maybe other people were listening to music differently than me… but for me, back when screen real estate was at such a premium2, a music player’s job was to be small, unintrusive, and out-of-the-way.
It’s a mystery to me why anybody would (or still does) make media player software or skins for them that eat so much screen space, frequently looking ugly while they do so, only to look like a hypothetical hardware device that wouldn’t actually become commonplace until years after this kind of player design premiered!
Maybe other people listened to music on their computer differently from me: putting it front and centre, not using their computer for other tasks at the same time. And maybe for these people the choice of player and skin was an important personalisation feature; a fashion statement or a way to show off their personal identity. But me? I didn’t get it then, and I don’t get it now. I’m glad that this particular trend seems to have died and windows are, for the most part, rounded rectangles once more… even for music player software!
1 A walkman, minidisc player, or hard drive-based digital music device is always going to look somewhat square because of what’s inside.
2 I “only” had 1600 × 1200 (UXGA) pixels on the very biggest monitor I owned before I went widescreen, and I spent a lot of time on monitors at lower resolutions e.g. 1024 × 768 (XGA); on such screens, wasting space on a music player when you’re mostly going to be listening “in the background” while you do something else seemed frivolous.