BBC News… without the crap

Did I mention recently that I love RSS? That it brings me great joy? That I start and finish almost every day in my feed reader? Probably.

I used to have a single minor niggle with the BBC News RSS feed: that it included sports news, which I didn’t care about. So I wrote a script that downloaded it, stripped sports news, and re-exported the feed for me to subscribe to. Magic.

RSS reader showing duplicate copies of the news story "Barbie 2? 'We'd love to,' says Warner Bros boss", and an entry from BBC Sounds.
Lately my BBC News feed has caused me some annoyance and frustration.

But lately – presumably as a result of technical changes at the Beeb’s side – this feed has found two fresh ways to annoy me:

  1. The feed now re-publishes a story if it gets re-promoted to the front pagebut with a different <guid> (it appears to get a #0 after it when first published, a #1 the second time, and so on). In a typical day the feed reader might scoop up new stories about once an hour, any by the time I get to reading them the same exact story might appear in my reader multiple times. Ugh.
  2. They’ve started adding iPlayer and BBC Sounds content to the BBC News feed. I don’t follow BBC News in my feed reader because I want to watch or listen to things. If you do, that’s fine, but I don’t, and I’d rather filter this content out.

Luckily, I already have a recipe for improving this feed, thanks to my prior work. Let’s look at my newly-revised script (also available on GitHub):

#!/usr/bin/env ruby
require 'bundler/inline'

# # Sample crontab:
# # At 41 minutes past each hour, run the script and log the results
# */20 * * * * ~/bbc-news-rss-filter-sport-out.rb > ~/bbc-news-rss-filter-sport-out.log 2>>&1

# Dependencies:
# * open-uri - load remote URL content easily
# * nokogiri - parse/filter XML
gemfile do
  source 'https://rubygems.org'
  gem 'nokogiri'
end
require 'open-uri'

# Regular expression describing the GUIDs to reject from the resulting RSS feed
# We want to drop everything from the "sport" section of the website, also any iPlayer/Sounds links
REJECT_GUIDS_MATCHING = /^https:\/\/www\.bbc\.co\.uk\/(sport|iplayer|sounds)\//

# Load and filter the original RSS
rss = Nokogiri::XML(open('https://feeds.bbci.co.uk/news/rss.xml?edition=uk'))
rss.css('item').select{|item| item.css('guid').text =~ REJECT_GUIDS_MATCHING }.each(&:unlink)

# Strip the anchors off the <guid>s: BBC News "republishes" stories by using guids with #0, #1, #2 etc, which results in duplicates in feed readers
rss.css('guid').each{|g|g.content=g.content.gsub(/#.*$/,'')}

File.open( '/www/bbc-news-no-sport.xml', 'w' ){ |f| f.puts(rss.to_s) }
It’s amazing what you can do with Nokogiri and a half dozen lines of Ruby.

That revised script removes from the feed anything whose <guid> suggests it’s sports news or from BBC Sounds or iPlayer, and also strips any “anchor” part of the <guid> before re-exporting the feed. Much better. (Strictly speaking, this can result in a technically-invalid feed by introducing duplicates, but your feed reader oughta be smart enough to compensate for and ignore that: mine certainly is!)

You’re free to take and adapt the script to your own needs, or – if you don’t mind being tied to my opinions about what should be in BBC News’ RSS feed – just subscribe to my copy at: https://fox.q-t-a.uk/bbc-news-no-sport.xml

RSS reader showing duplicate copies of the news story "Barbie 2? 'We'd love to,' says Warner Bros boss", and an entry from BBC Sounds.×

RSS > ActivityPub

RSS is better than ActivityPub1.

Photograph of a boxing match, but with the heads of the competitors replaced with the ActivityPub and RSS logos (and "AP" or "RSS" written on their clothes, respectively). RSS is delivering a powerful uppercut to ActivityPub.
A devastating blow by RSS against a competitor 19 years his junior! For updates on this bout as it develops, don’t forget to subscribe… using either protocol.

When I subscribe to content, I want:

  • Resilient failsafes. ActivityPub has many points-of-failure. A notification might fail to complete transmission as a result of downtime, faults, or network conditions, and the receiving server might never know. A feed reader, conversely, can tell you that an address 404’d or the server was down.
  • Retroactive access. Once you fix the problem above… you still don’t get the message you missed: it’s probably gone forever – there’s no retroactive access. The same is true when your ActivityPub server connects with a peer for the first time: you only ever get new content after that point. RSS, on the other hand, provides some number of “recent” items the moment you first subscribe.
  • Simple subscriptions. RSS can be served from a statically-hosted single file, which makes it suitable to deploy anywhere as well as consume using anything. It can be read, after a fashion, in anything from Lynx upwards.

RSS ticks all these boxes. If I can choose between RSS and ActivityPub to subscribe to your content, and I don’t need a real-time update, I’m probably going to choose RSS.

Footnotes

1 I feel like this statement needs a few clarifications and caveats, but my hot take looks spicier if I bury them in a footnote!

  • By RSS, I mean whichever pull-based basic HTTP you like, be that Atom, JSON Feed, h-entry, or even just properly-marked-up HTML5: did you know that the <article> element is intended to be suitable for syndication use?
  • Obviously I appreciate that RSS and ActivityPub are different tools for different jobs, and there are doubtless use-cases for which ActivityPub is clearly the superior solution.
  • I certainly don’t object to services providing both RSS and ActivityPub as syndication options, like Mastodon does, where both might be good choices.
Photograph of a boxing match, but with the heads of the competitors replaced with the ActivityPub and RSS logos (and "AP" or "RSS" written on their clothes, respectively). RSS is delivering a powerful uppercut to ActivityPub.×

Netscape’s Untold Webstories

I mentioned yesterday that during Bloganuary I’d put non-Bloganuary-prompt post ideas onto the backburner, and considered extending my daily streak by posting them in February. Here’s part of my attempt to do that:

Let’s take a trip into the Web of yesteryear, with thanks to our friends at the Internet Archive’s WayBack Machine.

The page we’re interested in used to live at http://www.netscape.com/comprod/columns/webstories/index.html, and promised to be a showcase for best practice in Web development. Back in October 1996, it looked like this:

Screenshot from Netscape Columns: Web Site Stories: a Coming Soon page which says "The series is scheduled to debut in November."

The page is a placeholder for Netscape Webstories (or Web Site Stories, in some places). It’s part of a digital magazine called Netscape Columns which published pieces written by Marc Andreeson, Jim Barksdale, and other bigwigs in the hugely-influential pre-AOL-acquisition Netscape Communications.

This new series would showcase best practice in designing and building Web sites1, giving a voice to the technical folks best-placed to speak on that topic. That sounds cool!

Those white boxes above and below the paragraph of text aren’t missing images, by the way: they’re horizontal rules, using the little-known size attribute to specify a thickness of <hr size=4>!2

Certainly you’re excited by this new column and you’ll come back in November 1996, right?

Screenshot from Netscape Columns: Web Site Stories: a Coming Soon page which says "The series is scheduled to begin in January."

Oh. The launch has been delayed, I guess. Now it’s coming in January.

The <hr>s look better now their size has been reduced, though, so clearly somebody’s paying attention to the page. But let’s take a moment and look at that page title. If you grew up writing web pages in the modern web, you might anticipate that it’s coded something like this:

<h2 style="font-variant: small-caps; text-align: center;">Coming Soon</h2>

There’s plenty of other ways to get that same effect. Perhaps you prefer font-feature-settings: 'smcp' in your chosen font; that’s perfectly valid. Maybe you’d use margin: 0 auto or something to centre it: I won’t judge.

But no, that’s not how this works. The actual code for that page title is:

<center>
  <h2>
    <font size="+3">C</font>OMING
    <font size="+3">S</font>OON
  </h2>
</center>

Back when this page was authored, we didn’t have CSS3. The only styling elements were woven right in amongst the semantic elements of a page4. It was simple to understand and easy to learn… but it was a total mess5.

Anyway, let’s come back in January 1997 and see what this feature looks like when it’s up-and-running.

Screenshot from Netscape Columns: Web Site Stories: a Coming Soon page which says "The series is scheduled to begin in the spring."

Nope, now it’s pushed back to “the spring”.

Under Construction pages were all the rage back in the nineties. Everybody had one (or several), usually adorned with one or more of about a thousand different animated GIFs for that purpose.6

Rotating animated "under construction" banner.

Building “in public” was an act of commitment, a statement of intent, and an act of acceptance of the incompleteness of a digital garden. They’re sort-of coming back into fashion in the interpersonal Web, with the “garden and stream” metaphor7 taking root. This isn’t anything new, of course – Mark Bernstein touched on the concepts in 1998 – but it’s not something that I can ever see returning to the “serious” modern corporate Web: but if you’ve seen a genuine, non-ironic “under construction” page published to a non-root page of a company’s website within the last decade, please let me know!

Under construction banner with an animated yellow-and-black tape banner between two "men at work" signs.

RSS doesn’t exist yet (although here’s a fun fact: the very first version of RSS came out of Netscape!). We’re just going to have to bookmark the page and check back later in the year, I guess…

Screenshot from Netscape Columns: Web Site Stories: a Coming Soon page identical to the previous version but with a search box ("To search the Netscape Columns, type a word or phrase here:") beneath.

Okay, so February clearly isn’t Spring, but they’ve updated the page… to add a search form.

It’s a genuine <form> tag, too, not one of those old-fashioned <isindex> tags you’d still sometimes find even as late as 1997. Interestingly, it specifies enctype="application/x-www-form-urlencoded". Today’s developers probably don’t think about the enctype attribute except when they’re doing a form that handles file uploads and they know they need to switch it to enctype="multipart/form-data", (or their framework does this automatically for them!).

But these aren’t the only options, and some older browsers at this time still defaulted to enctype="text/plain".  So long as you’re using a POST and not GET method, the distinction is mostly academic, but if your backend CGI program anticipates that special characters will come-in encoded, back then you’d be wise to specify that you wanted URL-encoding or you might get a nasty surprise when somebody turns up using LMB or something equally-exotic.

Anyway, let’s come back in June. The content must surely be up by now:

Screenshot from Netscape Columns: Web Site Stories: a Coming Soon page which says "The series is scheduled to begin in August."

Oh come on! Now we’re waiting until August?

At least the page isn’t abandoned. Somebody’s coming back and editing it from time to time to let us know about the still-ongoing series of delays. And that’s not a trivial task: this isn’t a CMS. They’re probably editing the .html file itself in their favourite text editor, then putting the appropriate file:// address into their copy of Netscape Navigator (and maybe other browsers) to test it, then uploading the file – probably using FTP – to the webserver… all the while thanking their lucky stars that they’ve only got the one page they need to change.

We didn’t have scripting languages like PHP yet, you see8. We didn’t really have static site generators. Most servers didn’t implement server-side includes. So if you had to make a change to every page on a site, for example editing the main navigation menu, you’d probably have to open and edit dozens or even hundreds of pages. Little wonder that framesets caught on, despite their (many) faults, with their ability to render your navigation separately from your page content.

Okay, let’s come back in August I guess:

Screenshot from Netscape Columns: Web Site Stories: a Coming Soon page which says "The series is scheduled to begin in the spring." Again.

Now we’re told that we’re to come back… in the Spring again? That could mean Spring 1998, I suppose… or it could just be that somebody accidentally re-uploaded an old copy of the page.

Hey: the footer’s gone too? This is clearly a partial re-upload: somebody realised they were accidentally overwriting the page with the previous-but-one version, hit “cancel” in their FTP client (or yanked the cable out of the wall), and assumed that they’d successfully stopped the upload before any damage was done.

They had not.

Screenshot of a Windows 95 dialog box, asking "Are you sure you want to delete index.html?" The cursor hovers over the "Yes" button.

I didn’t mention that top menu, did I? It looks like it’s a series of links, styled to look like flat buttons, right? But you know that’s not possible because you can’t rely on having the right fonts available: plus you’d have to do some <table> trickery to lay it out, at which point you’d struggle to ensure that the menu was the same width as the banner above it. So how did they do it?

The menu is what’s known as a client-side imagemap. Here’s what the code looks like:

<a href="/comprod/columns/images/nav.map">
  <img src="/comprod/columns/images/websitestories_ban.gif" width=468 height=32 border=0 usemap="#maintopmap" ismap>
</a><map name="mainmap">
  <area coords="0,1,92,24" href="/comprod/columns/mainthing/index.html">
  <area coords="94,1,187,24" href="/comprod/columns/techvision/index.html">
  <area coords="189,1,278,24" href="/comprod/columns/webstories/index.html">
  <area coords="280,1,373,24" href="/comprod/columns/intranet/index.html">
  <area coords="375,1,467,24" href="/comprod/columns/newsgroup/index.html">
</map>

The image (which specifies border=0 because back then the default behaviour for graphical browser was to put a thick border around images within hyperlinks) says usemap="#maintopmap" to cross-reference the <map> below it, which defines rectangular areas on the image and where they link to, if you click them! This ingenious and popular approach meant that you could transmit a single image – saving on HTTP round-trips, which were relatively time-consuming before widespread adoption of HTTP/1.1‘s persistent connections – along with a little metadata to indicate which pixels linked to which pages.

The ismap attribute is provided as a fallback for browsers that didn’t yet support client-side image maps but did support server-side image maps: there were a few! When you put ismap on an image within a hyperlink, then when the image is clicked on the href has appended to it a query parameter of the form ?123,456, where those digits refer to the horizontal and vertical coordinates, from the top-left, of the pixel that was clicked on! These could then be decoded by the webserver via a .map file or handled by a CGI program. Server-side image maps were sometimes used where client-side maps were undesirable, e.g. when you want to record the actual coordinates picked in a spot-the-ball competition or where you don’t want to reveal in advance which hotspot leads to what destination, but mostly they were just used as a fallback.9

Both client-side and server-side image maps still function in every modern web browser, but I’ve not seen them used in the wild for a long time, not least because they’re hard (maybe impossible?) to make accessible and they can’t cope with images being resized, but also because nowadays if you really wanted to make an navigation “image” you’d probably cut it into a series of smaller images and make each its own link.

Anyway, let’s come back in October 1997 and see if they’ve fixed their now-incomplete page:

Screenshot from Netscape Columns: Web Site Stories: the Coming Soon page is now laid out in two columns, but the expected launch date has been removed.

Oh, they have! From the look of things, they’ve re-written the page from scratch, replacing the version that got scrambled by that other employee. They’ve swapped out the banner and menu for a new design, replaced the footer, and now the content’s laid out in a pair of columns.

There’s still no reliable CSS, so you’re not looking at columns: (no implementations until 2014) nor at display: flex (2010) here. What you’re looking at is… a fixed-width <table> with a single row and three columns! Yes: three – the middle column is only 10 pixels wide and provides the “gap” between the two columns of text.10

This wasn’t Netscape’s only option, though. Did you ever hear of the <multicol> tag? It was the closest thing the early Web had to a semantically-sound, progressively-enhanced multi-column layout! The author of this page could have written this:

<multicol cols=2 gutter=10 width=301>
  <p>
    Want to create the best possible web site? Join us as we explore the newest
    technologies, discover the coolest tricks, and learn the best secrets for
    designing, building, and maintaining successful web sites.
  </p>
  <p>
    Members of the Netscape web site team, recognized designers, and technical
    experts will share their insights and experiences in Web Site Stories. 
  </p>
</multicol>

That would have given them the exact same effect, but with less code and it would have degraded gracefully. Browsers ignore tags they don’t understand, so a browser without support for <multicol> would have simply rendered the two paragraphs one after the other. Genius!

So why didn’t they? Probably because <multicol> only ever worked in Netscape Navigator.

Introduced in 1996 for version 3.0, this feature was absolutely characteristic of the First Browser War. The two “superpowers”, Netscape and Microsoft, both engaged in unilateral changes to the HTML specification, adding new features and launching them without announcement in order to try to get the upper hand over the other. Both sides would often refuse to implement one-another’s new tags unless they were forced to by widespread adoption by page authors, instead promoting their own competing mechanisms11.

Between adding this new language feature to their browser and writing this page, Netscape’s market share had fallen from around 80% to around 55%, and most of their losses were picked up by IE. Using <multicol> would have made their page look worse in Microsoft’s hot up-and-coming browser, which wouldn’t have helped them persuade more people to download a copy of Navigator and certainly wouldn’t be a good image on a soon-to-launch (any day now!) page about best-practice on the Web! So Netscape’s authors opted for the dominant, cross-platform solution on this page12.

Anyway, let’s fast-forward a bit and see this project finally leave its “under construction” phase and launch!

Screenshot showing the homepage of Netscape Columns from 15 February 1998; the first recorded copy NOT to have a header link to the Webstories / Web Site Stories page.

Oh. It’s gone.

Sometime between October 1997 and February 1998 the long promised “Web Site Stories” section of Netscape Columns quietly disappeared from the website. Presumably, it never published a single article, instead remaining a perpetual “Coming Soon” page right up until the day it was deleted.

I’m not sure if there’s a better metaphor for Netscape’s general demeanour in 1998 – the year in which they finally ceased to be the dominant market leader in web browsers – than the quiet deletion of a page about how Netscape customers are making the best of the Web. This page might not have been important, or significant, or even completed, but its disappearance may represent Netscape’s refocus on trying to stay relevant in the face of existential threat.

Of course, Microsoft won the First Browser War. They did so by pouring a fortune’s worth of developer effort into staying technologically one-step ahead, refusing to adopt standards proposed by their rival, and their unprecedented decision to give away their browser for free13.

Footnotes

1 Yes, we used to write “Web sites” as two words. We also used to consistently capitalise the words Web and Internet. Some of us still do so.

2 In case it’s not clear, this blog post is going to be as much about little-known and archaic Web design techniques as it is about Netscape’s website.

3 This is a white lie. CSS was first proposed almost at the same time as the Web! Microsoft Internet Explorer was first to deliver a partial implementation of the initial standard, late in 1996, but Netscape dragged their heels, perhaps in part because they’d originally backed a competing standard called JavaScript Style Sheets (JSSS). JSSS had a lot going for it: if it had enjoyed widespread adoption, for example, we’d have had the equivalent of CSS variables a full twenty years earlier! In any case, back in 1996 you definitely wouldn’t want to rely on CSS support.

4 Wondering where the text and link colours come from? <body bgcolor="#ffffff" text="#000000" link="#0000ff" vlink="#ff0000" alink="#ff0000">. Yes really, that’s where we used to put our colours.

5 Personally, I really loved the aesthetic Netscape touted when using Times New Roman (or whatever serif font was available on your computer: webfonts weren’t a thing yet) with temporary tweaks to font sizes, and I copied it in some of my own sites. If you look back at my 2018 blog post celebrating two decades of blogging, where I’ve got a screenshot of my blog as it looked circa 1999, you’ll see that I used exactly this technique for the ordinal suffixes on my post dates! On the same post, you’ll see that I somewhat replicated the “feel” of it again in my 2011 design, this time using a stylesheet.

6 There’s a whole section of Cameron’s World dedicated to “under construction” banners, and that’s a beautiful thing!

7 The idea of “garden and stream” is that you publish early and often, refining as you go, in your garden, which can act as an extension of whatever notetaking system you use already, but publish mostly “finished” content to your (chronological) stream. I see an increasing number of IndieWeb bloggers going down this route, but I’m not convinced that it’s for me.

8 Another white lie. PHP was released way back in 1995 and even the very first version supported something a lot like server-side includes, using the syntax <!--include /file/name.html-->. But it was a little computationally-intensive to run willy-nilly.

9 Server-side imagemaps are enjoying a bit of a renaissance on .onion services, whose visitors often keep JavaScript disabled, to make image-based CAPTCHAs. Simply show the visitor an image and describe the bit you want them to click on, e.g. “the blue pentagon with one side missing”, then compare the coordinates of the pixel they click on to the knowledge of the right answer. Highly-inaccessible, of course, but innovative from a purely-technical perspective.

10 Nowadays, use of tables for layout – or, indeed, for anything other than tabular data – is very-much frowned upon: it’s often bad for accessibility and responsive design. But back before we had the features granted to us by the modern Web, it was literally the only way to get content to appear side-by-side on a page, and designers got incredibly creative about how they misused tables to lay out content, especially as browsers became more-sophisticated and began to support cells that spanned multiple rows or columns, tables “nested” within one another, and background images.

11 It was a horrible time to be a web developer: having to make hacky workarounds in order to make use of the latest features but still support the widest array of browsers. But I’d still take that over the horrors of rendering engine monoculture!

12 Or maybe they didn’t even think about it and just copy-pasted from somewhere else on their site. I’m speculating.

13 This turned out to be the master-stroke: not only did it partially-extricate Microsoft from their agreement with Spyglass Inc., who licensed their browser engine to Microsoft in exchange for a percentage of sales value, but once Microsoft started bundling Internet Explorer with Windows it meant that virtually every computer came with their browser factory-installed! This strategy kept Microsoft on top until Firefox and Google Chrome kicked-off the Second Browser War in the early 2010s. But that’s another story.

Screenshot from Netscape Columns: Web Site Stories: a Coming Soon page which says "The series is scheduled to debut in November."× Screenshot from Netscape Columns: Web Site Stories: a Coming Soon page which says "The series is scheduled to begin in January."× Screenshot from Netscape Columns: Web Site Stories: a Coming Soon page which says "The series is scheduled to begin in the spring."× Rotating animated "under construction" banner.× Under construction banner with an animated yellow-and-black tape banner between two "men at work" signs.× Screenshot from Netscape Columns: Web Site Stories: a Coming Soon page identical to the previous version but with a search box ("To search the Netscape Columns, type a word or phrase here:") beneath.× Screenshot from Netscape Columns: Web Site Stories: a Coming Soon page which says "The series is scheduled to begin in August."× Screenshot from Netscape Columns: Web Site Stories: a Coming Soon page which says "The series is scheduled to begin in the spring." Again.× Screenshot of a Windows 95 dialog box, asking "Are you sure you want to delete index.html?" The cursor hovers over the "Yes" button.× Screenshot from Netscape Columns: Web Site Stories: the Coming Soon page is now laid out in two columns, but the expected launch date has been removed.× Screenshot showing the homepage of Netscape Columns from 15 February 1998; the first recorded copy NOT to have a header link to the Webstories / Web Site Stories page.×

They See Me (Blog)Rolling

Tracy Durnell’s post about blogrolls really spoke to me. Like her, I used to think of a blogroll as a list of people you know personally (who happen to blog)1, but the number of bloggers among my immediate in-person circle of friends has shrunk from several dozen to just a handful, and I dropped my blogroll in around 2008.

A white man wearing a spacesuit sits on a pebble beach using a laptop.
On the Internet, a blogger is only as alone as they choose to be.

But my connection to a wider circle has grown, and like Tracy I enjoy the “hardly strangers” connection I feel with the people I follow online. She writes:

While social media emphasizes the show-off stuff — the vacation in Puerto Vallarta, the full kitchen remodel, the night out on the town — on blogs it still seems that people are sharing more than signalling. These small pleasures seem to be offered in a spirit of generosity — this is too beautiful not to share.

Although I may never interact with all the folks whose blogs I follow, reading the same blogger for a long time does build a (one-sided) connection. I may not know you, author, but I am rooting for you. It’s a different modality of relationship than we may be used to in person, but it’s real: a parasocial relationship simmering with the potential for deeper connection, but also satisfying as it exists.

My first bloggy pan pal, Colin Walker, who I started exchanging emails with earlier this month, followed-up on this with an observation that really gets to the heart of the issue (speaking as somebody who’s long said that my blog’s intended audience is, first and foremost, me):

At its core, blogging is a solitary activity with many (if not most) authors claiming that their blog is for them – myself included. Yet, the implication of audience cannot be ignored. Indeed, the more an author embeds themself in the loose community of blogs, by reading and linking to others, the more that implication becomes reality even if not actively pursued via comments or email.

To that end: I’ve started publishing my blogroll again! Follow that link and you’ll see an only-lightly-curated list of all the people (plus some non-personal blogs, vlogs, and webcomics) I follow (that have updated their feeds within the last year2). Naturally, there’s an OPML version too, and I’ve open-sourced the code I used to generate it (although I can’t imagine anybody’s situation is enough like mine for it to be useful).

The page is a little flaky and there’s things I’d like to do to improve it, but I’d rather publish a basic version now and then come back to it with my gardening gloves on another time to improve it.

Maybe my blogroll has some folks on that you might recognise? Or else: maybe you’re only a single random-click away from somebody new you never heard of before!

Footnotes

1 Possibly marked up with XFN to indicate how you’re connected to one another, but I’ve always had a soft spot for XFN.

2 I often retain subscriptions to dormant feeds and it sometimes pays-off, e.g. when I recently celebrated Octopuns’ return after a 9½-year hiatus!

A white man wearing a spacesuit sits on a pebble beach using a laptop.×

I don’t want your data

This article is a repost promoting content originally published elsewhere. See more things Dan's reposted.

The web loves data. Data about you. Data about who you are, about what you do, what you love doing, what you love eating.

I, on the other end, couldn’t care less about your data. I don’t run analytics on this website. I don’t care which articles you read, I don’t care if you read them. I don’t care about which post is the most read or the most clicked. I don’t A/B test, I don’t try to overthink my content. I just don’t care.

Manu speaks my mind. Among the many hacks I’ve made to this site, I actively try not to invade on your privacy by collecting analytics, and I try not to let others to so either!

My blog is for myself first and foremost (if you enjoy it too, that’s just a bonus). This leads to two conclusions:

  1. If I’m the primary audience, I don’t need analytics (because I know who I am), and
  2. I don’t want to be targeted by invasive analytics (and use browser extensions to block them, e.g. I by-default block all third-party scripts, delete cookies from non-allowlisted domains 15 seconds after navigating away from sites, etc.); so I’d prefer them not to be on a site for which I’m the primary audience!

I’ve gone into more detail about this on my privacy page and hinted at it on my colophon. But I don’t know if anybody ever reads either of those pages, of course!

Making WordPress Fast (The Hard Way)

This isn’t the guide for you

The Internet is full of guides on easily making your WordPress installation run fast. If you’re looking to speed up your WordPress site, you should go read those, not this.

Those guides often boil down to the same old tips:

  1. uninstall unnecessary plugins,
  2. optimise caching (both on the server and, via your headers, on clients/proxies),
  3. resize your images properly and/or ensure WordPress is doing this for you,
  4. use a CDN (and use DNS prefetch hints)1,
  5. tune your PHP installation so it’s got enough memory, keeps a process alive, etc.,
  6. ensure your server is minifying2 and compressing files, and
  7. run it on a faster server/behind a faster connection3
You’ve heard those tips before, right? Today, let’s try something different.

The hard way

This article is for people who aren’t afraid to go tinkering in their WordPress codebase to squeeze a little extra (real world!) performance.

It’s for people whose neverending quest for perfection is already well beyond the point of diminishing returns.

But mostly, it’s for people who want to gawp at me, the freak who actually did this stuff just to make his personal blog a tiny bit nippier without spending an extra penny on hosting.

You shouldn’t use Lighthouse as your only measure of your site’s performance. But it’s still reassuring when you get to see those fireworks!

Don’t start with the hard way. Exhaust all the easy solutions – or at least, make a conscious effort which easy solutions to enact or reject – first. Only if you really want to get into the weeds should you actually try doing the things I propose here. They’re not for most sites, and they’re not the for faint of heart.

Performance is a tradeoff. Every performance improvement costs you something else: time, money, DX, UX, etc. What you choose to trade for performance gains depends on your priority of constituencies, which may differ from mine.4

This is not a recipe book. This won’t tell you what code to change or what commands to run. The right answers for your content will be different than the right answers for mine. Also: you shouldn’t change what you don’t understand! But I hope these tips will help you think about what questions you need to ask to make your site blazing fast.

Okay, let’s get started…

1. Backstab the plugins you can’t live without

If there are plugins you can’t remove because you depend upon their functionality, and those plugins inject content (especially JavaScript) on the front-end… backstab them to undermine that functionality.

For example, if you want Jetpack‘s backup and downtime monitoring features, but you don’t want it injecting random <link rel='stylesheet' id='...-jetpack-css' href='...' media='all' />‘s (an extra stylesheet to download and parse) into your pages: find the add_filter hook it uses and remove_filter it in your theme5.

Screenshot of a code editor showing a typical WordPress theme's header.php, but with the wp_head() line commented out.
Alternatively, entirely remove the wp_head() and manually reimplement the functionality you actually need. Insert your own joke about “Headless WordPress” here.

Better yet, remove wp_head() from your theme entirely6. Now, instead of blocking the hooks you don’t want polluting your <head>, you’re specifically allowing only those you want. You’ll want to take care to get some semi-essential ones like <link rel="canonical" href="...">7.

Now most of your plugins are broken, but in exchange, your theme has reclaimed complete control over what gets sent to the user. You can select what content you actually want delivered, and deliver no more than that. It’s harder work for you, but your site becomes so much lighter.

Animated GIF from The Simpsons. Leonard Nimoy says "Well, my work is done here." Barney says "What do you mean? You didn't do anything?" Nimoy laughs and replies "Didn't I?" before disappearing as if transported away by a Star Trek teleporter.
Your site is faster now. It doesn’t work, but it’s quick about it!

2. Throw away 100% of your render-blocking JavaScript (and as much as you can of the rest)

The single biggest bottleneck to the user viewing a modern WordPress website is the JavaScript that needs to be downloaded, compiled, and executed before the page can be rendered. Most of that’s plugins, but even on a nearly-vanilla installation you might find a copy of jQuery (eww!) and some other files.

In step 1 you threw it all away, which is great… but I’m betting you were depending on some of that to make your site work? Let’s put it back, carefully and selectively, while minimising the impact on load time.

That means scripts should be loaded (a) low-down, and/or (b) marked defer (or, better yet, async), so they don’t block page rendering.

If you haven’t already, you might like to View Source on this page. Count my <script> tags. You’ll probably find just two of them: one external file marked async, and a second block right at the bottom.

Screenshot showing source code of <script> tags on danq.me. There's one <script async> that loads instant.js, and an inline script with three sections: one that adds Web Share API functionality, one that manages VR360 images, and one that loads a service worker.
The only third-party script routinely loaded on danq.me is Instant.Page, which specifically exists to improve perceived performance. It preloads links when you hover over or start-to-touch them.

The inline <script> in my footer.php wraps a single line of PHP: which looks a little like this: <?php echo implode("\n\n", apply_filters( 'danq_footer_js', [] ) ); ?>. For each item in an initially-empty array, it appends to the script tag. When I render anything that requires JavaScript, e.g. for 360° photography, I can just add to that (keyed, to prevent duplicates when viewing an archive page) array. Thus, the relevant script gets added exclusively to the pages where it’s needed, not to the entire site.

The only inline script added to every page loads my service worker, which itself aims to optimise caching as well as providing limited “offline” functionality.

While you’re tweaking your JavaScript anyway, you might like to check that any suitable addEventListeners are set to passive mode. Especially if you’re doing anything with touch or mousewheel events, you can often increase the perceived performance of these interactions by not letting your custom code block the default browser behaviour.

I promise you; most of your blog’s front-end JavaScript is either (a) garbage nobody wants, (b) polyfills for platforms nobody uses, or (c) huge libraries you’ve imported so you can use just one or two functions form them. Trash them.

3. Don’t use a CDN

Wait, what? That’s the opposite of what everybody else recommends. To understand why, you have to think about why people recommend a CDN in the first place. Their reasons are usually threefold:

  1. Proximity
    Claim:
    A CDN delivers content geographically-closer to the user.
    Retort:
    Often true. But in step 4 we’re going to make sure that everything critical comes within the first TCP sliding window anyway, so there’s little benefit, and there’s a cost to that extra DNS lookup and fresh handshake. Edge caching your own content may have value, but for most sites it’ll have a much smaller impact than almost everything else on this list.
  2. Precaching
    Claim:
    A CDN improves the chance resources are precached in the user’s browser.
    Retort: Possibly true, especially with fonts (although see step 6) but less than you’d think with JS libraries because there are so many different versions/hosts of each. Yours may well be the only site in the user’s circuit that uses a particular one!
  3. Power
    Claim: A CDN has more resources than you and so can better-withstand spikes of traffic.
    Retort: Maybe, but they also introduce an additional single-point-of-failure. CDNs aren’t magically immune to downtime nor content-blocking, and if you depend on one you’ve just doubled the number of potential failure points that can make your site instantly useless. Furthermore: in exchange for those resources you’re trading away your users’ privacy and security: if a CDN gets hacked, every site that uses it gets hacked too.

Consider edge-caching your own content only if you think you need it, but ditch jsDeliver, cdnjs, Google Hosted Libraries etc.

Screenshot showing a waterfall representation of downloading and rendering the danq.me homepage. DCL (Dom Content Loaded) occurs at 20.62ms.
Despite having no edge cache and being hosted in a different country to me, I can open a completely fresh browser and reach DOMContentLoaded on the my homepage in ~20ms. You should learn how to read a waterfall performance chart just so you can enjoy how “flat” mine is.

Hell: if you can, ditch all JavaScript served from third-parties and slap a Content-Security-Policy: script-src 'self' header on your domain to dramatically reduce the entire attack surface of your site!8

4. Reduce your HTML and CSS size to <12kb compressed

There’s a magic number you need to know: 12kb. Because of some complicated but fascinating maths (and depending on how your hosting is configured), it can be significantly faster to initially load a web resource of up to 12kb than it is to load one of, say, 15kb. Also, for the same reason, loading a web resource of much less than 12kb might not be significantly faster than loading one only a little less than 12kb.

Exploit this by:

  1. Making your pages as light as possible9, then
  2. Inlining as much essential content as possible (CSS, SVGs, JavaScript etc.) to bring you back up to close-to that magic number again!
$ curl --compressed -so /dev/null -w "%{size_download}\n" https://danq.me/
10416
Note that this is the compressed, over-the-wire size. Last I checked, my homepage weighed-in at about 10.4kb compressed, which includes the entirety of its HTML and CSS, most of its JS, and a couple of its SVG images.

Again, this probably flies in the face of everything you were taught about performance. I’m sure you were told that you should <link> to your stylesheets so that they can be cached across page loads. But it turns out that if you can make your HTML and CSS small enough, the opposite is true and you should inline the stylesheet again: caching styles becomes almost irrelevant if you get all the content in a single round-trip anyway!

For extra credit, consider optimising your homepage’s CSS so it’s even smaller by excluding directives that only apply to non-homepage pages, and vice-versa. Assuming you’re using a preprocessor, this shouldn’t be too hard: at simplest, you can have a homepage.css and main.css, each derived from a set of source files some of which they share (reset/normalisation, typography, colours, whatever) and the rest which is specific only to that part of the site.

Most web pages should fit entirely onto a floppy disk. This one doesn’t, mostly because of all the Simpsons clips, but most should.

Can’t manage to get your HTML and CSS down below the magic number? Then at least ensure that your HTML alone weighs in at <12kb compressed and you’ll still get some of the benefits. If you’ve got the headroom, you can selectively include a <style> block containing only the most-crucial CSS, with a particular focus on any that results in layout shifts (e.g. anything that specifies the height: of otherwise dynamically-sized block elements, or that declares an element position: absolute or position: fixed). These kinds of changes are relatively computationally-expensive because they cause content to re-flow, so provide hints as soon as possible so that the browser can accommodate for them.

5. Make the first load awesome

We don’t really talk about content being “above the fold” like we used to, because the modern Web has such a diverse array of screen sizes and resolutions that doing so doesn’t make much sense.

But if loading your full page is still going to take multiple HTTP requests (scripts, images, fonts, whatever), you should still try to deliver the maximum possible value in the first round-trip. That means:

  • Making sure all your textual content loads immediately! Unless you’re delivering a huge amount of text, there’s absolutely no excuse for lazy-loading text: it’s usually tiny, compresses well, and it’s fast to parse. It’s also the most-important content of most pages. Get it delivered to the browser so it can be rendered rightaway.
  • Lazy-loading images that are “expected” to be below the fold, using the proper HTML mechanism for this (never a JavaScript approach).
  • Reserving space for blocks by sizing images appropriately, e.g. using <img width="..." height="..." ...> or having them load as a background with background-size: cover or contain in a block sized with CSS delivered in the initial payload. This reduces layout shift, which mitigates the need for computationally-expensive content reflows.
  • If possible (see point 4), move vector images that support basic site functionality, like logos, inline. This might also apply to icons, if they’re “as important” as text content.
  • Marking everything up with standard semantic HTML. There’s a trend for component-driven design to go much too far, resulting in JavaScript components being used in place of standard elements like links, buttons, and images, resulting in highly-fragile websites: when those scripts fail (or are very slow to load), the page becomes unusable.
Screenshot of danq.me's homepage with all external resources and all CSS disabled. It's plain-text, but it's still entirely useable.
If you want to be sure you’re prioritising your content first and foremost, try disabling all CSS, JavaScript, and external resources (or just access your site in a browser that ignores those things, like Lynx), and check that it’s still usable. As a bonus, this helps you check for several accessibility issues.

6. Reduce your dependence on downloaded fonts

Fonts are lovely and can be an important part of your brand identity, but they can also add a lot of weight to your web pages.

If you’re ready and able to drop your webfonts and appreciate the beauty and flexibility of a system font stack (I get it: I’m not there quite yet!), you can at least make smarter use of your fonts:

  • Every modern browser supports WOFF2, so you can ditch those chunky old formats you’re clinging onto.
  • If you’re only using the Latin alphabet, minify your fonts further by dropping the characters you don’t need: tools like Google Webfonts Helper can help with this, as well as making it easier to selfhost fonts from the most-popular library (is a smart idea for the reasons described under point 3, above!). There are tools available to further minify fonts if e.g. you only need the capital letters for your title font or something.
      • Browsers are pretty clever and will work-around it if you make a mistake. Didn’t include an emoji or some obscure mathematical symbol, and then accidentally used them in a post? Browsers will switch to a system font that can fill in the gap, for you.
  • Make the most-liberal use of the font-display: CSS directive that you can tolerate!
    • Don’t use font-display: block, which is functionally the default in most browsers, unless you absolutely have to.
    • font-display: fallback is good if you’re too cowardly/think your font is too important for you to try font-display: optional.
    • font-display: optional is an excellent choice for body text: if the browser thinks it’s worthwhile to download the font (it might choose not to if the operating system indicates that it’s using a metered or low-bandwidth connection, for example), it’ll try to download it, but it won’t let doing so slow things down too much and it’ll fall-back to whatever backup (system) font you specify.
    • font-display: swap is also worth considering: this will render any text immediately, even if the right font hasn’t downloaded yet, with no blocking time whatsoever, and then swap it for the right font when it appears. It’s probably better for headings, because large paragraphs of text can be a little disorienting if they change font while a user is looking at them!
If writing is for nerds, then typography must be doubly-so. But you’ve read this far, so I’m confident that you qualify…

7. Cache pre-compressed static files

It’s possible that by this point you’re saying “if I had to do this much work, I might as well just use a static site generator”. Well good news: that’s what you’re about to do!

Obviously you should make sure all your regular caching improvements (appropriate HTTP headers for caching, a service worker that further improves on that logic based on your content’s update schedule, etc.) first. Again: everything in this guide presupposes that you’ve already done the things that normal people do.

By aggressively caching pre-compressed copies of all your pages, you’re effectively getting the best of both worlds: a website that, for anonymous visitors, is served directly from .html.gz files on a hard disk or even straight from RAM in memcached10, but which still maintains all the necessary server-side interactivity to allow it to be used as a conventional Web-based CMS (including accepting comments if that’s your jam).

WP Super Cache can do the heavy lifting for you for a filesystem-based solution so long as you put it into “Expert” mode and amending your webserver configuration. I’m using Nginx, so I needed a try_files directive like this:

location / {
  try_files /wp-content/cache/supercache/$http_host/$wp_super_cache_path/index-https.html $uri $uri/ /index.php?$args;
}

8. Optimise image formats

I’m sure your favourite performance testing tool has already complained at you about your failure to use the best formats possible when serving images to your users. But how can you fix it?

There are some great plugins for improving your images automatically and/or in bulk – I use EWWW Image Optimizer – but to really make the most of them you’ll want to reconfigure your webserver to detect clients that Accept: image/webp and attempt to dynamically serve them .webp variants, for example. Or if you’re ready to give up on legacy formats and replace all your .pngs with .webps, that’s probably fine too!

Image containing the text
The image you see at https://danq.me/_q23u/2023/11/dynamic.png is probably an image/webp. But if your browser doesn’t support WebP, you’ll get an image/png instead!

Assuming you’ve got curl and Imagemagick‘s identify, you can see this in action:

  • curl -s https://danq.me/_q23u/2023/11/dynamic.png -H "Accept: image/webp" | identify -
    (Will give you a WebP image)
  • curl -s https://danq.me/_q23u/2023/11/dynamic.png -H "Accept: image/png" | identify -
    (Will give you a PNG image, even though the URL is the same)

9. Simplify, simplify, simplify

The single biggest impact you can have upon the performance of your WordPress pages is to make them less complex.

I’m not necessarily saying that everybody should follow in my lead and co-publish their WordPress sites on the Gemini protocol. But you’ve got to admit: the simplicity of the Gemini protocol and the associated Gemtext format makes both lightning fast.

Screenshot showing this blog post as viewed via Gemini, in the Lagrange browser.
You don’t have to go as light as Gemtext – like this page on Gemini does – to see benefits.

Writing my templates and posts so that they’re compatible with CapsulePress helps keep my code necessarily-simple. You don’t have to do that, though, but you should be asking yourself:

  • Does my DOM need to cascade so deeply? Could I achieve the same with less?
  • Am I pre-emptively creating content, e.g. adding a hidden <dialog> directly to the markup in the anticipation that it might be triggered later using JavaScript, rather than having that JavaScript run document.createElement the element after the page becomes readable?
  • Have I created unnecessarily-long chains of CSS selectors11 when what I really want is a simple class name, or perhaps even a semantic element name?

10. Add a Service Worker

A service worker isn’t magic. In particular, it can’t help you with those new visitors hitting your site for the first time12.

A service worker lets you do smart things on behalf of the user’s network connection, so that by the time they ask for a resource, you already fetched it for them.

But a suitable service worker can do a few things that can help with performance. In particular, you might consider:

  • Precaching assets that you anticipate they’re likely to need (e.g. if you use different stylesheets for the homepage and other pages, you can preload both so no matter where a user lands they’ve already got the CSS they’ll need for the entire site).
  • Preloading popular pages like the homepage and recent articles, allowing them to load quickly.
  • Caching a fallback pages – and other resources as-they’re-accessed – to support a full experience for users even if they (or your site!) disconnect from the Internet (or even embedding “save for offline” functionality!).

Chapters 7 and 8 of Going Offline by Jeremy Keith are especially good for explaining how this can be achieved, and it’s all much easier than everything else I just described.

Anything else?

Did I miss anything? If you’ve got a tip about ramping up WordPress performance that isn’t one of the “typical seven” – probably because it’s too hard to be worthwhile for most people – I’d love to hear it!

Footnotes

1 You’ll sometimes see guides that suggest that using a CDN is to be recommended specifically because it splits your assets among multiple domains/subdomains, which mitigates browsers’ limitation on the number of files they can download simultaneously. This is terrible advice, because such limitations essentially don’t exist any more, but DNS lookups and TLS handshakes still have a bandwidth and computational cost. There are good things about CDNs, sometimes, but this has not been one of them for some time now.

2 I’m not sure why guides keep stressing the importance of minifying code, because by the time you’re compressing them too it’s almost pointless. I guess it’s helpful if your compression fails?

3 “Use a faster server” is a “just throw money/the environment at it” solution. I’d like to think we can do better.

4 For my personal blog, I choose to prioritise user experience, privacy, accessibility, resilience, and standards compliance above almost everything else.

5 If you prefer to keep your backstab code separate, you can put it in a custom plugin, but you might find that you have to name it something late in the alphabet – I’ve previously used names like zzz-danq-anti-plugin-hacks – to ensure that they load after the plugins whose functionality you intend to unhook: broadly-speaking, WordPress loads plugins in alphabetical order.

6 I’ve assumed you’re using a classic, not block, theme. If you’re using a block theme, you get a whole different set of performance challenges to think about. Don’t get me wrong: I love block themes and think they’re a great way to put more people in control of their site’s design! But if you’re at the point where you’re comfortable digging this deep into your site’s PHP code, you probably don’t need that feature anyway, right?

7 WordPress is really good at serving functionally-duplicate content, so search engines appreciate it if you declare a proper canonical URL.

8 Before you choose to block all third-party JavaScript, you might have to whitelist Google Analytics if you’re the kind of person who doesn’t mind selling their visitor data to the world’s biggest harvester of personal information in exchange for some pretty graphs. I’m not that kind of person.

9 You were looking to join me in 512kb club anyway, right?

10 I’ve experimented with mounting a ramdisk and storing the WP Super Cache directory there, but it didn’t make a huge difference, probably because my files are so small that the parse/render time on the browser side dominates the total cascade, and they’re already being served from an SSD. I imagine in my case memcached would provide similarly-small benefits.

11 I really love the power of CSS preprocessors like Sass, but they do make it deceptively easy to create many more – and longer – selectors than you intended in your final compiled stylesheet.

12 Tools like Lighthouse usually simulate first-time visitors, which can be a little unfair to sites with great performance for established visitors. But everybody is a first-time visitor at least once (and probably more times, as caches expire or are cleared), so they’re still a metric you should consider.

Screenshot of a code editor showing a typical WordPress theme's header.php, but with the wp_head() line commented out.× Animated GIF from The Simpsons. Leonard Nimoy says "Well, my work is done here." Barney says "What do you mean? You didn't do anything?" Nimoy laughs and replies "Didn't I?" before disappearing as if transported away by a Star Trek teleporter.× Screenshot showing source code of <script> tags on danq.me. There's one <script async> that loads instant.js, and an inline script with three sections: one that adds Web Share API functionality, one that manages VR360 images, and one that loads a service worker.× Screenshot showing a waterfall representation of downloading and rendering the danq.me homepage. DCL (Dom Content Loaded) occurs at 20.62ms.× Screenshot of danq.me's homepage with all external resources and all CSS disabled. It's plain-text, but it's still entirely useable.× Screenshot showing this blog post as viewed via Gemini, in the Lagrange browser.×

.well-known/feeds

<link rel="alternate" type="application/rss+xml"> is fine, but I feel like there should be a standard for a site, not a page, to share a “list of feeds associated with a site”.

Last night, I dreamed about a way to achieve that: ./well-known/feeds as an OPML document. Here’s mine, and here’s a draft spec.

Interested to hear what Dave Winer thinks…

Short-Term Blogging

There’s a perception that a blog is a long-lived, ongoing thing. That it lives with and alongside its author.1

But that doesn’t have to be true, and I think a lot of people could benefit from “short-term” blogging. Consider:

  • Photoblogging your holiday, rather than posting snaps to social media
    You gain the ability to add context, crosslinking, and have permanent addresses (rather than losing eveything to the depths of a feed). You can crosspost/syndicate to your favourite socials if that’s your poison..
Photo showing a mobile phone, held in a hand, being used to take a photograph of a rugged coastline landscape.
Photoblog your holiday and I might follow it, and I’ll do so at my convenience. Put your snaps on Facebook and I almost certainly won’t bother. Photo courtesy ArtHouse Studio.
  • Blogging your studies, rather than keeping your notes to yourself
    Writing what you learn helps you remember it; writing what you learn in a public space helps others learn too and makes it easy to search for your discoveries later.2
  • Recording your roleplaying, rather than just summarising each session to your fellow players
    My D&D group does this at levellers.blog! That site won’t continue to be updated forever – the party will someday retire or, more-likely, come to a glorious but horrific end – but it’ll always live on as a reminder of what we achieved.

One of my favourite examples of such a blog was 52 Reflect3 (now integrated into its successor The Improbable Blog). For 52 consecutive weeks my partner‘s brother Robin blogged about adventures that took him out of his home in London and it was amazing. The project’s finished, but a blog was absolutely the right medium for it because now it’s got a “forever home” on the Web (imagine if he’d posted instead to Twitter, only for that platform to turn into a flaming turd).

I don’t often shill for my employer, but I genuinely believe that the free tier on WordPress.com is an excellent way to give a forever home to your short-term blog4. Did you know that you can type new.blog (or blog.new; both work!) into your browser to start one?

What are you going to write about?

Footnotes

1 This blog is, of course, an example of a long-term blog. It’s been going in some form or another for over half my life, and I don’t see that changing. But it’s not the only kind of blog.

2 Personally, I really love the serendipity of asking a web search engine for the solution to a problem and finding a result that turns out to be something that I myself wrote, long ago!

3 My previous posts about 52 Reflect: Challenge Robin, Twatt, Brixton to Brighton by Boris Bike, Ending on a High (and associated photo/note)

4 One of my favourite features of WordPress.com is the fact that it’s built atop the world’s most-popular blogging software and you can export all your data at any time, so there’s absolutely no lock-in: if you want to migrate to a competitor or even host your own blog, it’s really easy to do so!

Photo showing a mobile phone, held in a hand, being used to take a photograph of a rugged coastline landscape.×

How a 2002 standard made 2022 bearable

This is an alternate history of the Web. The premise is true, but the story diverges from our timeline and looks at an alternative “Web that might have been”.

Prehistory

This is the story of P3P, one of the greatest Web standards whose history has been forgotten1, and how the abject failure of its first versions paved the way for its bright future decades later. But I’m getting ahead of myself…

Drafted in 2002 in the wake of growing concern about the death of privacy on the Internet, P3P 1.0 aimed to make the collection of personally-identifiable data online transparent. Hurrah, right?

Not so much. Its immediate impact was lukewarm to negative: developers couldn’t understand why their cookies were no longer being accepted by Internet Explorer 6, the first browser to implement the standard, and the whole exercise was slated as providing a false sense of security, not stopping actual bad guys, and an attempt to apply a technical solution to a political problem.2

Flowchart showing the negotiation process between a user, browser, and server as the user browses an ecommerce site. The homepage's P3P policy states that it collects IP addresses, which is compatible with the user's preferences. Later, at checkout, the P3P policy states that the user's address will be collected and shared with a courier. The collection is fine according to the user's preferences, but she's asked to be notified if it'll be shared, so the browser notifies the user. The user approves of the policy and asks that this approval is remembered for this site, and the checkout process continues.
Initially, the principle was sound. The specification was weak. The implementation was apalling. But P3P 1.1 could have worked well.

Developers are lazy3 and soon converged on the simplest possible solution: add a garbage HTTP header like P3P: CP="See our website for our privacy policy." and your cookies work just fine! Ignore the problem, ignore the proposed solution, just do what gets the project shipped.

Without any meaningful enforcement it also perfectly feasible to, y’know, just lie about how well you treat user data. Seeing the way the wind was blowing, Mozilla dropped support for P3P, and Microsoft’s support – which had always been half-baked and lacked even the most basic user-facing controls or customisation options – languished in obscurity.

For a while, it seemed like P3P was dying. Maybe, in some alternate timeline, it did die: vanishing into nothing like VRML, WAP, and XBAP.

But fortunately for us, we don’t live in that timeline.

Revival

In 2009, the European Union revisited the Privacy and Electronic Communications Directive. The initial regulations, published in 2002, required that Web users be able to opt-out of tracking cookies, but the amendment required that sites ensure that users opted-in.

As-written, this confusing new regulation posed an immediate problem: if a user clicked the button to say “no, I don’t want cookies”, and you didn’t want to ask for their consent again on every page load… you had to give them a cookie (or use some other technique legally-indistinguishable from cookies). Now you’re stuck in an endless cookie-circle.4

This, and other factors of informed consent, quickly introduced a new pattern among those websites that were fastest to react to the legislative change:

Screenshot from how-i-experience-web-today.com showing an article mostly-covered by a cookie privacy statement and configuration options, utilising dark patterns to try to discourage users from opting-out of cookies.
The cookie consent banner, with all its confusing language and dark patterns, looked like it was going to become the new normal for web users in the early 2010s. But thankfully, our saviour had been waiting in the wings all along.

Web users rebelled. These ugly overlays felt like a regresssion to a time when popup ads and splash pages were commonplace. “If only,” people cried out, “There were a better way to do this!”

It was Professor Lorie Cranor, one of the original authors of the underloved P3P specification and a respected champion of usable privacy and security, whose rallying cry gave us hope. Her CNET article, “Why the EU Cookie Directive is a solved problem”5, inspired a new generation of development on what would become known as P3P 2.0.

While maintaining backwards compatibility, this new standard:

  • deprecated those horrible XML documents in favour of HTTP headers and <link> tags alone,
  • removing support for Set-Cookie2: headers, which nobody used anyway, and
  • added features by which the provenance and purpose of cookies could be stated in a way that dramatically simplified adoption in browsers

Internet Explorer at this point was still used by a majority of Web users. It still supported the older version of the standard, and – as perhaps the greatest gift that the much-maligned browser ever gave us – provided a reference implementation as well as a stepping-stone to wider adoption.

Opera, then Firefox, then “new kid” Chrome each adopted P3P 2.0; Microsoft finally got on board with IE 8 SP 1. Now the latest versions of all the mainstream browsers had a solid implementation6 well before the European data protection regulators began fining companies that misused tracking cookies.

Fabricated screenshot from Microsoft Edge, browsing 3r.org.uk: a "privacy" icon in the address bar has been clicked, and the resulting menu says: About 3r.org.uk. Connection is secure (with link for more info). Privacy and Cookies (with link for more info). Cookies (3 cookies in use) - Strictly necessary (2 in use), dropdown menu set to "Default (accept, delete later)"; Optional (1 in use), dropdown menu set to "Accept for this site". Checkbox for "Treat third-party cookies differently?", unchecked. Privacy (link to full policy): Legitimate interest - this site collects username, IP address, technical logs...; Consenmt - this site collects email address, phone number... Button to manage content. Button to "Exercise data rights".
Nowadays, we’ve pretty-well standardised on the address bar being the place where all cookie and privacy information and settings are stored. Can you imagine if things had gone any other way?

But where the story of P3P‘s successes shine brightest came in 2016, with the passing of the GDPR. The W3C realised that P3P could simplify both the expression and understanding of privacy policies for users, and formed a group to work on version 2.1. And that’s the version you use today.

When you launch a new service, you probably use one of the many free wizard-driven tools to express your privacy policy and the bases for your data processing, and it spits out a template privacy policy. You need the human-readable version, of course, since the 2020 German court ruling that you cannot rely on a machine-readable privacy policy alone, but the real gem is the P3P: 2.1 header version.

Assuming you don’t have any unusual quirks in your data processing (ask your lawyer!), you can just paste the relevant code into your server configuration and you’re good to go. Site users get a warning if their personal data preferences conflict with your data policies, and can choose how to act: not using your service, choosing which of your features to opt-in or out- of, or – hopefully! – granting an exception to your site (possibly with caveats, such as sandboxing your cookies or clearing them immediately after closing the browser tab).

Sure, what we’ve got isn’t perfect. Sometimes companies outright lie about their use of information or use illicit methods to track user behaviour. There’ll always be bad guys out there. That’s what laws are there to deal with.

But what we’ve got today is so seamless, it’s hard to imagine a world in which we somehow all… collectively decided that the correct solution to the privacy problem might have been to throw endless popovers into users’ faces, bury consent-based choices under dark patterns, and make humans do the work that should from the outset have been done by machines. What a strange and terrible timeline that would have been.

Footnotes

1 If you know P3P‘s history, regardless of what timeline you’re in: congratulations! You win One Internet Point.

2 Techbros have been trying to solve political problems using technology since long before the word “techbro” was used in its current context. See also: (a) there aren’t enough mental health professionals, let’s make an AI app? (b) we don’t have enough ventilators for this pandemic, let’s 3D print air pumps? (c) banks keep failing, let’s make a cryptocurrency? (d) we need less carbon in the atmosphere or we’re going to go extinct, better hope direct carbon capture tech pans out eh? (e) we have any problem at all, lets somehow shoehorn blockchain into some far-fetched idea about how to solve it without me having to get out of my chair why not?

3 Note to self: find a citation for this when you can be bothered.

4 I can’t decide whether “endless cookie circle” is the name of the New Wave band I want to form, or a description of the way I want to eventually die. Perhaps both.

5 Link missing. Did I jump timelines?

6 Implementation details varied, but that’s part of the joy of the Web. Firefox favoured “conservative” defaults; Chrome and IE had “permissive” ones; and Opera provided an ultra-configrable matrix of options by which a user could specify exactly which kinds of cookies to accept, linked to which kinds of personal data, from which sites, all somehow backed by an extended regular expression parser that was only truly understood by three people, two of whom were Opera developers.

Flowchart showing the negotiation process between a user, browser, and server as the user browses an ecommerce site. The homepage's P3P policy states that it collects IP addresses, which is compatible with the user's preferences. Later, at checkout, the P3P policy states that the user's address will be collected and shared with a courier. The collection is fine according to the user's preferences, but she's asked to be notified if it'll be shared, so the browser notifies the user. The user approves of the policy and asks that this approval is remembered for this site, and the checkout process continues.× Screenshot from how-i-experience-web-today.com showing an article mostly-covered by a cookie privacy statement and configuration options, utilising dark patterns to try to discourage users from opting-out of cookies.× Fabricated screenshot from Microsoft Edge, browsing 3r.org.uk: a "privacy" icon in the address bar has been clicked, and the resulting menu says: About 3r.org.uk. Connection is secure (with link for more info). Privacy and Cookies (with link for more info). Cookies (3 cookies in use) - Strictly necessary (2 in use), dropdown menu set to "Default (accept, delete later)"; Optional (1 in use), dropdown menu set to "Accept for this site". Checkbox for "Treat third-party cookies differently?", unchecked. Privacy (link to full policy): Legitimate interest - this site collects username, IP address, technical logs...; Consenmt - this site collects email address, phone number... Button to manage content. Button to "Exercise data rights".×

The Page With No Code

It all started when I saw no-ht.ml, Terence Eden‘s hilarious response to Salma Alam-Naylor‘s excellent HTML is all you need to make a website. The latter is an argument against both the silly amount of JavaScript with which websites routinely burden their users, but also even against depending on CSS. As a fan of CSS Naked Day and a firm believer in using JS only for progressive enhancement, I’m obviously in favour.

Screenshot showing Terence Eden's no-ht.ml website, which uses plain text ASCII/Unicode art to argue that you don't need HTML.
Obviously no-ht.ml is to be taken as tongue-in-cheek, but as you’re about to see: it caught my interest and got me thinking: how could I go even further.

Terence’s site works by delivering a document with a claimed MIME type of text/html, but which contains only the (invalid) “HTML” code <!doctype UNICODE><meta charset="UTF-8"><plaintext> (to work around browsers’ wish to treat the page as HTML). This is followed by a block of UTF-8 plain text making use of spacing and emoji to illustrate and decorate the content. It’s frankly very silly, and I love it.1

I think it’s possible to go one step further, though, and create a web page with no code whatsoever. That is, one that you can read as if it were a regular web page, but where using View Source or e.g. downloading the page with curl will show you… nothing.

I present: The Page With No Code! (It’ll probably only work if you’re using Firefox, for reasons that will become apparent later.)

Screenshot showing my webpage, "The Page With No Code". Using white text (and some emojis) on a blue gradient background, it describes the same thought process as I describe in this blog post, and invites the reader to "View Source" and see that the page genuinely does appear to have no code.
I’d encourage you to visit The Page With No Code, use View Source to confirm for yourself that it truly has no code, and see if you can work out for yourself how it manages this feat… before coming back here for an explanation. Again: probably Firefox-only.

Once you’ve had a look for yourself and had a chance to form an opinion, here’s an explanation of the black magic that makes this atrocity possible:

  1. The page is blank. It’s delivered with Content-Type: text/html. Your browser interprets a completely-blank page as faulty and corrects it to a functionally-blank minimal HTML page: <html><head></head><body></body></html>.
  2. <body> and <html> elements can be styled with CSS; this includes the ability to add content: ::before and ::after each element. If only we could load a stylesheet then content injection is possible.
  3. We use the fourth way to inject CSS – a Link: HTTP header – to deliver a CSS payload (this, unfortunately, only works in Firefox). To further obfuscate what’s happening and remove the need for a round-trip, this is encoded as a data: URI.
Screenshot showing HTTP headers returned from a request to the No Code Webpage. A Link: header is highlighted, it contains a data: URL with a base64-encoded CSS stylesheet.
The stylesheet – and all the page content – is right there in the Link: header if you just care to decode it! Observe that while 5.84kB of data are transferred, the browser rightly states that the page is zero bytes in size.

This is one of the most disgusting things I’ve ever coded, and that’s saying a lot. I’m so proud of myself. You can view the code I used to generate this awful thing on Github.

My server-side implementation of this broke in 2023 after I upgraded Nginx; my new version doesn’t support the super-long Link: header needed to make this hack work, so I’ve updated the page to use the Link: to reference the CSS file rather than embed it via a data URI. It’s not as cool, but it at least means you can still see the page. Thanks to Thomas Bradshaw for pointing out the problem.

Footnotes

1 My first reaction was “why not just deliver something with Content-Type: text/plain; charset=utf-8 and dispense with the invalid code, but perhaps that’s just me overthinking the non-existent problem.

Reply to The ethics of syndicating comments using WebMentions

In his blog post “The ethics of syndicating comments using WebMentions”, Terence Eden said:

I want to see what people are writing in public about my posts. I also want to direct people to the conversations which are happening elsewhere on the web. But people – quite rightly – might not want their content permanently stored by my site.

So I think I have a few options.

  1. Do nothing. My site; my rules. If you don’t want me to grab your hot takes, don’t post them in public. (Feels a bit rude, TBQH.)
  2. Be reactive. If someone asks me to remove their content, do so. (But, of course, how will they know I’ve made a copy?)
  3. Stop syndicating comments. (I don’t wanna!)
  4. Replace the verbatim comments with a link saying “Fred mentioned this article on Twitter” . (A bit of a disruptive experience for readers.)
  5. Use oEmbed to capture the user’s comment and dynamically load it from the 3rd party site. That would update automatically if the user changes their name or deleted the comment. (A massive faff to set up.)

Terence describes a problem that I’ve wrestled with myself. If somebody comments directly on my blog using the form at the bottom of a post, that’s a pretty strong indicator of them giving their consent for their comment to be published at the bottom of that post (at my discretion). If somebody publicly replies somewhere my post is syndicated, that’s less-obvious, but still pretty clear. If somebody merely mentions my post publicly, writing their own post and linking to mine… that’s a real fuzzy area.

I take a minimal approach; only capturing their full content if it’s short and otherwise trying to extract a snippet that contains the bit that mentioned my content, and I think that works great. But Terence points out an important follow-up: what if the commenter deletes that content?

My approach so far has always been a reactive one – the second in Terence’s list – and I think it’s a morally-acceptable stance for a personal blogger. But I’m not sure it scales. I find myself asking: what if a news outlet did this, taking my self-published feedback to their story and publishing it on their site, even if I later amended, retracted, or deleted it on my own? If somebody’s making money out of my content, that feels different: I’ve always been clear that what I write on my blog is permissively-licensed, but that permissiveness is based on the prohibition of commercial use of my content.

Perhaps down the line this can be solved technologically: something machine-readable akin to the <link rel="license" ...> tag could state an author’s preference for how their content is syndicated by third parties they’ve mentioned, answering questions like:

  • Can you quote me, or just link to me? Who do these rules apply to? (Should we be attaching metadata to individual links?)
  • Should you inform me that you’ve done so, and if so: how (WebMention, etc.)?
  • If you (or your site) observe that my content has disappeared or changed for an extended time, should that be taken as revokation of consent to syndicate it?

Right now, the relevant technologies are not well-established enough to even begin this kind of work, but if a modern interconected federated web of personal websites takes off, it’s the kind of question we might one day have to answer.

For now my gut feeling is that option #2 (reactive moderation of syndicated comments) is ethically-sufficient for personal websites. But I’ll be watching the feedback Terence (who probably gets many more readers than I) receives in case my gut doesn’t represent the majority!

Breakups as HTTP Response Codes

103: Early Hints ("I'm not sure this can last forever.")
103: Early Hints (“I’m not sure this can last forever.”)
300: Multiple Choices ("There are so many ways I can do better than you.")
300: Multiple Choices (“There are so many ways I can do better than you.”)
303: See Other ("You should date other people.")
303: See Other (“You should date other people.”)
304: Not Modified ("With you, I feel like I'm stagnating.")
304: Not Modified (“With you, I feel like I’m stagnating.”)
402: Payment Required ("I am a prostitute.")
402: Payment Required (“I am a prostitute.”)
403: Forbidden ("You don't get this any more.")
403: Forbidden (“You don’t get this any more.”)
406: Not Acceptable ("I could never introduce you to my parents.")
406: Not Acceptable (“I could never introduce you to my parents.”)
408: Request Timeout ("You keep saying you'll propose but you never do.")
408: Request Timeout (“You keep saying you’ll propose but you never do.”)
409: Conflict ("We hate each other.")
409: Conflict (“We hate each other.”)
410: Gone (ghosted)
410: Gone (ghosted)
411: Length Required ("Your penis is too small.")
411: Length Required (“Your penis is too small.”)
413: Payload Too Large ("Your penis is too big.")
413: Payload Too Large (“Your penis is too big.”)
416: Range Not Satisfied ("Our sex life is boring and repretitive.")
416: Range Not Satisfied (“Our sex life is boring and repretitive.”)
425: Too Early ("Your premature ejaculation is a problem.")
425: Too Early (“Your premature ejaculation is a problem.”)
428: Precondition Failed ("You're still sleeping with your ex-!?")
428: Precondition Failed (“You’re still sleeping with your ex-!?”)
429: Too Many Requests ("You're so demanding!")
429: Too Many Requests (“You’re so demanding!”)
451: Unavailable for Legal Reasons ("I'm married to somebody else.")
451: Unavailable for Legal Reasons (“I’m married to somebody else.”)
502: Bad Gateway ("Your pussy is awful.")
502: Bad Gateway (“Your pussy is awful.”)
508: Loop Detected ("We just keep fighting.")
508: Loop Detected (“We just keep fighting.”)

With thanks to Ruth for the conversation that inspired these pictures, and apologies to the rest of the Internet for creating them.

103: Early Hints ("I'm not sure this can last forever.")× 300: Multiple Choices ("There are so many ways I can do better than you.")× 303: See Other ("You should date other people.")× 304: Not Modified ("With you, I feel like I'm stagnating.")× 402: Payment Required ("I am a prostitute.")× 403: Forbidden ("You don't get this any more.")× 406: Not Acceptable ("I could never introduce you to my parents.")× 408: Request Timeout ("You keep saying you'll propose but you never do.")× 409: Conflict ("We hate each other.")× 410: Gone (ghosted)× 411: Length Required ("Your penis is too small.")× 413: Payload Too Large ("Your penis is too big.")× 416: Range Not Satisfied ("Our sex life is boring and repretitive.")× 425: Too Early ("Your premature ejaculation is a problem.")× 428: Precondition Failed ("You're still sleeping with your ex-!?")× 429: Too Many Requests ("You're so demanding!")× 451: Unavailable for Legal Reasons ("I'm married to somebody else.")× 502: Bad Gateway ("Your pussy is awful.")× 508: Loop Detected ("We just keep fighting.")×

Sisyphus: The Board Game (Digital Edition)

I’m off work sick today: it’s just a cold, but it’s had a damn good go at wrecking my lungs and I feel pretty lousy. You know how when you’ve got too much of a brain-fog to trust yourself with production systems but you still want to write code (or is that just me?), so this morning I threw together a really, really stupid project which you can play online here.

Screenshot showing Sisyphus carrying a rock up a long numbered gameboard; he's on square 993 out of 1000, but (according to the rules printed below the board) he needs to land on 1000 exactly and never roll a double-1 or else he returns to the start.
It’s a board game. Well, the digital edition of one. Also, it’s not very good.

It’s inspired by a toot by Mason”Tailsteak” Williams (whom I’ve mentioned before once or twice). At first I thought I’d try to calculate the odds of winning at his proposed game, or how many times one might expect to play before winning, but I haven’t the brainpower for that in my snot-addled brain. So instead I threw together a terrible, terrible digital implementation.

Go play it if, like me, you’ve got nothing smarter that your brain can be doing today.

Spring ’83 Came And Went

Just in time for Robin Sloan to give up on Spring ’83, earlier this month I finally got aroud to launching STS-6 (named for the first mission of the Space Shuttle Challenger in Spring 1983), my experimental Spring ’83 server. It’s been a busy year; I had other things to do. But you might have guessed that something like this had been under my belt when I open-sourced a keygenerator for the protocol the other day.

If you’ve not played with Spring ’83, this post isn’t going to make much sense to you. Sorry.

Introducing STS-6

Screenshots showing STS-6, listing the most-recent blog posts on DanQ.me, in two different display styles.
My output looks distinctly different in The Kingswood Palimpsest then in The Oakland Follower-Sentinel (two key reference Spring ’83 clients), and that’s fine and expected.

My server is, as far as I can tell, very different from any others in a few key ways:

  • It does not allow third-party publishing at all. Some might argue that this undermines the aim of the exercise, but I disagree. My IndieWeb inclinations lead me to favour “self-hosted” content, shared from its owners’ domain. Also: the specification clearly states that a server must implement a denylist… I guess my denylist simply includes all keys that are not specifically permitted.
  • It’s geared towards dynamic content. My primary board self-publishes whenever I produce a new blog post, listing the most recent blog posts published. I have another half-implemented which shows a summary of the most-recent post, and another which would would simply use a WordPress page as its basis – yes, this was content management, but published over Spring ’83.
  • It provides helpers to streamline content production. It supports internal references to other boards you control using the format {{board:123}}which are automatically converted to addresses referencing the public key of the “current” keypair for that board. This separates the concept of a board and its content template from that board’s keypairs, making it easier to link to a board. To put it another way, STS-6 links are self-healing on the server-side (for local boards).
  • It helps automate content-fitting. Spring ’83 strictly requires a maximum board size of 2,217 bytes. STS-6 can be configured to fit a flexible amount of dynamic content within a template area while respecting that limit. For my posts list board, the number of posts shown is moderated by the size of the resulting board: STS-6 adds more and more links to the board until it’s too big, and then removes one!
  • It provides “hands-off” key management features. You can pregenerate a list of keys with different validity periods and the server will automatically cycle through them as necessary, implementing and retroactively-modifying <link rel="next"> connections to keep them current.

I’m sure that there are those who would see this as automating something that was beautiful because it was handcrafted; I don’t know whether or not I agree, but had Spring ’83 taken off in a bigger way, it would always only have been a matter of time before somebody tried my approach.

From a design perspective, I enjoyed optimising an SVG image of my header so it could meaningfully fit into the board. It’s pretty, and it’s tolerably lightweight.

If you want to see my server in action, patch this into your favourite Spring ’83 client: https://s83.danq.dev/10c3ff2e8336307b0ac7673b34737b242b80e8aa63ce4ccba182469ea83e0623

A dead end?

Without Robin’s active participation, I feel that Spring ’83 is probably coming to a dead end. It’s been a lot of fun to play with and I’d love to see what ideas the experience of it goes on to inspire next, but in its current form it’s one of those things that’s an interesting toy, but not something that’ll make serious waves.

In his last lab essay Robin already identified many of the key issues with the system (too complicated, no interpersonal-mentions, the challenge of keys-as-identifiers, etc.) and while they’re all solvable without breaking the underlying mechanisms (mentions might be handled by Webmention, perhaps, etc.), I understand the urge to take what was learned from this experiment and use it to help inform the decisions of the next one. Just as John Postel’s Quote of the Day protocol doesn’t see much use any more (although maybe if my finger server could support QotD?) but went on to inspire the direction of many subsequent “call-and-response” protocols, including HTTP, it’s okay if Spring ’83 disappears into obscurity, so long as we can learn what it did well and build upon that.

Meanwhile: if you’re looking for a hot new “like the web but lighter” protocol, you should probably check out Gemini. (Incidentally, you can find me at gemini://danq.me, but that’s something I’ll write about another day…)

Screenshots showing STS-6, listing the most-recent blog posts on DanQ.me, in two different display styles.×