The web loves data. Data about you. Data about who you are, about what you do, what you love doing, what you love eating.
…
I, on the other end, couldn’t care less about your data. I don’t run analytics on this website. I don’t care which articles you read, I don’t care if you read them. I don’t care about
which post is the most read or the most clicked. I don’t A/B test, I don’t try to overthink my content. I just don’t care.
…
Manu speaks my mind. Among the many hacks I’ve made to this site, I actively try not to invade on your privacy by collecting analytics, and I try not to let others to so
either!
My blog is for myself first and foremost (if you enjoy it too, that’s just a bonus). This leads to two conclusions:
If I’m the primary audience, I don’t need analytics (because I know who I am), and
I don’t want to be targeted by invasive analytics (and use browser extensions to block them, e.g. I by-default block all third-party scripts, delete cookies from non-allowlisted
domains 15 seconds after navigating away from sites, etc.); so I’d prefer them not to be on a site for which I’m the primary audience!
I’ve gone into more detail about this on my privacy page and hinted at it on my colophon. But I don’t know if anybody ever reads either
of those pages, of course!
The Internet is full of guides on easily making your WordPress installation run fast. If you’re looking to speed up your WordPress site, you should go read those, not this.
Those guides often boil down to the same old tips:
uninstall unnecessary plugins,
optimise caching (both on the server and, via your headers, on clients/proxies),
resize your images properly and/or ensure WordPress is doing this for you,
This article is for people who aren’t afraid to go tinkering in their WordPress codebase to squeeze a little extra (real world!) performance.
It’s for people whose neverending quest for perfection is already well beyond the point of diminishing returns.
But mostly, it’s for people who want to gawp at me, the freak who actually did this stuff just to make his personal blog a tiny bit nippier without spending an extra penny on
hosting.
You shouldn’t use Lighthouse as your only measure of your site’s performance. But it’s still reassuring when you get to see those fireworks!
Don’t start with the hard way. Exhaust all the easy solutions – or at least, make a conscious effort which easy solutions to enact or reject – first. Only if you really
want to get into the weeds should you actually try doing the things I propose here. They’re not for most sites, and they’re not the for faint of heart.
Performance is a tradeoff. Every performance improvement costs you something else: time, money, DX, UX, etc. What you choose to trade for performance gains depends on your priority of constituencies, which may differ from mine.4
This is not a recipe book. This won’t tell you what code to change or what commands to run. The right answers for your content will be different than the right answers
for mine. Also: you shouldn’t change what you don’t understand! But I hope these tips will help you think about what questions you need to ask to make your site blazing fast.
Okay, let’s get started…
1. Backstab the plugins you can’t live without
If there are plugins you can’t remove because you depend upon their functionality, and those plugins inject content (especially JavaScript) on the front-end… backstab them to
undermine that functionality.
For example, if you want Jetpack‘s backup and downtime monitoring features, but you don’t want it injecting random <linkrel='stylesheet' id='...-jetpack-css' href='...' media='all' />‘s (an
extra stylesheet to download and parse) into your pages: find the add_filter hook it uses and remove_filter it in your theme5.
Alternatively, entirely remove the wp_head() and manually reimplement the functionality you actually need. Insert your own joke about “Headless WordPress” here.
Better yet, remove wp_head() from your theme entirely6.
Now, instead of blocking the hooks you don’t want polluting your <head>, you’re specifically allowing only those you want. You’ll want to take care to get
some semi-essential ones like <link rel="canonical" href="...">7.
Now most of your plugins are broken, but in exchange, your theme has reclaimed complete control over what gets sent to the user. You can select what content you actually
want delivered, and deliver no more than that. It’s harder work for you, but your site becomes so much lighter.
Your site is faster now. It doesn’t work, but it’s quick about it!
2. Throw away 100% of your render-blocking JavaScript (and as much as you can of the rest)
The single biggest bottleneck to the user viewing a modern WordPress website is the JavaScript that needs to be downloaded, compiled, and executed before the page can be rendered. Most
of that’s plugins, but even on a nearly-vanilla installation you might find a copy of jQuery (eww!) and some other files.
In step 1 you threw it all away, which is great… but I’m betting you were depending on some of that to make your site work? Let’s put it back, carefully and selectively, while
minimising the impact on load time.
That means scripts should be loaded (a) low-down, and/or (b) marked defer (or, better yet, async), so they don’t block page rendering.
If you haven’t already, you might like to View Source on this page. Count my <script> tags. You’ll probably find just two of them: one external file marked
async, and a second block right at the bottom.
The only third-party script routinely loaded on danq.me is Instant.Page, which specifically exists to improve perceived performance. It preloads
links when you hover over or start-to-touch them.
The inline <script> in my footer.php wraps a single line of PHP: which looks a little like
this: <?php echo implode("\n\n", apply_filters( 'danq_footer_js', [] ) ); ?>. For each item in an initially-empty array, it appends to the script tag. When I render
anything that requires JavaScript, e.g. for 360° photography, I can just add to that
(keyed, to prevent duplicates when viewing an archive page) array. Thus, the relevant script gets added exclusively to the pages where it’s needed, not to the entire site.
The only inline script added to every page loads my service worker, which itself aims to optimise caching as well as providing limited “offline” functionality.
While you’re tweaking your JavaScript anyway, you might like to check that any suitable addEventListeners are set to passive mode. Especially if you’re doing anything with touch or
mousewheel events, you can often increase the perceived performance of these interactions by not letting your custom code block the default browser behaviour.
I promise you; most of your blog’s front-end JavaScript is either (a) garbage nobody wants, (b) polyfills for platforms nobody uses, or (c) huge libraries you’ve imported so you can
use just one or two functions form them. Trash them.
3. Don’t use a CDN
Wait, what? That’s the opposite of what everybody else recommends. To understand why, you have to think about why people recommend a CDN in the first place. Their reasons are usually threefold:
Proximity
Claim: A CDN delivers content geographically-closer to the user.
Retort: Often true. But in step 4 we’re going to make sure that everything critical comes within the first TCP
sliding window anyway, so there’s little benefit, and there’s a cost to that extra DNS lookup and fresh handshake. Edge
caching your own contentmay have value, but for most sites it’ll have a much smaller impact than almost everything else on this list.
Precaching
Claim: A CDN improves the chance resources are precached in the user’s browser.
Retort: Possibly true, especially with fonts (although see step 6) but less than you’d think with JS libraries because
there are so many different versions/hosts of each. Yours may well be the only site in the user’s circuit that uses a particular one!
Power Claim: A CDN has more resources than you and so can better-withstand spikes of traffic.
Retort: Maybe, but they also introduce an additional single-point-of-failure. CDNs aren’t magically immune
to downtime nor content-blocking, and if you depend on one you’ve just doubled the number of potential failure points that can make your site instantly useless. Furthermore:
in exchange for those resources you’re trading away your users’ privacy and security: if a CDN gets hacked, every site that
uses it gets hacked too.
Consider edge-caching your own content only if you think you need it, but ditch jsDeliver, cdnjs, Google Hosted Libraries etc.
Despite having no edge cache and being hosted in a different country to me, I can open a completely fresh browser and reach DOMContentLoaded on the my homepage in ~20ms. You should learn how to read a
waterfall performance chart just so you can enjoy how “flat” mine is.
Hell: if you can, ditch all JavaScript served from third-parties and slap a Content-Security-Policy: script-src 'self' header on your domain to dramatically reduce
the entire attack surface of your site!8
4. Reduce your HTML and CSS size to <12kb compressed
There’s a magic number you need to know: 12kb. Because of some complicated but fascinating maths (and depending on how your hosting is configured), it can be significantly faster to
initially load a web resource of up to 12kb than it is to load one of, say, 15kb. Also, for the same reason, loading a web resource of much less than 12kb
might not be significantly faster than loading one only a little less than 12kb.
Inlining as much essential content as possible (CSS, SVGs,
JavaScript etc.) to bring you back up to close-to that magic number again!
$ curl --compressed -so /dev/null -w "%{size_download}\n" https://danq.me/
10416
Note that this is the compressed, over-the-wire size. Last I checked, my homepage weighed-in at about 10.4kb compressed, which includes the entirety of its HTML and CSS, most of its JS, and a
couple of its SVG images.
Again, this probably flies in the face of everything you were taught about performance. I’m sure you were told that you should <link> to your stylesheets so that they
can be cached across page loads. But it turns out that if you can make your HTML and CSS small enough, the opposite is true and you should inline the stylesheet again: caching styles becomes almost irrelevant if you get all the content in
a single round-trip anyway!
For extra credit, consider optimising your homepage’s CSS so it’s even smaller by excluding directives that only apply to
non-homepage pages, and vice-versa. Assuming you’re using a preprocessor, this shouldn’t be too hard: at simplest, you can have a homepage.css and main.css,
each derived from a set of source files some of which they share (reset/normalisation, typography, colours, whatever) and the rest which is specific only to that part of the site.
Most web pages should fit entirely onto a floppy disk. This one doesn’t, mostly because of all the Simpsons clips, but most should.
Can’t manage to get your HTML and CSS down below the magic
number? Then at least ensure that your HTML alone weighs in at <12kb compressed and you’ll still get some of the
benefits. If you’ve got the headroom, you can selectively include a <style> block containing only the most-crucial CSS, with a particular focus on any that results in layout shifts (e.g. anything that specifies the height: of otherwise dynamically-sized
block elements, or that declares an element position: absolute or position: fixed). These kinds of changes are relatively computationally-expensive because
they cause content to re-flow, so provide hints as soon as possible so that the browser can accommodate for them.
5. Make the first load awesome
We don’t really talk about content being “above the fold” like we used to, because the modern Web has such a diverse array of screen sizes and resolutions that doing so doesn’t make
much sense.
But if loading your full page is still going to take multiple HTTP requests (scripts, images, fonts, whatever),
you should still try to deliver the maximum possible value in the first round-trip. That means:
Making sure all your textual content loads immediately! Unless you’re delivering a huge amount of text, there’s absolutely no excuse for lazy-loading text: it’s
usually tiny, compresses well, and it’s fast to parse. It’s also the most-important content of most pages. Get it delivered to the browser so it can be rendered rightaway.
Reserving space for blocks by sizing images appropriately, e.g. using <img width="..." height="..." ...> or having them load as a background with
background-size: cover or contain in a block sized with CSS delivered in the initial payload. This
reduces layout shift, which mitigates the need for computationally-expensive content reflows.
If possible (see point 4), move vector images that support basic site functionality, like logos, inline. This might also apply to icons, if they’re “as important” as text content.
Marking everything up with standard semantic HTML. There’s a trend for component-driven design to go much too
far, resulting in JavaScript components being used in place of standard elements like links, buttons, and images, resulting in highly-fragile websites: when those scripts fail (or are
very slow to load), the page becomes unusable.
If you want to be sure you’re prioritising your content first and foremost, try disabling all CSS, JavaScript, and external
resources (or just access your site in a browser that ignores those things, like Lynx), and check that it’s still usable. As a bonus, this
helps you check for several accessibility issues.
6. Reduce your dependence on downloaded fonts
Fonts are lovely and can be an important part of your brand identity, but they can also add a lot of weight to your web pages.
If you’re ready and able to drop your webfonts and appreciate the beauty and flexibility of a system font stack (I get it: I’m not there quite yet!), you can at least make
smarter use of your fonts:
Every modern browser supports WOFF2, so you can ditch those chunky old formats you’re clinging onto.
If you’re only using the Latin alphabet, minify your fonts further by dropping the characters you don’t need: tools like Google Webfonts
Helper can help with this, as well as making it easier to selfhost fonts from the most-popular library (is a smart idea for the reasons described under point 3, above!). There are
tools available to further minify fonts if e.g. you only need the capital letters for your title font or something.
Browsers are pretty clever and will work-around it if you make a mistake. Didn’t include an emoji or some obscure mathematical symbol, and then accidentally used them in a
post? Browsers will switch to a system font that can fill in the gap, for you.
Make the most-liberal use of the font-display: CSS directive that you can tolerate!
Don’t use font-display: block, which is functionally the default in most browsers, unless you absolutely have to.
font-display: fallback is good if you’re too cowardly/think your font is too important for you to try font-display: optional.
font-display: optional is an excellent choice for body text: if the browser thinks it’s worthwhile to download the font (it might choose not to if the operating
system indicates that it’s using a metered or low-bandwidth connection, for example), it’ll try to download it, but it won’t let doing so slow things down too much and it’ll
fall-back to whatever backup (system) font you specify.
font-display: swap is also worth considering: this will render any text immediately, even if the right font hasn’t downloaded yet, with no blocking time
whatsoever, and then swap it for the right font when it appears. It’s probably better for headings, because large paragraphs of text can be a little disorienting if they change
font while a user is looking at them!
If writing is for nerds, then typography must be doubly-so. But you’ve read this far, so I’m confident that you qualify…
7. Cache pre-compressed static files
It’s possible that by this point you’re saying “if I had to do this much work, I might as well just use a static site generator”. Well good news: that’s what you’re about to
do!
Obviously you should make sure all your regular caching improvements (appropriate HTTP headers for caching, a
service worker that further improves on that logic based on your content’s update schedule, etc.) first. Again: everything in this guide presupposes that you’ve already done the things
that normal people do.
By aggressively caching pre-compressed copies of all your pages, you’re effectively getting the best of both worlds: a website that, for anonymous visitors, is served directly from
.html.gz files on a hard disk or even straight from RAM in memcached10,
but which still maintains all the necessary server-side interactivity to allow it to be used as a conventional Web-based CMS
(including accepting comments if that’s your jam).
WP Super Cache can do the heavy lifting for you for a filesystem-based solution so long as you put it into “Expert” mode and
amending your webserver configuration. I’m using Nginx, so I needed a try_files directive like this:
I’m sure your favourite performance testing tool has already complained at you about your failure to use the best formats possible when serving images to your users. But how can you fix
it?
There are some great plugins for improving your images automatically and/or in bulk – I use EWWW Image Optimizer – but
to really make the most of them you’ll want to reconfigure your webserver to detect clients that Accept: image/webp and attempt to dynamically
serve them .webp variants, for example. Or if you’re ready to give up on legacy formats and replace all your .pngs with .webps, that’s probably
fine too!
The image you see at https://danq.me/_q23u/2023/11/dynamic.png is probably an image/webp. But if your browser doesn’t support WebP, you’ll get an
image/png instead!
Assuming you’ve got curl and Imagemagick‘s identify, you can see this in action:
curl -s https://danq.me/_q23u/2023/11/dynamic.png -H "Accept: image/webp" | identify -
(Will give you a WebP image)
curl -s https://danq.me/_q23u/2023/11/dynamic.png -H "Accept: image/png" | identify -
(Will give you a PNG image, even though the URL is the same)
9. Simplify, simplify, simplify
The single biggest impact you can have upon the performance of your WordPress pages is to make them less complex.
You don’t have to go as light as Gemtext – like this page on Gemini does – to see
benefits.
Writing my templates and posts so that they’re compatible with CapsulePress helps keep my code necessarily-simple. You don’t have to
do that, though, but you should be asking yourself:
Does my DOM need to cascade so deeply? Could I achieve the same with less?
Am I pre-emptively creating content, e.g. adding a hidden <dialog> directly to the markup in the anticipation that it might be triggered later using
JavaScript, rather than having that JavaScript run document.createElement the element after the page becomes readable?
Have I created unnecessarily-long chains of CSS selectors11
when what I really want is a simple class name, or perhaps even a semantic element name?
10. Add a Service Worker
A service worker isn’t magic. In particular, it can’t help you with those new visitors hitting your site for the first time12.
A service worker lets you do smart things on behalf of the user’s network connection, so that by the time they ask for a resource, you already fetched it for them.
But a suitable service worker can do a few things that can help with performance. In particular, you might consider:
Precaching assets that you anticipate they’re likely to need (e.g. if you use different stylesheets for the homepage and other pages, you can preload both so no matter
where a user lands they’ve already got the CSS they’ll need for the entire site).
Preloading popular pages like the homepage and recent articles, allowing them to load quickly.
Caching a fallback pages – and other resources as-they’re-accessed – to support a full experience for users even if they (or your site!) disconnect from the Internet (or even
embedding “save for offline” functionality!).
Chapters 7 and 8 of Going Offline by Jeremy Keith are
especially good for explaining how this can be achieved, and it’s all much easier than everything else I just described.
Anything else?
Did I miss anything? If you’ve got a tip about ramping up WordPress performance that isn’t one of the “typical seven” – probably because it’s too hard to be worthwhile for most people –
I’d love to hear it!
Footnotes
1 You’ll sometimes see guides that suggest that using a CDN is to be recommended specifically because it splits your assets among multiple domains/subdomains, which mitigates browsers’ limitation on the
number of files they can download simultaneously. This is terrible advice, because such limitations essentially don’t exist any more, but DNS lookups and TLS handshakes still have a bandwidth and computational cost. There are good
things about CDNs, sometimes, but this has not been one of them for some time now.
2 I’m not sure why guides keep stressing the importance of minifying code,
because by the time you’re compressing them too it’s almost pointless. I guess it’s helpful if your compression fails?
3 “Use a faster server” is a “just throw money/the environment at it” solution. I’d like
to think we can do better.
4 For my personal blog, I choose to prioritise user experience, privacy, accessibility,
resilience, and standards compliance above almost everything else.
5 If you prefer to keep your backstab code separate, you can put it in a custom plugin,
but you might find that you have to name it something late in the alphabet – I’ve previously used names like zzz-danq-anti-plugin-hacks – to ensure that they load
after the plugins whose functionality you intend to unhook: broadly-speaking, WordPress loads plugins in alphabetical order.
6 I’ve assumed you’re using a classic, not block, theme. If you’re using a block theme,
you get a whole different set of performance challenges to think about. Don’t get me wrong: I love block themes and think they’re a great way to put more people in control of their
site’s design! But if you’re at the point where you’re comfortable digging this deep into your site’s PHP code,
you probably don’t need that feature anyway, right?
7 WordPress is really good at serving functionally-duplicate content, so search
engines appreciate it if you declare a proper canonical URL.
8 Before you choose to block all third-party JavaScript, you might have to
whitelist Google Analytics if you’re the kind of person who doesn’t mind selling their visitor data to the world’s biggest harvester of personal information in exchange for some
pretty graphs. I’m not that kind of person.
10 I’ve experimented with mounting a ramdisk and storing the WP Super Cache directory
there, but it didn’t make a huge difference, probably because my files are so small that the parse/render time on the browser side dominates the total cascade, and they’re already
being served from an SSD. I imagine in my case memcached would provide similarly-small benefits.
11 I really love the power of CSS preprocessors like Sass, but they do make it deceptively easy to create many more – and longer – selectors
than you intended in your final compiled stylesheet.
12 Tools like Lighthouse usually simulate first-time visitors, which can be a little
unfair to sites with great performance for established visitors. But everybody is a first-time visitor at least once (and probably more times, as caches expire or are
cleared), so they’re still a metric you should consider.
<link rel="alternate" type="application/rss+xml"> is fine, but I feel like there should be a standard for a site, not a page, to share a “list of
feeds associated with a site”.
Last night, I dreamed about a way to achieve that: ./well-known/feeds as an OPML document. Here’s mine, and here’s a draft spec.
There’s a perception that a blog is a long-lived, ongoing thing. That it lives with and alongside its author.1
But that doesn’t have to be true, and I think a lot of people could benefit from “short-term” blogging. Consider:
Photoblogging your holiday, rather than posting snaps to social media
You gain the ability to add context, crosslinking, and have permanent addresses (rather than losing eveything to the depths of a feed). You can crosspost/syndicate to your favourite
socials if that’s your poison..
Photoblog your holiday and I might follow it, and I’ll do so at my convenience. Put your snaps on Facebook and I almost certainly won’t bother. Photo courtesy ArtHouse Studio.
Blogging your studies, rather than keeping your notes to yourself
Writing what you learn helps you remember it; writing what you learn in a public space helps others learn too and makes it easy to search for your discoveries later.2
Recording your roleplaying, rather than just summarising each session to your fellow players
My D&D group does this at levellers.blog! That site won’t continue to be updated forever – the party will someday retire or, more-likely, come to a glorious but horrific end – but
it’ll always live on as a reminder of what we achieved.
One of my favourite examples of such a blog was 52 Reflect3 (now integrated into its successor The Improbable Blog). For 52 consecutive weeks my partner‘s brother Robin
blogged about adventures that took him out of his home in London and it was amazing. The project’s finished, but a blog was absolutely the right medium for it because now it’s got a
“forever home” on the Web (imagine if he’d posted instead to Twitter, only for that platform to turn into a flaming turd).
I don’t often shill for my employer, but I genuinely believe that the free tier on WordPress.com is an excellent
way to give a forever home to your short-term blog4.
Did you know that you can type new.blog (or blog.new; both work!) into your browser to start one?
What are you going to write about?
Footnotes
1This blog is, of course, an example of a long-term blog. It’s been going in
some form or another for over half my life, and I don’t see that changing. But it’s not the only kind of blog.
2 Personally, I really love the serendipity of asking a web search engine for the solution
to a problem and finding a result that turns out to be something that I myself wrote, long ago!
4 One of my favourite features of WordPress.com is the fact that it’s built atop the
world’s most-popular blogging software and you can export all your data at any time, so there’s absolutely no lock-in: if you want to migrate to a competitor or even host your own
blog, it’s really easy to do so!
This is an alternate history of the Web. The premise is true, but the story diverges from our timeline and looks at an alternative “Web that might have been”.
Prehistory
This is the story of P3P, one of the greatest Web standards whose history has been forgotten1, and how the abject failure of its first versions paved the
way for its bright future decades later. But I’m getting ahead of myself…
Drafted in 2002 in the wake of growing concern about the death of privacy on the Internet, P3P 1.0 aimed to make the collection of personally-identifiable data online transparent. Hurrah, right?
Initially, the principle was sound. The specification was weak. The implementation was apalling. But P3P 1.1
could have worked well.
Developers are lazy3 and soon converged on the simplest possible solution: add a garbage HTTP header like P3P: CP="See our website for our privacy policy." and your cookies work just fine! Ignore the problem, ignore the
proposed solution, just do what gets the project shipped.
Without any meaningful enforcement it also perfectly feasible to, y’know, just lie about how well you treat user data. Seeing the way the wind was blowing, Mozilla dropped
support for P3P, and Microsoft’s support – which had always been half-baked and lacked even the most basic user-facing
controls or customisation options – languished in obscurity.
For a while, it seemed like P3P was dying. Maybe, in some alternate timeline, it did die: vanishing into
nothing like VRML, WAP, and XBAP.
But fortunately for us, we don’t live in that timeline.
Revival
In 2009, the European Union revisited the Privacy and Electronic Communications
Directive. The initial regulations, published in 2002, required that Web users be able to opt-out of tracking cookies, but the amendment required that sites ensure that
users opted-in.
As-written, this confusing new regulation posed an
immediate problem: if a user clicked the button to say “no, I don’t want cookies”, and you didn’t want to ask for their consent again on every page load… you had to give them a cookie
(or use some other technique
legally-indistinguishable from cookies). Now you’re stuck in an endless cookie-circle.4
This, and other factors of informed consent, quickly introduced a new pattern among those websites that were fastest to react to the legislative change:
The cookie consent banner, with all its confusing language and dark patterns, looked like it was going to become the new normal for web users in the early 2010s. But thankfully, our
saviour had been waiting in the wings all along.
Web users rebelled. These ugly overlays felt like a regresssion to a time when popup ads and splash pages were commonplace. “If only,” people cried out, “There were a better way to do
this!”
It was Professor Lorie Cranor, one of the original authors of the underloved P3P specification and a respected champion of usable privacy and security, whose rallying cry gave us hope. Her CNET article, “Why
the EU Cookie Directive is a solved problem”5, inspired a new generation of development on what would become known as P3P 2.0.
While maintaining backwards compatibility, this new standard:
deprecated those horrible XML documents in favour of HTTP
headers and <link> tags alone,
removing support for Set-Cookie2: headers, which nobody used anyway, and
added features by which the provenance and purpose of cookies could be stated in a way that dramatically simplified adoption in browsers
Internet Explorer at this point was still used by a majority of Web users. It still supported the older
version of the standard, and – as perhaps the greatest gift that the much-maligned browser ever gave us – provided a reference implementation as well as a stepping-stone to wider
adoption.
Opera, then Firefox, then “new kid” Chrome each adopted P3P 2.0; Microsoft finally got on board with IE 8 SP 1. Now the latest versions of all the mainstream browsers had a solid
implementation6
well before the European data protection regulators began fining companies that misused tracking cookies.
Nowadays, we’ve pretty-well standardised on the address bar being the place where all cookie and privacy information and settings are stored. Can you imagine if things had
gone any other way?
But where the story of P3P‘s successes shine brightest came in 2016, with the passing of the GDPR. The W3C realised that P3P could simplify both the expression and understanding of privacy policies for users, and formed a group to work on version 2.1. And that’s
the version you use today.
When you launch a new service, you probably use one of the many free wizard-driven tools to express your privacy policy and the bases for your data processing, and it spits out a
template privacy policy. You need the human-readable version, of course, since the 2020 German court ruling that you cannot rely on a machine-readable privacy policy alone, but
the real gem is the P3P: 2.1 header version.
Assuming you don’t have any unusual quirks in your data processing (ask your lawyer!), you can just paste the relevant code into your server configuration and you’re good to go. Site
users get a warning if their personal data preferences conflict with your data policies, and can choose how to act: not using your service, choosing which of your
features to opt-in or out- of, or – hopefully! – granting an exception to your site (possibly with caveats, such as sandboxing your cookies or clearing them immediately after closing
the browser tab).
Sure, what we’ve got isn’t perfect. Sometimes companies outright lie about their use of information or use illicit methods to track user behaviour. There’ll always be bad guys out there. That’s what laws are there to deal with.
But what we’ve got today is so seamless, it’s hard to imagine a world in which we somehow all… collectively decided that the correct solution to the privacy problem might have been to
throw endless popovers into users’ faces, bury consent-based choices under dark patterns, and make humans do the work that should from the outset have been done by machines. What a
strange and terrible timeline that would have been.
Footnotes
1 If you know P3P‘s
history, regardless of what timeline you’re in: congratulations! You win One Internet Point.
2 Techbros have been trying to solve political problems using technology since long before
the word “techbro” was used in its current context. See also: (a) there aren’t enough mental health professionals, let’s make an AI app? (b) we don’t have enough ventilators for this
pandemic, let’s 3D print air pumps? (c) banks keep failing, let’s make a cryptocurrency? (d) we need less carbon in the atmosphere or we’re going to go extinct, better hope direct
carbon capture tech pans out eh? (e) we have any problem at all, lets somehow shoehorn blockchain into some far-fetched idea about how to solve it without me having to get out of my
chair why not?
3 Note to self: find a citation for this when you can be bothered.
4 I can’t decide whether “endless cookie circle” is the name of the New Wave band I want
to form, or a description of the way I want to eventually die. Perhaps both.
6 Implementation details varied, but that’s part of the joy of the Web. Firefox favoured
“conservative” defaults; Chrome and IE had “permissive” ones; and Opera provided an ultra-configrable matrix of options by which a user could specify exactly which kinds of cookies to
accept, linked to which kinds of personal data, from which sites, all somehow backed by an extended regular expression parser that was only truly understood by three people, two of
whom were Opera developers.
It all started when I saw no-ht.ml, Terence Eden‘s hilarious response to Salma
Alam-Naylor‘s excellent HTML is all you need to make a website. The latter is an
argument against both the silly amount of JavaScript with which websites routinely burden their users, but also even against depending on CSS. As a fan of CSS Naked Day and a firm
believer in using JS only for progressive enhancement, I’m obviously in favour.
Obviously no-ht.ml is to be taken as tongue-in-cheek, but as you’re about to see: it caught my interest and got me thinking: how could I go even further.
Terence’s site works by delivering a document with a
claimed MIME type of text/html, but which contains only the (invalid) “HTML” code
<!doctype UNICODE><meta charset="UTF-8"><plaintext> (to work around browsers’ wish to treat the page as HTML). This is followed by a block of UTF-8 plain text making use of spacing
and emoji to illustrate and decorate the content. It’s frankly very silly, and I love it.1
I think it’s possible to go one step further, though, and create a web page with no code whatsoever. That is, one that you can read as if it were a regular web page, but where
using View Source or e.g. downloading the page with curl will show you… nothing.
I present: The Page With No Code! (It’ll probably only work if you’re using Firefox, for reasons that will become apparent later.)
I’d encourage you to visit The Page With No Code, use View Source to confirm for yourself that it truly has no code, and see if you can work out for yourself how it manages
this feat… before coming back here for an explanation. Again: probably Firefox-only.
Once you’ve had a look for yourself and had a chance to form an opinion, here’s an explanation of the black magic that makes this atrocity possible:
The page is blank. It’s delivered with Content-Type: text/html. Your browser interprets a completely-blank page as faulty and corrects it to a functionally-blank
minimal HTML page: <html><head></head><body></body></html>.
<body> and <html> elements can be styled with CSS; this includes the ability to add
content:::before and ::after each
element. If only we could load a stylesheet then content injection is possible.
We use the fourth way to inject
CSS – a Link: HTTP header – to deliver a CSS payload (this, unfortunately, only works in Firefox). To further obfuscate what’s happening and remove the need for a round-trip, this is encoded
as a data: URI.
The stylesheet – and all the page content – is right there in the Link: header if you just care to decode it! Observe that while 5.84kB of
data are transferred, the browser rightly states that the page is zero bytes in size.
My server-side implementation of this broke in 2023 after I upgraded Nginx; my new version doesn’t support the super-long Link: header needed
to make this hack work, so I’ve updated the page to use the Link: to reference the CSS file rather than embed it via a data URI. It’s not as cool, but it at least means you can
still see the page. Thanks to Thomas Bradshaw for pointing out the problem.
Footnotes
1 My first reaction was “why not just deliver something with Content-Type:
text/plain; charset=utf-8 and dispense with the invalid code, but perhaps that’s just me overthinking the non-existent problem.
This is a reply to a post published elsewhere. Its content might be duplicated as a traditional comment at the original source.
In his blog post “The ethics of syndicating comments using WebMentions”, Terence Eden said:
…
I want to see what people are writing in public about my posts. I also want to direct people to the conversations which are happening elsewhere on the web. But people – quite
rightly – might not want their content permanently stored by my site.
So I think I have a few options.
Do nothing. My site; my rules. If you don’t want me to grab your hot takes, don’t post them in public. (Feels a bit rude, TBQH.)
Be reactive. If someone asks me to remove their content, do so. (But, of course, how will they know I’ve made a copy?)
Stop syndicating comments. (I don’t wanna!)
Replace the verbatim comments with a link saying “Fred mentioned this article on Twitter” . (A bit of a disruptive experience for readers.)
Use oEmbed to capture the user’s comment and dynamically load it from the 3rd party site. That would update automatically if the user changes their name or deleted the
comment. (A massive faff to set up.)
…
Terence describes a problem that I’ve wrestled with myself. If somebody comments directly on my blog using the form at the bottom of a post, that’s a pretty strong indicator of
them giving their consent for their comment to be published at the bottom of that post (at my discretion). If somebody publicly replies somewhere my post is syndicated, that’s
less-obvious, but still pretty clear. If somebody merely mentions my post publicly, writing their own post and linking to mine… that’s a real fuzzy area.
I take a minimal approach; only capturing their full content if it’s short and otherwise trying to extract a snippet that contains the bit that mentioned my content, and I think that
works great. But Terence points out an important follow-up: what if the commenter deletes that content?
My approach so far has always been a reactive one – the second in Terence’s list – and I think it’s a morally-acceptable stance for a personal blogger. But I’m not sure it scales. I
find myself asking: what if a news outlet did this, taking my self-published feedback to their story and publishing it on their site, even if I later amended, retracted, or deleted it
on my own? If somebody’s making money out of my content, that feels different: I’ve always been clear that what I write on my blog is permissively-licensed, but that permissiveness is based on the prohibition of commercial use of
my content.
Perhaps down the line this can be solved technologically: something machine-readable akin to the <link rel="license" ...> tag could state an author’s preference for
how their content is syndicated by third parties they’ve mentioned, answering questions like:
Can you quote me, or just link to me? Who do these rules apply to? (Should we be attaching metadata to individual links?)
Should you inform me that you’ve done so, and if so: how (WebMention, etc.)?
If you (or your site) observe that my content has disappeared or changed for an extended time, should that be taken as revokation of consent to syndicate it?
Right now, the relevant technologies are not well-established enough to even begin this kind of work, but if a modern interconected federated web of personal websites takes off, it’s
the kind of question we might one day have to answer.
For now my gut feeling is that option #2 (reactive moderation of syndicated comments) is ethically-sufficient for personal websites. But I’ll be watching the feedback Terence (who
probably gets many more readers than I) receives in case my gut doesn’t represent the majority!
103: Early Hints (“I’m not sure this can last forever.”)
300: Multiple Choices (“There are so many ways I can do better than you.”)
303: See Other (“You should date other people.”)
304: Not Modified (“With you, I feel like I’m stagnating.”)
402: Payment Required (“I am a prostitute.”)
403: Forbidden (“You don’t get this any more.”)
406: Not Acceptable (“I could never introduce you to my parents.”)
408: Request Timeout (“You keep saying you’ll propose but you never do.”)
409: Conflict (“We hate each other.”)
410: Gone (ghosted)
411: Length Required (“Your penis is too small.”)
413: Payload Too Large (“Your penis is too big.”)
416: Range Not Satisfied (“Our sex life is boring and repretitive.”)
425: Too Early (“Your premature ejaculation is a problem.”)
428: Precondition Failed (“You’re still sleeping with your ex-!?”)
429: Too Many Requests (“You’re so demanding!”)
451: Unavailable for Legal Reasons (“I’m married to somebody else.”)
502: Bad Gateway (“Your pussy is awful.”)
508: Loop Detected (“We just keep fighting.”)
With thanks to Ruth for the conversation that inspired these pictures, and apologies to the rest of the Internet for creating them.
I’m off work sick today: it’s just a cold, but it’s had a damn good go at wrecking my lungs and I feel pretty lousy. You know how when you’ve got too much of a brain-fog to trust
yourself with production systems but you still want to write code (or is that just me?), so this morning I threw together a really,
really stupid project which you can play online here.
It’s a board game. Well, the digital edition of one. Also, it’s not very good.
It’s inspired by a toot by Mason”Tailsteak” Williams (whom I’ve mentioned before once or
twice). At first I thought I’d try to calculate the odds of winning at his proposed game, or how many times one might expect to play before winning,
but I haven’t the brainpower for that in my snot-addled brain. So instead I threw together a terrible, terrible digital implementation.
Go play it if, like me, you’ve got nothing smarter that your brain can be doing today.
Just in time for Robin Sloan to give up on Spring ’83, earlier this month I finally got aroud to launching STS-6 (named for the first mission of the Space Shuttle Challenger in Spring 1983), my
experimental Spring ’83 server. It’s been a busy year; I had other things to do. But you might have guessed that something like this had been under my belt when I open-sourced a keygenerator for the protocol the other day.
If you’ve not played with Spring ’83, this post isn’t going to make much sense to you. Sorry.
My server is, as far as I can tell, very different from any others in a few key ways:
It does not allow third-party publishing at all. Some might argue that this undermines the aim of the exercise, but I disagree. My IndieWeb inclinations lead me to
favour “self-hosted” content, shared from its owners’ domain. Also: the specification clearly states that a server must implement a denylist… I guess my
denylist simply includes all keys that are not specifically permitted.
It’s geared towards dynamic content.My primary board self-publishes whenever I produce a new blog post, listing the most recent
blog posts published. I have another half-implemented which shows a summary of the most-recent post, and another which would would simply use a WordPress page as its basis – yes, this
was content management, but published over Spring ’83.
It provides helpers to streamline content production. It supports internal references to other boards you control using the format {{board:123}}which are
automatically converted to addresses referencing the public key of the “current” keypair for that board. This separates the concept of a board and its content template from that
board’s keypairs, making it easier to link to a board. To put it another way, STS-6 links are self-healing on the server-side (for local boards).
It helps automate content-fitting. Spring ’83 strictly requires a maximum board size of 2,217 bytes. STS-6 can be configured to fit a flexible amount of dynamic
content within a template area while respecting that limit. For my posts list board, the number of posts shown is moderated by the size of the resulting board: STS-6 adds more and
more links to the board until it’s too big, and then removes one!
It provides “hands-off” key management features. You can pregenerate a list of keys with different validity periods and the server will automatically cycle through
them as necessary, implementing and retroactively-modifying <link rel="next"> connections to keep them current.
I’m sure that there are those who would see this as automating something that was beautiful because it was handcrafted; I don’t know whether or not I agree, but had Spring ’83
taken off in a bigger way, it would always only have been a matter of time before somebody tried my approach.
From a design perspective, I enjoyed optimising an SVG image of my header so it could meaningfully fit into the board. It’s
pretty, and it’s tolerably lightweight.
If you want to see my server in action, patch this into your favourite Spring ’83 client:
https://s83.danq.dev/10c3ff2e8336307b0ac7673b34737b242b80e8aa63ce4ccba182469ea83e0623
A dead end?
Without Robin’s active participation, I feel that Spring ’83 is probably coming to a dead end. It’s been a lot of fun to play with and I’d love to see what ideas the experience of it
goes on to inspire next, but in its current form it’s one of those things that’s an interesting toy, but not something that’ll make serious waves.
In his last lab essay Robin already identified many of the key issues with the system (too complicated, no interpersonal-mentions, the challenge of keys-as-identifiers, etc.) and while
they’re all solvable without breaking the underlying mechanisms (mentions might be handled by Webmention, perhaps, etc.), I
understand the urge to take what was learned from this experiment and use it to help inform the decisions of the next one. Just as John Postel’s Quote of the Day protocol doesn’t see much use any more (although maybe if my
finger server could support QotD?) but went on to inspire the direction of many subsequent “call-and-response” protocols,
including HTTP, it’s okay if Spring ’83 disappears into obscurity, so long as we can learn what it did
well and build upon that.
Meanwhile: if you’re looking for a hot new “like the web but lighter” protocol, you should probably check out Gemini. (Incidentally, you
can find me at gemini://danq.me, but that’s something I’ll write about another day…)
This weekend I was experimentally reimplenting how my blog displays comments. For testing I needed to find an old post with both trackbacks and pingbacks on it. I found my post that you linked, here, and was delighted to be reminded that despite both of our blogs changing domain name (from photomatt.net to ma.tt
and from blog.scatmania.org to danq.me, respectively), all the links back and forth still work perfectly because clearly we share an apporopriate dedication to the principle that
Cool URIs Don’t Change, and set up our redirects accordingly. 🙌
Incidentally, this was about the point in time at which I first thought to myself “hey, I like what Matt’s doing with this Automattic thing; I should work there someday”. It took me
like a decade to a decade-and-a-half to get around to applying, though… 😅
Anyway: thanks for keeping your URIs cool so I could enjoy this trip down memory lane (and debug an experimental wp_list_comments callback!).
That’s a really useful thing to have in this new age of the web, where Refererer: headers are no-longer commonly passed cross-domain and Google Search no longer provides the link: operator. If you want to know if I’ve ever
linked to your site, it’s a bit of a drag to find out.
To nobody’s surprise whatsoever, I’ve made a so many links to Wikipedia that I might be single-handedly responsible for their PageRank.
So, obviously, I’ve written an implementation for WordPress. It’s really basic right now, but the source code can be
found here if you want it. Install it as a plugin and run wp outbound-links to kick it off. It’s fast: it takes 3-5 seconds to parse the entirety of danq.me,
and I’ve got somewhere in the region of 5,000 posts to parse.
You can see the results at https://danq.me/.well-known/links – if you’ve ever wondered “has Dan ever linked to my site?”, now you can find the
answer.
If this could be useful to you, let’s collaborate on making this into an actually-useful plugin! Otherwise it’ll just languish “as-is”, which is good enough for my purposes.
[Nilay:]It is fashionable to run around saying the web is dead and that apps shape the world, but in my mind, the web’s pretty healthy for at least two things: news
and shopping.
[Matt:] I think that’s your bubble, if I’m totally honest. That’s what’s cool about the web: We can live in a bubble and that can seem like the whole thing. One thing I would
explicitly try to do in 2022 is make the web weirder.
…
The Verge interviewed Matt Mullenweg, and – as both an Automattician and a fan of the Web as
a place for fun and weirdness – I really appreciated the direction the interview went in. I maintain that open web standards and platforms (as opposed to closed social media silos)
are inspirational and innovative.
Emilie Reed‘s Anything a Maze lives on itch.io, and (outside of selfhosting) that’s
clearly the best place for it: you couldn’t tell that story the same way on Medium; even less-so on Facebook or Twitter.
Suppose you’re running an application on a Passenger + Nginx powered server and you want to add caching.
Perhaps your application has a dynamic, public endpoint but the contents don’t change super-frequently or it isn’t critically-important that the user always gets up-to-the-second
accuracy, and you’d like to improve performance with microcaching. How would you do that?
Where you’re at
Not pictured: the rest of the Internet.
Your configuration might look something like this:
1
2
3
4
5
6
7
server {
# listen, server_name, ssl, logging etc. directives go here# ...root/your/application;
passenger_enabledon;
}
What you’re looking for is proxy_cache and its sister directives, but you can’t just
insert them here because while Passenger does act act like an upstream proxy (with parallels like e.g. passenger_pass_header which mirrors the behaviour of proxy_pass_header), it doesn’t provide any of the functions you need to implement proxy caching
of non-static files.
Where you need to be
Instead, what you need to to is define a second server, mount Passenger in that, and then proxy to that second server. E.g.:
# Set up a cacheproxy_cache_path/tmp/cache/my-app-cachekeys_zone=MyAppCache:10mlevels=1:2inactive=600smax_size=100m;
# Define the actual webserver that listens for Internet traffic:server {
# listen, server_name, ssl, logging etc. directives go here# ...# You can configure different rules by location etc., but here's a simple microcache:location/ {
proxy_passhttp://127.0.0.1:4863; # Proxy all traffic to the application server defined belowproxy_cacheMyAppCache; # Use the cache defined aboveproxy_cache_valid2003s; # Treat HTTP 200 responses as valid; cache them for 3 secondsproxy_cache_use_staleupdating; # (Optional) send outdated response while background-updating cacheproxy_cache_lock on; # (Optional) only allow one process to update cache at once
}
}
# (Local-only) application server on an arbitrary port number to act as the upstream proxy:server {
listen 127.0.0.1:4863;
root/your/application;
passenger_enabledon;
}
The two key changes are:
Passenger moves to a second server block, localhost-only, on an arbitrary port number (doesn’t need HTTPS, of course, but if your application detects/”expects” HTTPS you
might need to tweak your headers).
Your main server block proxies to the second as its upstream, and you can add whatever caching directives you like.
Obviously you’ll need to be smarter if you host a mixture of public and private content (e.g. send Vary: headers from your application) and if you want different cache
durations on different addresses or types of content, but there are already great guides to help with that. I only wrote this post because I spent some time searching for (nonexistent!)
passenger_cache_ etc. rules and wanted to save the next person from the same trouble!