The Continuum

Last week, I discovered Geneveive Raine’s “The Continuum”, a super-compressed image composed of 1-pixel-tall versions of her home page’s daily banners, stitched together1.

I thought it was a beautiful idea, so I stole… er, adapted it to produce an illustration based on the featured images of my blog posts:

Extremely tall diagram consisting of 2,062 horizontal lines in a variety of different colours, each representing a different blog post.
Only about 38% of my 5,445 blog posts have featured images suitable for use in this diagram. But here they are!

I generated a horizontal version too, but I’ve used the vertical version above because it’s more-suitable for use with a HTML imagemap2.

Here’s the code I used to generate the images (and the imagemap), if you want to run it against your own WordPress-ish blog.
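Setting that code aside, the basic trick can be sketched in a few lines of Ruby. This is a rough illustration and not the code linked above: it assumes each image has already been decoded into rows of [r, g, b] pixels and that every image is the same width, and the method names (squash_to_row, stack_rows, to_ppm) are made up for the example. A real version would more likely lean on an image library to do the resizing.

```ruby
# Squash an image (an array of rows, each row an array of [r, g, b] pixels)
# down to a single 1-pixel-tall row by averaging each column.
def squash_to_row(image)
  height = image.length
  width  = image.first.length
  (0...width).map do |x|
    (0..2).map do |channel|
      sum = image.sum { |row| row[x][channel] }
      (sum.to_f / height).round
    end
  end
end

# Stack one squashed row per image to make the tall striped picture.
def stack_rows(images)
  images.map { |image| squash_to_row(image) }
end

# Serialise the rows as a plain-text PPM (P3) file body.
def to_ppm(rows)
  width = rows.first.length
  body  = rows.map { |row| row.flatten.join(' ') }.join("\n")
  "P3\n#{width} #{rows.length}\n255\n#{body}\n"
end
```

Writing the result of to_ppm(stack_rows(images)) to a .ppm file gives one stripe per source image, in order.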

Footnotes

1 Which was in-turn inspired by Movie Iris, a tool that visualises the frames of a movie as a radial graphic.

2 What’s a HTML imagemap, you ask? You don’t need to ask: you shouldn’t be using it anyway. Relying on it means you’re setting yourself up for an accessibility nightmare. Anyway: I used one above: you can click on any “stripe” of the image to jump to the corresponding post. It needed some fighting-with because imagemaps can’t work with rescaled images, so I’ve forced the height of the image even as it resizes horizontally. Not that you’re going to click on the stripes anyway: it’s just about the worst way imaginable to navigate a blog.
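For the curious, an imagemap over a striped image like this boils down to one rectangular area per row. Here's a minimal Ruby sketch of that idea, not the code I actually used: the stripe_image_map name, the posts structure, and the 'continuum' map name are all assumptions for illustration. Because the coordinates are in image pixels, it only lines up if the image is shown at its natural height, which is why the height has to be forced.

```ruby
require 'cgi'

# Build an HTML <map> with one 1-pixel-tall rectangular <area> per stripe.
# posts: array of { url:, title: } hashes, in the same order as the stripes.
def stripe_image_map(posts, width:, name: 'continuum')
  areas = posts.each_with_index.map do |post, y|
    # rect coords are left, top, right, bottom
    coords = [0, y, width, y + 1].join(',')
    href   = CGI.escapeHTML(post[:url])
    alt    = CGI.escapeHTML(post[:title])
    %(<area shape="rect" coords="#{coords}" href="#{href}" alt="#{alt}">)
  end
  %(<map name="#{name}">\n#{areas.join("\n")}\n</map>)
end
```

The image itself would then need a matching usemap="#continuum" attribute.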

Blog Questions Challenge

Since Kev Quirk made an adaptation of Ava’s Blog Questions Challenge I’ve been seeing it everywhere in my blogosphere circle. I’ve gotta be the last person left on Earth to do it, but it has that old-school pass-it-along meme feel, like that 2006 one about describing your friends. I’ve not been tagged by name, but both Jeremy and Garrett did a broad “you” tag, so I’m taking it.

Why did you start blogging in the first place?

It felt like a natural evolution of my second vanity-site. It was 1998, and my site – Castle of the Four Winds – was home to a selection of the same kinds of random crap that everybody put on their homepages at the time. I figured I’d start keeping an online diary: the word “blog” hadn’t been coined yet, and its predecessor “weblog” had only been around for a year and I hadn’t come across it.

So I experimentally started posting a few times a week.

Castle of the Four Winds in early 1999: a very-90s website of white and red text on black, with Times New Roman text, a flaming hit counter, and a blue ribbon campaign button.
I don’t have many of my posts from 1998, but I know from other records that my first deliberately “blog”-like post was on 27 September. But the posts shown in this screenshot, from January 1999, survive and can still be read here1.
By the way, if you liked how my site looked back in the 1990s, you can wind the clock back! Give it a go!

What platform are you using to manage your blog and why did you choose it? Have you blogged on other platforms before?

1998: Static HTML and a bit of Perl

When I started blogging my site was almost entirely plain HTML2. So my original “platform” was probably Emacs.

2000: Static files indexed by PHP

In the Summer of 2000 I registered avangel.com and moved my diary there. I was still storing posts in static files, but used PHP wrappers to share the structure and menus across the pages. It was a massive improvement.

Later, I moved everything to the (ill-advised?) domain name scatmania.org and reimplemented in pretty-much the same way. Until…

2003: Flip

The first real “blogging engine” I used was Flip.

The first version of Scatmania.org: a Flip-powered weblog.
Flip was a bit of a pain to theme, which is why my Flip-powered blog looked quite a bit like most other Flip-powered blogs.

I liked Flip3: it had a raw simplicity that I’d later come to love in young versions of WordPress. And being able to edit from the Web was a huge improvement over having to edit files, especially when I was out and about: I managed to post from my dad’s BlackBerry while cycling across the Outer Hebrides, for example.

2004: WordPress

I’d have outgrown Flip eventually, but I got a nudge in that direction in July 2004. At the time, I was sharing a server with some friends, operated by Gareth, when something went wrong and the server went completely offline. The co-located server disappeared back to Gareth’s house, eventually, and while I’d recovered many of the posts from my own backups, 61 posts remain partially-incomplete to this day (if you happen to have a copy of any of them I’d love to see it!).

I brought my blog back online using WordPress, whose then-new release version 1.2 included an RSS-powered importer: this allowed me to write a little code to convert my entire previous archive into a fat RSS file and then import it wholesale. WordPress was, and remains, pretty magical – a universal blogging platform that evolved into a universal CMS – and back in the day I occasionally argued online with Matt about technical aspects of the future direction of the project4.

Scatmania.org version 2 - now with actual web design
Those drop-shadows! Those gradients! Those naked hyperlinks differentiated only by being a slightly different colour! That aggressive use of sans-serif fonts with expanded line-heights! Those RSS links, front-and-centre! The only thing that could make this more-obviously “Web 2.0” would be the addition of a wonky “beta” star in the corner.

Incidentally, if you’d like to see more of my blog’s design history over the last 26+ years, I shared a lot of screenshots back in 2018.

If you didn’t know better, you might well not know I’m running WordPress. My theme and custom plugins are… well, they’re an ecosystem all by themselves. And that’s before you even get to things like CapsulePress, my WordPress-to-Gopher/Gemini/Spartan/Nex bridge, the pile of scripts I use to sync-up with the Fediverse, the PWA I use to post notes while I’m on the move, and so on.

2025: ClassicPress

Earlier this year I experimentally switched to ClassicPress, a fork of WordPress. There’ll doubtless be lots more to say about that, down the line5, but here’s the skinny: I don’t use Gutenberg on my blog anyway6, I appreciate having my backend be almost as high-performance as I’ve worked to make my frontend, and I enjoy most of the feature differences7.

How do you write your posts? For example, in a local editing tool, or in a panel/dashboard that’s part of your blog?

With the exception of notes (most of which are written in a tool of my own creation and then pushed to one or both of my Mastodon and my blog simultaneously), I mostly write right into the WordPress/ClassicPress post editor.

I often write ideas, concepts, and first drafts into my Obsidian notebook and then copy/paste out when the time comes.

When do you feel most inspired to write?

There’s no particular pattern, though it feels like I’m most-inspired to write exactly when I should be prioritising something else! That’s why it’s so helpful to be able to write three sentences into Obsidian and then come back to it later!

I’ve been on a bit of a blogging kick these last few years, though. Last year I wrote a massive 436 posts, although that admittedly includes PESOS’d checkins from geocaching and geohashing expeditions. I’m a fan of Kev’s #100DaysToOffload challenge, and I’m on course to achieve it earlier than ever before, this year (my sixth consecutive year: I do the challenge strictly by calendar years!), as this post is already my 48th… all within the first 38 days of this year8.

Do you publish immediately after writing, or do you let it simmer a bit as a draft?

A mixture of both. Probably most of my posts are written in a single sitting… or, at least, are written in a tab that stays open for the entire time during which it’s written.

But others spend a long time in-progress. You remember how almost a year ago I gave a talk about why Oxford’s area code is 01865? And I promised that there’d be a blog/vlog/maybe-podcast version of that talk later? Yeah: that’s been 90%-there and sitting in a draft pretty-much since then, just waiting for me to make the finishing touches (and record the vlog/podcast variants, if that’s the direction I decide to go in).

And I’ve dusted off drafts that’ve been much older than that, before, too. So it really is a mixture.

What’s your favourite post on your blog?

I couldn’t pick out a favourite that I wouldn’t change my mind about five minutes later. But a recent favourite might have been last Spring’s “Let Your Players Lead The Way”, which aimed to impart some of the things I’ve learned about gamemastering (especially) while being the dungeon master for The Levellers these last few years9.

Not only was it a post that had been a long time coming, and based on months of drafts and re-drafts, but also I really enjoyed writing some post-specific CSS to give it just a slightly more-magical feel.

Screenshot from Let Your Players Lead The Way, showcasing its design in the style of the D&D Players' Handbook.
The downside is that I’ve now got one more thing to try not to break the next time I re-write my blog’s stylesheet.

Any future plans for your blog? Maybe a redesign, a move to another platform, or adding a new feature?

I want to redesign the homepage to be simpler, less-graphical, and more-informational. I’m not sure how that’s going to look, yet.

I’ve been wondering about integrating some of my personal-geotracking into the design (Aaron Parecki does an amazing job of this with his dynamic site background image, for example).

I’m playing with the idea of adding a guestbook, like it’s 1998 again or something.

I’d like to tidy up my tagging taxonomy, and I’m not convinced AI is up to the task.

I need to decide how I feel about the emoji reactions feature I added in 2023. I’m still undecided. What do you think? 👍? 👎?

And as I mentioned: I’m experimenting with ClassicPress. It’s working out mostly-okay so far, but that’s a story for another post.

Next?

I feel like I’m the last person in the universe to do this quiz. But if you haven’t – and you have anything approximating a blog – then you should go next.

Footnotes

1 I wouldn’t recommend actually reading my older posts, though. I was a teenager, and it shows.

2 I had a slightly-fancier kind of hosting, by this point, that gave me a cgi-bin directory into which I could compile binaries (in C) or write scripts (in Perl). My hit counter? That was a Perl script I adapted from Matt Wright’s counter.pl and “enhanced” with some flaming text using Corel Photo-Paint.

3 While writing this post, I hunted down the original developer of Flip. He seems cool.

4 A year later he launched WordPress.com, which then evolved into the foundation of Automattic, and there soon came a point where I thought “I should work there, someday!” It took me a further 14 years before I applied for such a job, though.

5 Right off the bat, though, let me stress that trying ClassicPress is absolutely nothing to do with the drama in the WordPress space right now: in fact I’ve been planning to give it a try ever since the project got its shit together, re-forked WordPress, and released ClassicPress 2.0 a year ago.

6 I don’t have anything against Gutenberg – I use it on other blogs, and every day at work! – and Block Themes are magical… but I’ve never found any benefit to them here: I’ve no need for it, and I’ve got plugins I’ve written for my own use that I’ve never bothered to make Gutenberg-compatible.

7 My biggest gripe with ClassicPress so far is that in removing the jQuery dependency on the post editor’s tag selector they’ve only replaced it with a <datalist>, which is neat and all but kills the ability to autocomplete multiple comma-separated tags at once. But it looks like that’s getting fixed, so I’m going to hang in there for a bit before I decide whether I’m sticking with ClassicPress or not.

8 I’ll save you from doing the maths: if I complete 48 posts in 38 days, I’d expect to complete 100 posts on my 80th day: as it’s not a leap year, that would be Friday 21 March 2025. Let’s see how I get on!

9 Although I’ve been horribly neglecting them for the last couple of months, for various reasons.
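If you'd rather check the maths in footnote 8 than trust me, here's a small Ruby sketch of it. It assumes nothing beyond the figures quoted there (48 posts in 38 days, a target of 100, and the year 2025); the method names are made up for the example.

```ruby
require 'date'

# Day of the year on which post number `target` would land, assuming the
# pace so far (posts_so_far in days_so_far) holds steady.
def projected_day(posts_so_far, days_so_far, target)
  (target * days_so_far / posts_so_far.to_f).ceil
end

# Convert that day-of-year into a calendar date (day 1 is 1 January).
def projected_date(year, posts_so_far, days_so_far, target)
  Date.new(year, 1, 1) + (projected_day(posts_so_far, days_so_far, target) - 1)
end

day  = projected_day(48, 38, 100) # 100 * 38 / 48 is about 79.17, so day 80
date = projected_date(2025, 48, 38, 100)
puts "Day #{day}: #{date.strftime('%A %-d %B %Y')}" # prints "Day 80: Friday 21 March 2025"
```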

Can AI retroactively fix WordPress tags?

I’ve a notion that during 2025 I might put some effort into tidying up the tagging taxonomy on my blog. There are a few tags that are duplicates (e.g. ai and artificial intelligence) or that exhibit significant overlap (e.g. dog and dogs), or that were clearly created when I speculated I’d write more on the topic than I eventually did (e.g. homa night, escalators1, or nintendo) or that are just confusing and weird (e.g. not that bacon sandwich picture).

Cloud-shaped wordcloud of tags used on DanQ.me, sized by frequency. The words "geocaching" and "cache log" dominate the centre of the picture, with terms like "fun", "funny", "geeky", "news", "video games", "technology", "video", and "review" a close second.
Not an entirely surprising word cloud of my tag frequency, given that all of my cache logs are tagged “cache log” and “geocaching”, and relatively-generic terms like “technology”, “fun”, and “funny” appear all over the place on my blog.

Retro-tagging with AI

One part of such an effort might be to go back and retroactively add tags where they ought to be. For about the first decade of my blog, i.e. prior to around 2008, I rarely used tags to categorise posts. And as more tags have been added it’s apparent that many old posts even after that point might be lacking tags that perhaps they ought to have2.

I remain sceptical about many uses of (what we’re today calling) “AI”, but one thing at which LLMs seem to do moderately well is summarisation3. And isn’t tagging and categorisation only a stone’s throw away from summarisation? So maybe, I figured, AI could help me to tidy up my tagging. Here’s what I was thinking:

  1. Tell an LLM what tags I use, along with an explanation of some of the quirkier ones.
  2. Train the LLM with examples of recent posts and lists of the tags that were (correctly, one assumes) applied.
  3. Give it the content of blog posts and ask what tags should be applied to it from that list.
  4. Script the extraction of the content from old posts with few tags and run it through the above, presenting to me a report of what tags are recommended (which could then be coupled with a basic UI that showed me the post and suggested tags, and “approve”/”reject” buttons or similar).

Extracting training data

First, I needed to extract and curate my tag list, for which I used the following SQL4:

SELECT COUNT(wp_term_relationships.object_id) num, wp_terms.slug FROM wp_term_taxonomy
LEFT JOIN wp_terms ON wp_term_taxonomy.term_id = wp_terms.term_id
LEFT JOIN wp_term_relationships ON wp_term_taxonomy.term_taxonomy_id = wp_term_relationships.term_taxonomy_id
WHERE wp_term_taxonomy.taxonomy = 'post_tag'
AND wp_terms.slug NOT IN (
  -- filter out e.g. 'rss-club', 'published-on-gemini', 'dancast' etc.
  -- these are tags that have internal meaning only or are already accurately applied
  'long', 'list', 'of', 'tags', 'the', 'ai', 'should', 'never', 'apply'
)
GROUP BY wp_terms.slug
HAVING num > 2 -- filter down to tags I actually routinely use
ORDER BY wp_terms.slug
Many of my tags are used for internal purposes; e.g. I tag posts published on gemini if they’re to appear on gemini://danq.me/ and dancast if they embed an episode of my podcast. I filtered these out because I never want the AI to suggest applying them.

I took my output and dumped it into a list, and skimmed through to add some clarity to some tags whose purpose might be considered ambiguous, writing my explanation of each in parentheses afterwards. Here’s a part of the list, for example:

Prompt derivation

I used that list as the basis for the system message of my initial prompt:

Suggest topical tags from a predefined list that appropriately apply to the content of a given blog post.

# Steps

1. **Read the Blog Post**: Carefully read through the provided content of the blog post to identify its main themes and topics.
2. **Analyse Key Aspects**: Identify key topics, themes, or subjects discussed in the blog post.
3. **Match with Tags**: Compare these identified topics against the list of available tags.
4. **Select Appropriate Tags**: Choose tags that best represent the main topics and themes of the blog post.

# Output Format

Provide a list of suggested tags. Each tag should be presented as a single string. Multiple tags should be separated by commas.

# Allowed Tags

Tags that can be suggested are as follows. Text in parentheses are not part of the tag but are a description of the kinds of content to which the tag ought to be applied:

- aberdyfi
- aberystwyth
- ...
- youtube
- zoos

# Examples

**Input:**
The rapid advancement of AI technology has had a significant impact on my industry, even on the ways in which I write my blog posts. This post, for example, used AI to help with tagging.

**Output:**
ai, technology, blogging, meta, work

...(other examples)...

# Notes

- Ensure that all suggested tags are relevant to the key themes of the blog post.
- Tags should be selected based on their contextual relevance and not just keyword matching.

This system prompt is somewhat truncated, but you get the idea.
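Assembling the variable parts of a prompt like this is easy to script. Here's a minimal Ruby sketch, assuming the curated list lives in a hash of tag to optional explanation; the method names and the sample explanation are invented for illustration and aren't lifted from my real list.

```ruby
# Render the "# Allowed Tags" section of the system prompt.
# tags: hash of tag => explanation (nil when the tag needs no explanation).
def allowed_tags_section(tags)
  lines = tags.sort_by { |tag, _explanation| tag }.map do |tag, explanation|
    explanation ? "- #{tag} (#{explanation})" : "- #{tag}"
  end
  "# Allowed Tags\n\n#{lines.join("\n")}"
end

# Render one worked example in the Input/Output shape the prompt uses.
def example_section(post_text, tags)
  "**Input:**\n#{post_text}\n\n**Output:**\n#{tags.join(', ')}"
end

# Hypothetical sample data: the explanation here is made up for the sketch.
sample_tags = {
  'zoos'      => nil,
  'aberdyfi'  => nil,
  'cache log' => 'logs of geocaching finds'
}
puts allowed_tags_section(sample_tags)
```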

Now I was ready to give it a go with some real data. As an initial simple and short (and therefore also computationally cheap) experiment, I tried feeding it a note I wrote last week about the interrobang’s place in the Spanish language, and in Unicode.

That post already has the following tags (but this wasn’t disclosed to the AI in its training set; it had to work from scratch): , , (a bit of a redundancy there!), , and .

Testing it out

Let’s see what the AI suggests:

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_TOKEN" \
  -d '{
  "model": "gpt-4o-mini",
  "messages": [
    {
      "role": "system",
      "content": [
        {
          "type": "text",
          "text": "[PROMPT AS DESCRIBED ABOVE]"
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "My 8-year-old asked me \"In Spanish, I need to use an upside-down interrobang at the start of the sentence‽\" I assume the answer is yes A little while later, I thought to check whether Unicode defines a codepoint for an inverted interrobang. Yup: ‽ = U+203D, ⸘ = U+2E18. Nice. And yet we dont have codepoints to differentiate between single-bar and double-bar \"cifrão\" dollar signs..."
        }
      ]
    }
  ],
  "response_format": {
    "type": "text"
  },
  "temperature": 1,
  "max_completion_tokens": 2048,
  "top_p": 1,
  "frequency_penalty": 0,
  "presence_penalty": 0
}'
Running this via command-line curl meant I quickly ran up against some Bash escaping issues, but set +H and a little massaging of the blog post content seemed to fix it.
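One way to sidestep the shell quoting entirely would be to build the request body in a scripting language and let a JSON library do the escaping. The following Ruby sketch mirrors the curl command above; it isn't what I ran, the method names are made up, and suggest_tags is defined but never called here because it needs a real OPENAI_TOKEN in the environment.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build the same request body as the curl command, leaving the quoting of
# the post content to JSON.generate.
def chat_request_body(system_prompt, post_text, model: 'gpt-4o-mini')
  body = {
    model: model,
    messages: [
      { role: 'system', content: [{ type: 'text', text: system_prompt }] },
      { role: 'user',   content: [{ type: 'text', text: post_text }] }
    ],
    response_format: { type: 'text' },
    temperature: 1,
    max_completion_tokens: 2048,
    top_p: 1,
    frequency_penalty: 0,
    presence_penalty: 0
  }
  JSON.generate(body)
end

# Send the request and return the text of the first choice.
def suggest_tags(system_prompt, post_text, model: 'gpt-4o-mini')
  headers = {
    'Content-Type'  => 'application/json',
    'Authorization' => "Bearer #{ENV.fetch('OPENAI_TOKEN')}"
  }
  uri = URI('https://api.openai.com/v1/chat/completions')
  response = Net::HTTP.post(uri, chat_request_body(system_prompt, post_text, model: model), headers)
  JSON.parse(response.body).dig('choices', 0, 'message', 'content')
end
```

Swapping models then becomes a matter of passing model: 'gpt-4o' rather than editing the payload by hand.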

GPT-4o-mini

When I ran this query against the gpt-4o-mini model, I got back: unicode, language, education, children, symbols.

That’s… not ideal. I agree with the tags unicode, language, and children, but this isn’t really about education. If I tagged everything vaguely educational on my blog with education, it’d be an even-more-predominant tag than geocaching is! I reserve that tag for things that relate specifically to formal education: but that’s possibly something I could correct for with a parenthetical in my approved tags list.

symbols, though, is way out. Sure, the post could be argued to be something to do with symbols… but symbols isn’t on the approved tag list in the first place! This is a clear hallucination, and that’s pretty suboptimal!

Maybe a beefier model will fare better…

GPT-4o

I switched gpt-4o-mini for gpt-4o in the command above and ran it again. It didn’t take noticeably longer to run, which was pleasing.

The model returned: children, language, unicode, typography. That’s a big improvement. It no longer suggests education, which was off-base, nor symbols, which was a hallucination. But it did suggest typography, which is a… not-unreasonable suggestion.

Neither model suggested spain, and strictly-speaking they were probably right not to. My post isn’t about Spain so much as it’s about Spanish. I don’t have a specific tag for the latter, but I’ve subbed in the former to “connect” the post to ones which are about Spain, but that might not be ideal. Either way: if this is how I’m using the tag then I probably ought to clarify as such in my tag list, or else add a note to the system prompt to explain that I use place names as the tags for posts about the language of those places. (Or else maybe I need to be more-consistent in my tagging).

I experimented with a handful of other well-tagged posts and was moderately-satisfied with the results. Time for a more-challenging trial.

This time, with feeling…

Next, I decided to run the code against a few blog posts that are in need of tags. At this point, I wasn’t quite ready to implement a UI, so I just adapted my little hacky Bash script and copy-pasted HTML-stripped post contents directly into it.

Hand-drawn wireframe application with a blog post shown on the left (with 'previous' and 'next' buttons) and proposed tags on the right (with 'accept' and 'reject' buttons), alongside conventional tag management tools.
If it worked, I decided, I could make a UI. Until then, the command line was plenty sufficient.

I ran against three old posts:

Hospitals (June 2006)

In this post, I shared that my grandmother and my coworker had (independently) been taken into hospital. It had no tags whatsoever.

The AI suggested the tags hospital, family, injury, work, weddings, pub, humour. At a glance, that’s probably a superset of the tags that I’d have considered, but there’s a clear logic to them all.

It clearly picked out weddings based on a throwaway comment I made about a cousin’s wedding, so I disagree with that one: the post isn’t strictly about weddings just because it mentions one.

pub could go either way. It turns out my coworker’s injury occurred at or after a trip to the pub the previous night, and so its relevance is somewhat unknowable from this post in isolation. I think that’s a reasonable suggestion, and a great example of why I’d want any such auto-tagging system to be a human assistant (suggesting candidate tags) and not a fully-automated system. Interesting!

Finally, you might think of humour as being a little bit sarcastic, or maybe overly-laden with schadenfreude. But the blog post explicitly states that my coworker “carefully avoided saying how he’d managed to hurt himself, which implies that it’s something particularly stupid or embarrassing”, before encouraging my friends to speculate on it. However, it turns out that humour isn’t one of my existing tags at all! Boo, hallucinating AI!

I ended up applying all of the AI’s suggestions except weddings and humour. I also applied smartdata, because that’s where I worked (the AI couldn’t have been expected to guess that without context, though!).

Catch-Up: Concerts (June 2005)

This post talked about the travels Ash and I made around the UK to see REM and Green Day in concert5, and to the National Science Museum in London, where I discovered that Ash was prejudiced towards… carrot cake.

The AI suggested: concerts, travel, music, preston, london, science museum, blogging.

Those all seemed pretty good at a first glance. Personally, I’d forgotten that we swung by Preston during that particular grand tour until the AI suggested the tag, and then I had to look back at the post more-carefully to double-check! blogging initially seemed like a stretch given that I was only blogging about not having blogged much, but on reflection I think I agree with the robot on this one, because I did explicitly link to a 2002 page that fell off the Internet only a few years ago about the pointlessness of blogging. So I think it counts.

Dan, in his 20s, crouches awkwardly in front of a TV and a Nintendo Wii in a wood-panelled room as he attempts to headbutt a falling blue balloon.
I was able to verify that I’d been in Preston thanks to this contemporaneous photo. I have no further explanation for the content of the photo, though.

science museum is a big fail though. I don’t use that tag, but I do use the tag museum. So close, but not quite there, AI!

I applied all of its suggestions, after switching museum in place of science museum.

Geeky Winnage With Bluetooth (September 2004)

I wrote this blog post in celebration of having managed to hack together some stuff to help me remote-control my PC from my phone via Bluetooth, which back then used to be a challenge, in the hope that this would streamline pausing, playing, etc. at pizza-distribution-time at Troma Night, a weekly film night I hosted back then.

Four young people, smiling and laughing, sit in a cluttered and messy flat.
If you were sat on that sofa, fighting your way past other people and a mango-chutney-barrel-cum-table to get to a keyboard was genuinely challenging!

It already had the tag technology, which it inherited from a pre-tagging evolution of my blog which used something akin to categories (of which only one could be assigned to a post). The AI suggested this too, alongside the following options: bluetooth, geeky, mobile, troma night, dvd, and software.

The big failure here was dvd, which isn’t remotely one of my tags (and probably wouldn’t apply here if it were: this post isn’t about DVDs; it barely even mentions them). Possibly some prompt engineering is required to help ensure that the AI doesn’t make a habit of this “include one tag not from the approved list, every time” trend.

Apart from that it’s a pretty solid list. Annoyingly the AI suggested mobile, which isn’t an approved tag, instead of mobiles, which is. That’s probably a tokenisation fault, but it’s still annoying and a reminder of why even a semi-automated “human-checked” system would need a safety-check to ensure that no absent tags are allowed through to the final stage of approval.
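That safety-check needn't be complicated. Here's a minimal Ruby sketch of the idea, assuming the model's reply is the comma-separated string the prompt asks for; the method names are made up, and the approved list below is a tiny stand-in for the real one.

```ruby
# Split the model's comma-separated reply into tags, then separate those on
# the approved list from those that aren't (hallucinations or near-misses).
def check_suggestions(reply, approved_tags)
  suggested = reply.split(',').map { |tag| tag.strip.downcase }.reject(&:empty?).uniq
  approved, rejected = suggested.partition { |tag| approved_tags.include?(tag) }
  { approved: approved, rejected: rejected }
end

# For a rejected tag, list approved tags that contain it or are contained by
# it, so a human can spot likely substitutions.
def near_misses(tag, approved_tags)
  approved_tags.select { |candidate| candidate.include?(tag) || tag.include?(candidate) }
end

# Stand-in approved list for the example; the real list is much longer.
approved_tags = ['bluetooth', 'geeky', 'mobiles', 'museum', 'software', 'technology', 'troma night']
result = check_suggestions('bluetooth, geeky, mobile, troma night, dvd, technology, software', approved_tags)
result[:rejected].each do |tag|
  puts "#{tag}: not approved; similar approved tags: #{near_misses(tag, approved_tags).inspect}"
end
```

Substring matching is a crude heuristic, chosen here only because it happens to catch both mobile/mobiles and science museum/museum; anything it proposes would still want a human eye.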

This post!

As a bonus experiment, I tried running my code against a version of this post, but with the information about the AI’s own prompt and the examples removed (to reduce the risk of confusion). It came up with: ai, wordpress, blogging, tags, technology, automation.

All reasonable-sounding choices, and among those I’d made myself… except for tags and automation which, yet again, aren’t among tags that I use. Unless this tendency to hallucinate can be reined-in, I’m guessing that this tool’s going to continue to have some challenges when used on longer posts like this one.

Conclusion and next steps

The bottom line is: yes, this is a job that an AI can assist with, but no, it’s not one that it can do without supervision. The laser-focus with which gpt-4o was able to pick out taggable concepts, faster than I’d have been able to do for the same quantity of text, shows that there’s potential here, but it’s not yet proven itself enough of a time-saver to justify me writing a fluffy UI for it.

However, I might expand on the command-line tools I’ve been using in order to produce a non-interactive list of tagging suggestions, and use that to help inform my work as I tidy up the tags throughout my blog.

You still won’t see any “AI-authored” content on this site (except where it’s for the purpose of talking about AI-generated content, and it’ll always be clearly labelled), and I can’t see that changing any time soon. But I’ll admit that there might be some value in AI-assisted curation and administration, so long as there’s an informed human in the loop at all times.

Footnotes

1 Based on my tagging, I’ve apparently only written about escalators once, while playing Pub Jenga at Robin’s 21st birthday party. I can’t imagine why I thought it deserved a tag.

2 There are, of course, various other people trying similar approaches to this and similar problems. I might have tried one of them, were it not for the fact that I’m not quite as interested in solving the problem as I am in understanding how one might use an AI to solve the problem. It’s similar to how I don’t enjoy doing puzzles like e.g. sudoku as much as I enjoy writing software that optimises for solving such puzzles. See also, for example, how I beat my children at Mastermind or what the hardest word in Hangman is or my various attempts to avoid doing online jigsaws.

3 Let’s ignore for a moment the farce that was Apple’s attempt to summarise news headlines, shall we?

4 Essentially the same SQL, plus WordClouds.com, was used to produce the word cloud graphic!

5 Two separate concerts, but can you imagine‽ 🤣

Caddy

I’m pretty impressed with running WordPress on Caddy so far.

It took a little jiggerypokery to configure it with an equivalent of the Nginx configuration I use for DanQ.me. But off the back of it I get the capability for HTTP/3, 103 Early Hints, and built-in “batteries included” infrastructure for things like certificate renewal and log rotation.

Browser network debugger showing danq.me being served over protocol 'h3' (HTTP/3) and an 'Early Hints Headers' section loading a WOFF2 font and a JavaScript file.

(why yes, I am celebrating my birthday by doing selfhosting server configuration, why do you ask? 😅)

Note #25347

Even when it’s technical, not all of my International Volunteer Day work for Three Rings has been spent using our key technologies (LNMR [Linux, Nginx, MariaDB, Ruby] stacks).

Today, I wrote some extra PHP for our WordPress-powered contact form to notify our Support Team volunteers via Slack when messages are sent. We already aim to respond to every message within 24 hours, 365 days a year, and are often faster than that… but this might help us to be even more-responsive to the needs of the charities who we help look after.

A filled contact form alongside a Slack message and a resulting ticketing system message.

Nex in CapsulePress

I’ve added Nex support to CapsulePress!

What does that mean?

Screenshot showing DanQ.me homepage via Nex, in Lagrange browser.
Here’s how nex://danq.me/ looks in my favourite desktop Gemini/smolweb browser Lagrange.

Nex is a lightweight Internet protocol reminiscent to me of Spartan (which CapsulePress also supports), but even more lightweight. Without even affordances like host identification, MIME types, response codes, or the expectation that Gemtext might be supported by the client, it’s perhaps more like Gopher than it is like Gemini.

It comes from the ever-entertaining smolweb hub of Nightfall City, whose Web interface clearly states at the top of every page the command you could have run to see that content over the Nex protocol. Lagrange added support for Nex almost a year ago and it’s such a lightweight protocol that I was quickly able to adapt CapsulePress’s implementation of Spartan to support Nex, too.

require 'gserver'
require 'word_wrap'
require 'word_wrap/core_ext'

class NexServer < GServer
  def initialize
    super(
      (ENV['NEX_PORT'] ? ENV['NEX_PORT'].to_i                           : 1900),
      (ENV['NEX_HOST']                                                 || '0.0.0.0'),
      (ENV['NEX_MAX_CONNECTIONS'] ? ENV['NEX_MAX_CONNECTIONS'].to_i : 4)
    )
  end

  def handle(io, req)
    puts "Nex: handling"
    io.print "\r\n"
    req = '/' if req == ''
    if response = CapsulePress.handle(req, 'nex')
      io.print response[:body].wrap(79)
    else
      io.print "Document not found\r\n"
    end
  end

  def serve(io)
    puts "Nex: client connected"
    req = io.gets.strip
    handle(io, req)
  end
end
This is genuinely the entirety of my implementation of my Nex server, atop CapsulePress. And it’s mostly boilerplate.

Why, you might ask? Well, the reasons are the same as all the other standards supported by CapsulePress:

  1. The smolweb is awesome.
  2. Making WordPress do things it was never meant to do is sorta my jam.
  3. It was a quick win while I waited for the pharmacist to shoot me up with 5G microchips… er, my ‘flu and Covid boosters.

If you want to add Nex onto your CapsulePress, just git pull the latest version, ensure TCP port 1900 isn’t firewalled, and don’t add USE_NEX=false to your environment. That’s all!

What the heck is going on with WordPress?

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

Let’s play a little game. 😉

Look at the following list of words and try to find the intruder:

  • wp-activate.php
  • wp-admin
  • wp-blog-header.php
  • wp_commentmeta
  • wp_comments
  • wp-comments-post.php
  • wp-config-sample.php
  • wp-content
  • wp-cron.php
  • wp engine
  • wp-includes
  • wp_jetpack_sync_queue
  • wp_links
  • wp-links-opml.php
  • wp-load.php
  • wp-login.php
  • wp-mail.php
  • wp_options
  • wp_postmeta
  • wp_posts
  • wp-settings.php
  • wp-signup.php
  • wp_term_relationships
  • wp_term_taxonomy
  • wp_termmeta
  • wp_terms
  • wp-trackback.php
  • wp_usermeta
  • wp_users

What are these words?

Well, all the ones that contain an underscore _ are names of the WordPress core database tables. All the ones that contain a dash - are WordPress core file or folder names. The one with a space is a company name…

A smart (if slightly tongue-in-cheek) observation by my colleague Paolo, there. The rest of his article’s cleverer and worth-reading if you’re following the WordPress Drama (but it’s pretty long!).

Transparency, Contribution, and the Future of WordPress

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

The people who make the most money in WordPress are not the people who contribute the most (Matt / Automattic really is one of the exceptions here, as I think we are). And this is a problem. It’s a moral problem. It’s just not equitable.

I agree with Matt about his opinion that a big hosting company such as WPEngine should contribute more. It is the right thing to do. It’s fair. It will make the WordPress community more egalitarian. Otherwise, it will lead to resentment. I’ve experienced that too.

In my opinion, we all should get a say in how we spend those contributions [from companies to WordPress]. I understand that core contributors are very important, but so are the organizers of our (flagship) events, the leadership of hosting companies, etc. We need to find a way to have a group of people who represent the community and the contributing corporations.

Just like in a democracy. Because, after all, isn’t WordPress all about democratizing?

Now I don’t mean to say that Matt should no longer be project leader. I just think that we should more transparently discuss with a “board” of some sorts, about the roadmap and the future of WordPress as many people and companies depend on it. I think this could actually help Matt, as I do understand that it’s very lonely at the top.

With such a group, we could also discuss how to better highlight companies that are contributing and how to encourage others to do so.

Some wise words from Joost de Valk, and it’s worth reading his full post if you’re following the WP Engine drama but would rather be focussing on looking long-term towards a better future for the entire ecosystem.

I don’t know whether Joost’s solution is optimal, but it’s certainly worth considering his ideas if we’re to come up with a new shape for WordPress. It’s good to see that people are thinking about the bigger picture here, rather than just wherever we find ourselves at the resolution of this disagreement between Matt/Automattic/the WordPress Foundation and WP Engine.

Thinking bigger is admirable. Thinking bigger is optimistic. Thinking bigger is future-facing.

WP Engine’s Three Problems

Podcast Version

This post is also available as a podcast. Listen here, download for later, or subscribe wherever you consume podcasts.

If you’re active in the WordPress space you’re probably aware that there’s a lot of drama going on right now between (a) WordPress hosting company WP Engine, (b) WordPress hosting company (among quite a few other things) Automattic1, and (c) the WordPress Foundation.

If you’re not aware then, well: do a search across the tech news media to see the latest: any summary I could give you would be out-of-date by the time you read it anyway!

Illustration showing relationships between WordPress and Automattic (licensing trademarks and contributing effort to), between WordPress and WP Engine (the latter profits from the former), and between Automattic and WP Engine (throwing lawsuits at one another).
I tried to draw a better diagram with more of the relevant connections, but it quickly turned into spaghetti.

A declaration of war?

Like others, I’m not sure that the way Matt publicly called-out WP Engine at WCUS was the most-productive way to progress a discussion2.

In particular, I think a lot of the conversation that he kicked off conflates three different aspects of WP Engine’s misbehaviour. That muddies the waters when it comes to having a reasoned conversation about the issue3.

Matt Mullenweg on stage at WordCamp US 2024, stating how he feels that WP Engine exploits WordPress (to great profit) without contributing back.
I’ve heard Matt speak a number of times, including in person… and I think he did a pretty bad job of expressing the problems with WP Engine during his Q&A at WCUS. In his defence, it sounds like he may have been still trying to negotiate a better way forward until the very second he walked on stage that day.

I don’t think WP Engine is a particularly good company, and I personally wouldn’t use them for WordPress hosting. That’s not a new opinion for me: I wouldn’t have used them last year or the year before, or the year before that either. And I broadly agree with what I think Matt tried to say, although not necessarily with the way he said it or the platform he chose to say it upon.

Misdeeds

As I see it, WP Engine’s potential misdeeds fall into three distinct categories: moral, ethical4, and legal.

Morally: don’t take without giving back

Matt observes that since WP Engine’s acquisition by huge tech-company-investor Silver Lake, WP Engine have made enormous profits from selling WordPress hosting as a service (and nothing else) while making minimal to no contributions back to the open source platform that they depend upon.

If true, and it appears to be, this would violate the principle of reciprocity. If you benefit from somebody else’s effort (and you’re able to) you’re morally-obliged to at least offer to give back in a manner commensurate to your relative level of resources.

Two children sit on a bed: one hands a toy dinosaur to the other.
The principle of reciprocity is a moral staple. This is evidenced by the fact that children (and some nonhuman animals) seem to be able to work it out for themselves from first principles using nothing more than empathy. Companies, however, aren’t usually so-capable. Photo courtesy Cotton.

Abuse of this principle is… sadly not-uncommon in business. Or in tech. Or in the world in general. A lightweight example might be the many millions of profitable companies that host atop the Apache HTTP Server without donating a penny to the Apache Software Foundation. A heavier (and legally-backed) example might be Truth Social’s implementation being based on a modified version of Mastodon’s code: Mastodon’s license requires that their changes are shared publicly… but they didn’t do so until they were sent threatening letters reminding them of their obligations.

I feel like it’s fair game to call out companies that act amorally, and encourage people to boycott them, so long as you do so without “punching down”.

Ethically: don’t exploit open source’s liberties as weaknesses

WP Engine also stand accused of altering the open source code that they host in ways that maximise their profit, to the detriment of both their customers and the original authors of that code5.

It’s well established, for example, that WP Engine disable the “revisions” feature of WordPress6. Personally, I don’t feel like this is as big a deal as Matt makes out that it is (I certainly wouldn’t go as far as saying “WP Engine is not WordPress”): it’s pretty commonplace for large hosting companies to tweak the open source software that they host to better fit their architecture and business model.

But I agree that it does make WordPress as-provided by WP Engine significantly less good than would be expected from virtually any other host (most of which, by the way, provide much better value-for-money at any price point).

Fake web screenshot showing turdpress.com, "WordPress... But Shit".
There’s nothing to stop me from registering TurdPress.com and providing a premium WordPress web hosting solution with all the best features disabled: I could even disable exports so that my customers wouldn’t even be able to easily leave my service for greener pastures! There’s nothing to stop me… but that wouldn’t make it right7.

It also looks like WP Engine may have made more-nefarious changes, e.g. modifying the referral links in open source code (the thing that earns money for the original authors of that code) so that WP Engine can collect the revenue themselves when they deploy that code to their customers’ sites. That to me feels like it’s clearly into the zone of ethical bad practice. Within the open source community, it’s not okay to take somebody’s code, which they were kind enough to release under a liberal license, strip out the bits that provide their income, and redistribute it, even just as a network service8.

Again, I think this is fair game to call out, even if it’s not something that anybody has a right to enforce legally. On which note…

Legally: trademarks have value, don’t steal them

Automattic Inc. has a recognised trademark on WooCommerce, and is the custodian of the WordPress Foundation’s trademark on WordPress. WP Engine are accused of unauthorised use of these trademarks.

Obviously, this is the part of the story you’re going to see the most news media about, because there’s reasonable odds it’ll end up in front of a judge at some point. There’s a good chance that such a case might revolve around WP Engine’s willingness (and encouragement?) to allow their business to be called “WordPress Engine” and to capitalise on any confusion that causes.

Screenshot from the WordPress Foundation's Trademark Policy page, with all but the first line highlighted of the paragraph that reads: The abbreviation “WP” is not covered by the WordPress trademarks, but please don’t use it in a way that confuses people. For example, many people think WP Engine is “WordPress Engine” and officially associated with WordPress, which it’s not. They have never once even donated to the WordPress Foundation, despite making billions of revenue on top of WordPress.
I don’t know how many people spotted this ninja-edit addition to the WordPress Foundation’s Trademark Policy page, but I did.

I’m not going to weigh in on the specifics of the legal case: I Am Not A Lawyer and all that. Naturally I agree with the underlying principle that one should not be allowed to profit off another’s intellectual property, but I’ll leave discussion on whether or not that’s what WP Engine are doing as a conversation for folks with more legal-smarts than I. I’ve certainly known people be confused by WP Engine’s name and branding, though, and think that they must be some kind of “officially-licensed” WordPress host: it happens.

If you’re following all of this drama as it unfolds… just remember to check your sources. There’s a lot of FUD floating around on the Internet right now9.

In summary…

With a reminder that I’m sharing my own opinion here and not that of my employer, here’s my thoughts on the recent WP Engine drama:

  1. WP Engine certainly act in ways that are unethical and immoral and antithetical to the spirit of open source, and those are just a subset of the reasons that I wouldn’t use them as a WordPress host.
  2. Matt Mullenweg calling them out at WordCamp US doesn’t get his point across as well as I think he hoped it might, and probably won’t win him any popularity contests.
  3. I’m not qualified to weigh in on whether or not WP Engine have violated the WordPress Foundation’s trademarks, but I suspect that they’ve benefitted from widespread confusion about their status.

Footnotes

1 I suppose I ought to point out that Automattic is my employer, in case you didn’t know, and point out that my opinions don’t necessarily represent theirs, etc. I’ve been involved with WordPress as an open source project for about four times as long as I’ve had any connection to Automattic, though, and don’t always agree with them, so I’d hope that it’s a given that I’m speaking my own mind!

2 Though like Manu, I don’t think that means that Matt should take the corresponding blog post down: I’m a digital preservationist, as might be evidenced by the unrepresentative-of-me and frankly embarrassing things in the 25-year archives of this blog!

3 Fortunately the documents that the lawyers for both sides have been writing are much clearer and more-specific, but that’s what you pay lawyers for, right?

4 There’s a huge amount of debate about the difference between morality and ethics, but I’m using the definition that means that morality is based on what a social animal might be expected to decide for themselves is right, think e.g. the Golden Rule etc., whereas ethics is the code of conduct expected within a particular community. Take stealing, for example, which covers the spectrum: that you shouldn’t deprive somebody else of something they need, is a moral issue; that we as a society deem such behaviour worthy of exclusion is an ethical one; meanwhile the action of incarcerating burglars is part of our legal framework.

5 Not that anybody’s immune to making ethical mistakes. Not me, not you, not anybody else. I remember when, back in 2005, Matt fucked up by injecting ads into WordPress (which at that point didn’t have a reliable source of funding). But he did the right thing by backpedalling, undoing the harm, and apologising publicly and profusely.

6 WP Engine claim that they disable revisions for performance reasons, but that’s clearly bullshit: it’s pretty obvious to me that this is about making hosting cheaper. Enabling revisions doesn’t have a performance impact on a properly-configured multisite hosting system, and I know this from personal experience of running such things. But it does have a significant impact on how much space you need to allocate to your users, which has cost implications at scale.

7 As an aside: if a court does rule that WP Engine is infringing upon WordPress trademarks and they want a new company name to give their service a fresh start, they’re welcome to TurdPress.

8 I’d argue that it is okay to do so for personal-use though: the difference for me comes when you’re making a profit off of it. It’s interesting to find these edge-cases in my own thinking!

9 A typical Reddit thread is about 25% lies-and-bullshit; but you can double that for a typical thread talking about this topic!

VaultPress to the Rescue

OMG VaultPress (now Jetpack Backup) to the rescue.

One of the best Internet people drew me a picture and when I replied to it, it got scrambled. 😱

But even though I had to modify core WordPress columns to store drawings, the backup respected that and I was able to restore it.

I used to pay for VaultPress. Nowadays I get it for free as one of the many awesome perks of my job. But I’d probably still pay for it because it’s a lifesaver.

So… I’m A Podcast

Podcast Version

This post is also available as a podcast. Listen here, download for later, or subscribe wherever you consume podcasts.

Observant readers might have noticed that some of my recent blog posts – like the one about special roads, my idea for pressure-cooking tea, and the one looking at the history of window tax in two countries1 – are also available as podcasts.

Podcast cover showing Dan touching his temple and speaking into a microphone, captioned 'a podcast nobody asked for, about things only Dan Q cares about'.

Why?

Like my occasional video content, this isn’t designed to replace any of my blogging: it’s just a different medium for those that might prefer it.

For some stories, I guess that audio might be a better way to find out what I’ve been thinking about. Just like how the vlog version of my post about my favourite video game Easter Egg might be preferable because video as a medium is better suited to demonstrating a computer game, perhaps audio’s the right medium for some of the things I write about, too?

But as much as not, it’s just a continuation of my efforts to explore different media over which a WordPress blog can be delivered2. Also, y’know, my ongoing effort to do what I’m bad at in the hope that I might get better at a wider diversity of skills.

How?

Let’s start by understanding what a “podcast” actually is. It is, in essence, just an RSS feed (something you might have heard me talk about before…) with audio enclosures – basically, “attachments” – on each item. The idea was spearheaded by Dave Winer back in 2001 as a way of subscribing to rich media like audio or videos in such a way that slow Internet connections could pre-download content so you didn’t have to wait for it to buffer.3

Mapping of wp-admin metadata fields to parts of a podcast feed.
Podcasts are pretty simple, even after you’ve bent over backwards to add all of the metadata that Apple Podcasts (formerly iTunes) expects to see. I looked at a couple of WordPress plugins that claimed to be able to do the work for me, but eventually decided it was simple enough to just add some custom metadata fields that could then be included in my feeds and tweak my theme code a little.
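
To make the “RSS feed with enclosures” idea concrete, here’s a small sketch that renders one feed item carrying an audio enclosure. It’s written in Ruby purely for illustration (it is not the theme code described below), and the function name, URLs and byte count are all made-up placeholders:

```ruby
require 'cgi'

# Render one RSS <item> with an audio enclosure.
# RSS 2.0 enclosures carry three attributes: url, length (in bytes) and type.
def podcast_item(title:, link:, audio_url:, bytes:, mime: 'audio/mpeg')
  <<~XML
    <item>
      <title>#{CGI.escapeHTML(title)}</title>
      <link>#{CGI.escapeHTML(link)}</link>
      <enclosure url="#{CGI.escapeHTML(audio_url)}" length="#{bytes}" type="#{CGI.escapeHTML(mime)}" />
    </item>
  XML
end

puts podcast_item(
  title: 'Example episode',
  link: 'https://example.com/episode-1',
  audio_url: 'https://example.com/episode-1.mp3',
  bytes: 1_234_567
)
```

A podcast app that subscribes to the feed looks for that enclosure on each item and downloads whatever it points at; everything else is ordinary RSS.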

Here’s what I had to do to add podcasting capability to my theme:

The tag

I use a post tag, dancast, to represent posts with accompanying podcast content4. This way, I can add all the podcast-specific metadata only if the user requests the feed of that tag, and leave my regular feeds untampered. This means that you don’t get the podcast enclosures in the regular subscription; that might not be what everybody would want, but it suits me to serve podcasts only to people who explicitly ask for them.

It also means that I’m able to use a template, tag-dancast.php, in my theme to generate a customised page for listing podcast episodes.

The feed

Okay, onto the code (which I’ve open-sourced over here). I’ve used a series of standard WordPress hooks to add the functionality I need. The important bits are:

  1. rss2_item – to add the <enclosure>, <itunes:duration>, <itunes:image>, and <itunes:explicit> elements to the feed, when requesting a feed with my nominated tag. Only <enclosure> is strictly required, but appeasing Apple Podcasts is worthwhile too. These are lifted directly from the post metadata.
  2. the_excerpt_rss – I have another piece of post metadata in which I can add a description of the podcast (in practice, a list of chapter times); this hook swaps out the existing excerpt for my custom one in podcast feeds.
  3. rss_enclosure – some podcast syndication platforms and players can’t cope with RSS feeds in which an item has multiple enclosures, so as a safety precaution I strip out any enclosures that WordPress has already added (e.g. the featured image).
  4. the_content_feed – my RSS feed usually contains the full text of every post, because I don’t like feeds that try to force you to go to the original web page5 and I don’t want to impose that on others. But for the podcast feed, the text content of the post is somewhat redundant so I drop it.
  5. rss2_ns – of critical importance of course is adding the relevant namespaces to your XML declaration. I use the itunes namespace, which provides the widest compatibility for specifying metadata, but I also use the newer podcast namespace, which has growing compatibility and provides some modern features, most of which I don’t use except specifying a license. There’s no harm in supporting both.
  6. rss2_head – here’s where I put in the metadata for the podcast as a whole: license, category, type, and so on. Some of these fields are effectively essential for best support.

You’re welcome, of course, to lift any or all of the code for your own purposes. WordPress makes a perfectly reasonable platform for podcasting-alongside-blogging, in my experience.

What?

Finally, there’s the question of what to podcast about.

My intention is to use podcasting as an alternative medium to my traditional blog posts. But not every blog post is suitable for conversion into a podcast! Ones that rely on images (like my post about dithering) aren’t a great choice. Ones that have lots of code that you might like to copy-and-paste are especially unsuitable.

Dan, a microphone in front of him, smiles at the camera.
You’re listening to Radio Dan. 100% Dan, 100% of the time. (Also I suppose you might be able to hear my dog snoring in the background…)

Also: sometimes I just can’t be bothered. It’s already some level of effort to write a blog post; it’s like an extra 25% effort on top of that to record, edit, and upload a podcast version of it.

That’s not nothing, so I’ve tended to reserve podcasts for blog posts that I think have a sort-of eccentric “general interest” vibe to them. When I learn something new and feel the need to write a thousand words about it… that’s the kind of content that makes it into a podcast episode.

Which is why I’ve been calling the endeavour “a podcast nobody asked for, about things only Dan Q cares about”. I’m capable of getting nerdsniped easily and can quickly find my way down a rabbit hole of learning. My podcast is, I guess, just a way of sharing my passion for trivial deep dives with the rest of the world.

My episodes are probably shorter than most podcasts: my longest so far is around fifteen minutes, but my shortest is only two and a half minutes and most are about seven. They’re meant to be a bite-size alternative to reading a post for people who prefer to put things in their ears than into their eyes.

Anyway: if you’re not listening already, you can subscribe from here or in your favourite podcasting app. Or you can just follow my blog as normal and look for a streamable copy of podcasts at the top of selected posts (like this one!).

Footnotes

1 I’ve also retroactively recorded a few older ones. Have a look/listen!

2 As well as Web-based non-textual content like audio (podcasts) and video (vlogs), my blog is wholly or partially available over a variety of more-exotic protocols: did you find me yet on Gemini (gemini://danq.me/), Spartan (spartan://danq.me/), Gopher (gopher://danq.me/), and even Finger (finger://danq.me/, or run e.g. finger blog@danq.me from your command line)? Most of these are powered by my very own tool CapsulePress, and I’m itching to try a few more… how about a WordPress blog that’s accessible over FTP, NNTP, or DNS? I’m not even kidding when I say I’ve got ideas for these…

3 Nowadays, we have specialised media decoder co-processors that make more-aggressive compression practical, which reduces the size of media files. But more-importantly, today’s high-speed always-on Internet connections mean that you probably rarely need to make a conscious choice between streaming or downloading.

4 I actually intended to change the tag to podcast when I went-live, but then I forgot, and now I can’t be bothered to change it. It’s only for my convenience, after all!

5 I’m very grateful that my favourite feed reader makes it possible to, for example, use a CSS selector to specify the page content it should pre-download for you! It means I get to spend more time in my feed reader.

Draw Me a Comment!

Why must a blog comment be text? Why could it not be… a drawing?1

Red and black might be more traditional ladybird colours, but sometimes all you’ve got is blue.

I started hacking about and playing with a few ideas and now, on selected posts including this one, you can draw me a comment instead of typing one.

Just don’t tell the soup company what I’ve been working on, okay?

I opened the feature, experimentally (in a post available only to RSS subscribers2) the other week, but now you get a go! Also, I’ve open-sourced the whole thing, in case you want to pick it apart.

What are you waiting for: scroll down, and draw me a comment!

Footnotes

1 I totally know the reasons that a blog comment shouldn’t be a drawing; I’m not completely oblivious. Firstly, it’s less-expressive: words are versatile and you can do a lot with them. Secondly, it’s higher-bandwidth: images take up more space, take longer to transmit, and that effect compounds when – like me – you’re tracking animation data too. But the single biggest reason, and I can’t stress this enough, is… the penises. If you invite people to draw pictures on your blog, you’re gonna see a lot of penises. Short penises, long penises, fat penises, thin penises. Penises of every shape and size. Some erect and some flaccid. Some intact and some circumcised. Some with hairy balls and some shaved. Many of them urinating or ejaculating. Maybe even a few with smiley faces. And short of some kind of image-categorisation AI thing, you can’t realistically run an anti-spam tool to detect hand-drawn penises.

2 I’ve copied a few of my favourites of their drawings below. Don’t forget to subscribe if you want early access to any weird shit I make.

Tidying WordPress’s HTML

Terence Eden, who’s apparently inspiring several posts this week, recently shared a way to attach a hook to WordPress’s get_the_post_thumbnail() function in order to remove the extraneous “closing mark” from the (self-closing in HTML) <img> element.

By default, WordPress outputs e.g. <img src="..." />, where <img src="..."> would suffice.

It’s an inconsequential difference for most purposes, but apparently it bugs him, so he fixed it… although he went on to observe that he hadn’t managed to successfully tackle all the instances in which WordPress was outputting redundant closing marks.
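
To illustrate the transformation being discussed, here’s a blunt regex-based sketch in Ruby. It is neither Terence’s WordPress hook nor the Tidy-based approach described below, just a demonstration of what “removing the closing mark” means for HTML’s void elements. The constant and function names are invented for this example:

```ruby
# Void elements never have a closing tag in HTML, so the trailing " /" is optional.
VOID_ELEMENTS = %w[area base br col embed hr img input link meta source track wbr].freeze

# Matches a void element written in self-closing style, capturing its
# name and its attributes (without the trailing whitespace and slash).
SELF_CLOSING = %r{<(#{VOID_ELEMENTS.join('|')})\b([^>]*?)\s*/>}i

# Rewrite e.g. <img src="a.png" /> as <img src="a.png">.
# A regex like this doesn't understand comments, <script> blocks, or
# attribute values that contain ">", so treat it as a toy.
def strip_closing_marks(html)
  html.gsub(SELF_CLOSING, '<\1\2>')
end

puts strip_closing_marks('<p><img src="a.png" alt="A" /><br /></p>')
```

Non-void elements (and anything already written without the slash) pass through untouched, which is also why a real parser is the safer tool once the markup gets messy.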

This is a problem that I’ve already solved here on my blog. My solution’s slightly hacky… but it works!

Source code for a post on DanQ.me, being searched for unnecessary HTML closing tags. No results are found.
There are many things you could say about the HTML produced to make the page you’re reading now. But “it needs fewer />s” isn’t among them.

My Solution: Running HTMLTidy over WordPress

Tidy is an excellent tool for tidying up HTML! I used to use its predecessor back in the day for all kinds of things, but it languished for a few years and struggled with support for modern HTML features. But in 2015 it made a comeback and it’s gone from strength to strength ever since.

I run it on virtually all pages produced by DanQ.me (go on, click “View Source” and see for yourself!), to:

  • Standardise the style of the HTML code and make it easier for humans to read1.
  • Bring old-style emphasis tags like <i>, in my older posts, into a more-modern interpretation, like <em>.
  • Hoist any inline <style> blocks to the <head>, and detect any repeated inline style="..."s to convert to classes.
  • Repair any invalid HTML (browsers do this for you, of course, but doing it server-side makes parsing easier for the browser, which might matter on more-lightweight hardware).

WordPress isn’t really designed to have Tidy bolted onto it, so anything is likely to be a bit of a hack, but here’s my approach:

  1. Install libtidy-dev and build the PHP bindings to it.
    Note that if you don’t do this the code might appear to work, but it won’t actually tidy anything2.
  2. Add a new output buffer to my theme’s header.php3, with a callback function: ob_start('tidy_entire_page').
    Without a corresponding ob_flush or similar, this buffer will close and the function will be called when PHP finishes generating the page.
  3. Define the function tidy_entire_page($buffer)
    Have it instantiate Tidy ($tidy = new tidy) and use $tidy->parseString (with your buffer and Tidy preferences) to tidy the code, then return $tidy.
  4. Ensure that you’re caching the results!
    You don’t want to run this every page load for anonymous users! WP Super Cache on “Expert” mode (with the requisite webserver configuration) might help.

I’ve open-sourced a demonstration that implements a child theme to TwentyTwentyOne to do this: there’s a richer set of instructions in the repo’s readme. If you want, you can run my example in Docker and see for yourself how it works before you commit to trying to integrate it into your own WordPress installation!

Footnotes

1 I miss the days when most websites were handwritten and View Source typically looked nice. It was great to learn from, too, especially in an age before we had DOM debuggers. Today: I can’t justify dropping my use of a CMS, but I can make my code readable.

2 For a few of its extensions, some PHP developer made the interesting choice to fail silently if the required extension is missing. For example: if you don’t have the zip extension enabled you can still use PHP to make ZIP files, but they won’t be compressed. This can cause a great deal of confusion for developers! A similar issue exists with tidy: if it isn’t installed, you can still call all of the methods on it… they just don’t do anything. I can see why this decision might have been made – to make the language as portable as possible in production – but I’d prefer if this were an optional feature, e.g. you had to set try_to_make_do_if_you_are_missing_an_extension=yes in your php.ini to enable it, or if it at least logged that it had done so.

3 My approach probably isn’t suitable for FSE (“block”) themes, sorry.

A completely plaintext WordPress Theme

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

This is a silly idea. But it works. I saw Dan Q wondering about plaintext WordPress themes – so I made one.

This is what this blog looks like using it:

Screenshot showing my blog rendered just as text.

I clearly nerdsniped Terence at least a little when I asked whether a blog necessarily had to be HTML, because he went on to implement a WordPress theme that delivers content entirely in plain text.

Naturally, I’ve also shared his accomplishment on my own text/plain blog (which uses a much simpler CMS based on static files).

100 Days To Offload

The ever-excellent Kev Quirk in 2020 came up with this challenge: write a blog post on each of 100 consecutive days. He called it #100DaysToOffload, in nominal reference to the “100 days of code” challenge. I was reflecting upon this as I reach this, my 36th consecutive day of blogging and my longest ever “daily streak” (itself a spin-off of my attempt at Bloganuary this year), and my 48th post of the year so far.

Monochrome photograph showing sprinters at the starting line.
I guess I’ve always been more of a sprinter/hurdles blogger than a marathon runner.

Might I meet that challenge? Maybe. But it turns out it’s easier than I thought because Kev revised the rules to require only 100 posts in a calendar year (or any other 365-day period, but I’m not going to start thinking about the maths of that).

That’s not only much more-achievable… I’ve probably already achieved it! Let’s knock out some SQL to check how many posts I made each year:

SELECT
  YEAR(wp_posts.post_date_gmt) yyyy,
  COUNT(wp_posts.ID) total
FROM
  wp_posts
WHERE
  wp_posts.post_status='publish'
  AND wp_posts.post_type='post'
GROUP BY yyyy
ORDER BY yyyy
My code’s actually a little more-complicated than this, because of some plot, but this covers the essentials.
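
The SQL gives per-year totals, but working out the date on which each year’s 100th post landed needs one more step. Here’s a sketch of that step in Ruby, assuming the publication dates have already been pulled out of the database as Date objects. The function names and the sample data in the usage note are invented for illustration; this isn’t the code behind the table below:

```ruby
require 'date'

# Count posts per calendar year, like the GROUP BY in the SQL above.
def posts_per_year(dates)
  dates.map(&:year).tally.sort.to_h
end

# For each year with at least `target` posts, report the date on which
# the target-th post of that year was published.
def milestone_dates(dates, target = 100)
  dates.sort.group_by(&:year).each_with_object({}) do |(year, posts), result|
    result[year] = posts[target - 1] if posts.size >= target
  end
end
```

Fed a made-up list of 120 daily posts starting on 1 January 2003, posts_per_year reports { 2003 => 120 } and milestone_dates returns the date of the hundredth of them. Years that never reach the target simply don’t appear in the result.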

A big question in some years is what counts as a post. Kev’s definition is quite liberal and includes basically-everything, but I wonder if mine shouldn’t perhaps be stricter. For example:

  • Should I count checkins, even though they’re not always born as blog posts but often start as logs on geocaching websites? (My gut says yes!)
  • Do reposts and bookmarks contribute, a significant minority of which are presented without any further interpretation by me? (My gut says no!)
  • Does a vlog version of a blog post count separately, or is it a continuation of the same content? (My gut says the volume is too low to matter!)
  • Can a retroactive achievement (i.e. from before the challenge was announced) count? Kev writes “there is no specific start date”, but it seems a little counter to the idea of it specifically being a challenge to claim it when you weren’t attempting the challenge at the time.
  • And so on…
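
How strict you are changes not just whether a year qualifies but when its 100th post lands. Here is a minimal Python sketch of that comparison, under loud assumptions: the `hundredth_posts` function, the `kind` labels ("article", "checkin", "repost") and the sample data are all hypothetical, and this is not the code actually used to build the table.

```python
from collections import defaultdict
from datetime import date, timedelta


def hundredth_posts(posts, excluded_kinds=frozenset(), target=100):
    """Map each year to the date of its `target`-th counted post.

    `posts` is an iterable of (date, kind) pairs. Posts whose kind is in
    `excluded_kinds` are ignored. Years that never reach the target are
    left out of the result.
    """
    counts = defaultdict(int)
    reached = {}
    for day, kind in sorted(posts):
        if kind in excluded_kinds:
            continue
        counts[day.year] += 1
        if counts[day.year] == target:
            reached[day.year] = day
    return reached


# Hypothetical year: an article every 3 days and a checkin every 2 days.
start = date(2024, 1, 1)
posts = [(start + timedelta(days=3 * i), "article") for i in range(110)]
posts += [(start + timedelta(days=2 * i), "checkin") for i in range(150)]

liberal = hundredth_posts(posts)  # everything counts
pedant = hundredth_posts(posts, excluded_kinds={"checkin", "repost"})

print(liberal.get(2024), pedant.get(2024))
```

With this invented data, both definitions reach 100 posts, but the pedant's qualifying date falls months later than the liberal one.
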
Year | Posts | Success? | Notes
1998 | 7 | ❌ No | Some posts are lost from 1998/1999. If they were recovered I might have made 100 posts in 1999, but probably not in 1998 as I only started blogging on 27 September 1998.
1999 | 66 | ❌ No |
2000 | 2 | ❌ No |
2001 | 11 | ❌ No |
2002 | 5 | ❌ No |
2003 | 189 | 🏆 Yes | Achieved 1 September, with a post about an article on The Register about timewasting. Or, if we allow reposts, three days earlier with a repost about Claire's car being claimed by the sea.
2004 | 374 | 🏆 Yes | An early win on 20 April, with a made-up Chez Geek card. Or if we allow reposts, two days earlier with thoughts on a confusing pro-life (???) website.
2005 | 381 | 🏆 Yes | In a highly-productive year of blogging, achieved on 7 April with a post about enjoying curry and public information films with friends. If we allow bookmarks (I was highly-active on del.icio.us at the time!), achieved even earlier on 18 February with some links to curious websites.
2006 | 206 | 🏆 Yes | On 21 July, I shared a personality test (which was actually my effort to repeat an experiment in using Barnum-Forer statements) - I didn't initially give away that I was the author of the "test". Non-pedants will agree I achieved the goal earlier, on 19 June, with my thoughts on a programming language for a hypothetical infinitely-fast computer.
2007 | 166 | 🏆 Yes | Achieved on 2 July with thoughts on films I'd watched and board games I'd played recently. Or arguably 12 days earlier with Claire's birthday trip to Manchester.
2008 | 86 | ❌ No |
2009 | 79 | ❌ No |
2010 | 159 (84 for pedants) | ✅ Yes* | A heartfelt post about saying goodbye to Aberystwyth as I moved to Oxford on 16 June was my 100th of the year. Pedants might argue that this year shouldn't count, but so long as you're willing to count checkins (and you should) then it would... and my qualifying post would have come only a couple of days later, with a post about the Headington Shark, which I had just moved-in near to.
2011 | 177 | 🏆 Yes | Reached the goal on 28 October when I wrote about mild successes in my enquiries with the Office for National Statistics about ensuring that information about polyamorous households was accurately recorded. Or earlier, on 9 June, with a visual gag about REM lyrics, if you accept all my geocache logs as posts too (and again: you should).
2012 | 129 (87 for pedants) | ✅ Yes* | My 100th post of the year came on 28 August when I wrote about launching a bus named after my recently-deceased father. You have to be willing to accept both checkins and reposts as posts to allow this year to count.
2013 | 138 (59 for pedants) | 😓 Probably not | I'm not convinced this low-blogging year should count: a clear majority of the posts were geocaching logs, and they weren't always even that verbose (consider this candidate for 100th post of 2013, from 1 October).
2014 | 335 (22 for pedants) | 🙁 Not really | Another geocache-log-heavy, conventional-blogpost-light year that I'm not convinced should count, even if the obvious candidate for 100th post would be 18 May's cool article about geocaching like Batman!
2015 | 205 (18 for pedants) | 🙁 Not really | Still no, for the same reasons as above.
2016 | 163 (37 for pedants) | 🙁 Not really |
2017 | 301 (42 for pedants) | 🙁 Not really |
2018 | 547 (87 for pedants) | ✅ Yes* | I maintain that checkins should count, even when they're PESOS'd from geocaching sites, so long as they don't make up a majority of the qualifying posts in a year. In which case this year should qualify, with the 100th post being my visit to this well-hidden London pub while on my way to a conference.
2019 | 387 (86 for pedants) | ✅ Yes* | Similarly this year, when on 15 August I visited a GNSS calibration point in the San Francisco Bay Area... on the way to another conference!
2020 | 221 (64 for pedants) | ✅ Yes* | Barely made it this year (ignoring reposts, of which I did lots), with my 21 December article about a little-known (and under-supported) way to inject CSS using HTTP headers, which I later used to make a web page for which View Source showed nothing.
2021 | 190 (57 for pedants) | ✅ Yes* | A cycle to a nearby geocache was the checkin that made the 100th post of this year, on 27 August.
2022 | 168 (55 for pedants) | ✅ Yes* | My efforts to check up on one of my own geocaches on 7 September scored the qualifying spot.
2023 | 164 (86 for pedants) | ✅ Yes* | My blogging ramped up again this year, and on 24 August I shared a motivational poster with a funny twist, plus a pun at the intersection between my sexuality and my preferred mode of transport.
2024 | 436 | 🏆 Yes | Writing at full-tilt, my hundredth post came when I found a geocache near Regent's Canal, but pedants who disregard reposts and checkins might instead count my excitement at the Ladybird Web browser as the record-breaker. This year also saw me write my 5,000th post on this blog! Wowza!
2025 | 51 (39 for pedants) | ⌛ Not yet... |
Total | 5,348 | | Total count of all the posts. Doesn't add up? Not all posts feature in one of the years above!

* Pedants might claim this year was not a success for the reasons described above. Make your own mind up.

In any case, I’d argue that I clearly achieved the revised version of the challenge on certainly six, probably fourteen, arguably (depending on how you count posts) as many as nineteen different years since I started blogging in 1998. My least-controversial claims would be:

  1. September 2003, with Timewasting
  2. April 2004, with Chez Geek Card of the Day
  3. April 2005, with Curry with Alec and Suz
  4. July 2006, with Coolest Personality Test I’ve Ever Seen
  5. July 2007, with It’s All Fun and Games
  6. June 2010, with Saying Goodbye
  7. October 2011, with Poly and the Census – Success! (almost)
  8. August 2012, with A Bus Called Peter
  9. June 2018, with Dan Q found GLW6CMKQ 16th Century Pub (Central London) 
  10. August 2019, with Dan Q found GC6KR0H Bay Area Calibration Point #4 – New Technology
  11. December 2020, with The Fourth Way to Inject CSS
  12. August 2021, with Dan Q found GC531M9 Walk by the Firehouse #1
  13. October 2022, with Dan Q performed maintenance for GC9Z37H Friar’s Farm – Woodland Walk
  14. August 2023, with Inclusivity

Given all these unanswered questions, I’m not going to just go ahead and raise a PR against the Hall of Fame! Instead, I’ll leave it to Kev to decide whether I’m (a) eligible to claim a 14-time award, (b) merely eligible for a 4-time award for the years following the challenge starting, or (c) ineligible to claim success until I intentionally post 100 times in a year (in, at current rates, another two months…). Over to you, Kev…

Update: Kev’s agreed that I can claim the most-recent four of them, so I raised a PR.
