Pushing RSS to WhatsApp (for free?)

I’m always keen to experiment with new ways to subscribe to my blogObviously RSS is the best and everybody who can use it should. But some people, for their own reasons1, prefer to learn about updates to their favourite sites via the Fediverse, or Facebook, or Telegram, or… I don’t know… LiveJournal or something (yes, those are all places you can follow this blog, if you really wanted to).

But I’m pretty sure there are some people who’d rather receive updates to my blog via WhatsApp. And now, they can. Here’s how I set up an RSS-to-WhatsApp gateway, in case you want to run one of your own2.

Diagram showing how GitHub Actions request and receive an RSS feed, push an update to Whapi, which pushes the update to WhatsApp.

Prerequisites

You will need:

  1. A GitHub account – a free one is fine
  2. A Whapi account connected to your WhatsApp account3 – when you set up an account you’ll get a free trial; when it ends you need to find the link to say that you want to carry on with the free tier (or upgrade to the paid tier if you expect to send more messages than the free tier’s limit)
  3. A WhatsApp channel to which you want to push your RSS feed: I’d recommend that you make a newsletter (from the Updates tab in WhatsApp, press the kekab menu then Create Channel) rather than a traditional group: groups are designed for multiple people to talk and discuss and everybody can see one another’s identity, but a newsletter keeps everybody’s identity private and only allows the administrator(s) permission to post updates.
A WhatsApp 'Subscription Channel'/'Newsletter' group for DanQ.me, showing some recent posts, on a mobile screen.
You probably want to use the kind of channel that’s for one-to-many ‘push’ communication, not a discussion group.

Instructions

  1. Fork the Dan-Q/rss-to-whapi.cloud repository into your own GitHub account.
  2. In Settings > Secrets and Variables > Actions, add two new Repository Secrets:
    • WHATSAPP_API_TOKEN: set to the token on your Whapi dashboard
    • WHATSAPP_CHANNEL: set to your newsletter ID (will look like 123456789012345678@newsletter) or group ID (will look like 123456789012345678@g.us): you can get this from the Newsletters or Groups section of Whapi by executing a test GET /newsletters or GET /groups request4.
  3. Make a feeds.json file (a feeds.json.example is provided as a guide) containing the URLs of the RSS feeds you’d like to subscribe to.
  4. Do a test run: from the Actions tab select the “Process feeds” action and click “Run workflow”. If it finishes successfully (and you get the WhatsApp message), you’re done! If it fails, click on the failed action and drill-in to the failed task to see the error message and correct accordingly.

By default, the processor will run on-demand and every 30 minutes, but you can modify that in .github/workflows/process-feeds.yml. It’s configured to send the single oldest un-sent item in any of the RSS feeds it’s subscribed to, on each run (it tracks which ones it’s sent already by their guids, in a "seen": [...] array in feeds.json): sending a single link per run ensures that WhatsApp’s link previews work as expected. At that rate, you could theoretically run it once every 10 minutes and never hit the 150-messages-per-day limit of Whapi’s free tier5), but you’ll want to work out your own optimal rate based on the anticipated update frequency of your feeds and the number of RSS-to-WhatsApp channels you’re running.

You can, of course, run it on your own infrastructure in a similar way. Just check out the repository to your local system with Ruby 3.2+ running, run bundle to install the dependencies, then set up a cron job or some other automation to run ./process_feeds.rb. Doing this could be used to hook it up to your RSS feed updating pipeline, for example, to check for new feed items right after a new post is published.

Footnotes

1 Their own incomprehensible, illogical, weird reasons.

2 I hope that the title gives it away, but you can do this completely for free. So long as you keep your fork of the GitHub repository open-source then you can run GitHub Actions for free, and so long as you’re pushing out no more than 150 updates per day to no more than 5 different channels in a month then you can do it within Whapi’s free tier: that’s probably fine for a personal blogger, and there’s a reasonable pricing structure (plus some value-added extras) for companies that want to use this same workflow as part of a grander WhatsApp offering.

3 Setting this up requires giving Whapi access to your WhatsApp account. If you don’t like the security implications of that, you could get a cheap eSIM, set that up with WhatsApp, and use that account: if you do this, just remember to “warm up” your new WhatsApp account with some conversations with yourself so it doesn’t look so much like a spammer! Also note that the way Whapi works “uses up” one of the ~4 devices on which you can simultaneously use WhatsApp Web/WhatsApp Desktop etc.

4 Prefer the command-line? So long as you’ve got curl and jq then you can get a list of your newsletters (or groups) and their IDs with curl -H 'Authorization: Bearer YOUR_API_TOKEN' -H 'accept: application/json' https://gate.whapi.cloud/newsletters?count=100 | jq '.newsletters[] | { id: .id, name: .name }' or curl -H 'Authorization: Bearer YOUR_API_TOKEN' -H 'accept: application/json' https://gate.whapi.cloud/groups?count=100 | jq '.groups[] | { id: .id, name: .name }', respectively.

5 Going beyond the free tier would require sending one message, on average, every 9 minutes and 36 seconds.

Postcards… from the Internet!

A couple of weeks ago I blogged about setting up a PO Box and adding postal mail to the ways you can contact me. I went for a “pay as you go” PO Box because I didn’t know if anybody would actually use it, but I’ve already received two delightful postcards and I couldn’t be more thrilled.

Postcard reading: Just wanted to say thank you for dropping me a note about my RSS feed! I've been wondering why it was so unreliable for months - a rare treat to have somebody hand me the solution! Thanks also for the kind words about my website. :)

The first postcard came from Florence, whom I’d met on a forum where I’d helped them repair a troublesome RSS feed1.

The second postcard was from Rhys, whose guestbook I dropped a comment into after spotting that his 50 Before I’m 50 list contained a wish to learn a “genuinely cool magic trick”, for which I had a suggestion2.

Postcard reading: Returning a comment you left on my personal site, from my days of Twitch streaming I used to send out parcels to competition winners with a postcard. Anyway, here you go!

The PO Box worked very well: I’m using UK Postbox principally because of their “pay as you go” rate (with a free tier in case you don’t receive any mail at all, which I figured was a risk) but I was later pleased to discover they’re a nice company in other ways, too. They scan the outside/one side of my mail as it arrives and I can optionally pay to scan the whole thing and/or to bundle and forward it on to me3.

I’ve started a new page to collect all the cards, including a (hopefully pretty-accessible) CSS-powered interactive “flipper” so you can turn them over, and I’m hopeful that I might attract a few more as time goes on. Getting physical mail from “Internet friends” helps make the digital world feel a little bit smaller, and I love it.

(If you’d like to send me a postcard too, I’d be so very grateful!)

Footnotes

1 Florence’s RSS feed was missing a <![CDATA[ ... ]]> block around some embedded HTML, which was causing the HTML to be evaluated “as if” it were XML, which – not being XHTML – it failed to do.

2 My suggestion was a variation of Derek Dingle’s Too Many Cards that I’ve been performing all over the place: it’s an immensely satisfying trick to perform, requiring a challenging but achievable set of sleights and suitable to do without preparation and using a borrowed deck, which is pretty much the gold standard in card magic.

3 I’ve opted to have it forwarded: I’m wondering if I can combine all the postcards I get into a single poster frame or something: maybe a double-sided one so the whole thing can be flipped to show the text, not just the fronts?

Send Me a Postcard!

Last month I was on em’s personal site, where I  discovered their contact page lists not only the usual methods (email addresses, socials, contact forms etc.) but also a postal address1: how cool is that‽ I could have written in their guestbook… but obviously I took the option to send a postcard instead!

Now I’ve set up a PO Box of my own, and I’ve love it if you feel up to saying “hi” via a postcard2. As a bonus, it’s more-likely to get through than anything that has to face-off against my spam filter!

So, if you want to send me a letter or postcard (no parcels, nothing that needs a signature), my address is:

Dan Q
Unit 159610
PO Box 7169
Poole
BH15 9EL
United Kingdom

The usual other contact methods still work, of course.

Footnotes

1 The postal address em uses is a PO Box, for solid safety/anonymity reasons, and/or perhaps to facilitate house moves.

2 I’ve only ever received unsolicited postal mail from Internet acquaintances once, I think – a hashcard from a fellow geohasher called Fippe – and it nowadays sits proudly on my wall.

Do Contact Forms Attract More Spam than Email Addresses?

There’s a question being floated around my corner of the blogosphere, but I think my experience of the answer differs from other bloggers:

It started when David Bushell observed that, despite having his email address unobscured on his website, he gets more spam via his contact form. Luke Harris followed-up, providing a potential explanation which basically boils down to the idea that it’s both more cost-effective and provides better return-on-investment to spam contact forms than email addresses. And then Kev Quirk described his experience of switching from contact forms to “bare” email addresses and the protections he put in place (like plus-addressing), only to discover that he didn’t need it at all.

Disappearing Contact Forms

It makes me sad to see the gradual disappearance of the contact form from personal websites. They generally feel more convenient than email addresses, although this is perhaps part of the reason that they come under attack from spammers in the first place! But also, they provide the potential for a new and different medium: the comments area (and its outdated-but-beautiful cousin the guestbook).

Comments are, of course, an even more-obvious target for spammers because they can result in immediate feedback and additional readers for your message. Plus – if they’re allowed to contain hyperlinks – a way of leeching some of the reputability off a legitimate site and redirecting it to the spammers’, in the eyes of search engines. Boo!

A DanQ.me comment form pre-filled with a diversity of spam tropes, by 'Spammer McSpamface'.
Well this was painful to write.

But I’ve got to admit: there have been many times that I’ve read an interesting article and not interacted with it simply because the bar to interaction (what… I have to open my email client!?) was too high. I’d prefer to write a response on my blog and hope that webmention/pingback/trackback do their thing, but will they? I don’t know in advance, unless the other party says so openly or I take a dive into their source code to check.

Your Experience May Vary

I’ve had both contact/comment forms and exposed email addresses on my website for many years… and I feel like I get aproximately the same amount of spam on both, after filtering. The vast majority of it gets “caught”. Here’s what works for me:

My contact/comments forms use one of a variety of unobtrustive “honeypot”-style traps. These “reverse CAPTCHAs” attempt to trick bots into interacting with them in some particular way while not inconveniencing humans.

  • Antispam Bee provides the first line of defence, but I’ve got a few tweaks of my own to help counteract the efforts of determined spammers.
  • Once you’ve fallen into a honeypot it becomes much easier to block subsequent contacts with the same/similar content, address, (short-term) IP, or the poisoned cookie you’re given.
  • Keyword filtering provides a further line of defence. E.g. for contact forms that post directly back to the Web (i.e. comment forms, and perhaps a future guestbook form), content with links goes into a moderation queue unless it shares a sender email with a previously-approved sender. For contact forms that result in an email, I’ve just got a few “scorer” rules relating to geo IP, keywords, number and density of links, etc. that catch the most-insidious of spam to somehow slip through.

also publish email addresses all over the place, but they’re content-specific. Like Kev, I anticipated spam and so use unique email addresses on different pieces of content: if you want to reply-by-email to this post, for example, you’re encouraged to use the address b27404@danq.me. But this approach has actually provided secondary benefits that are more-valuable:

  • The “scrapers” that spam me by email would routinely send email to multiple different @danq.me addresses at the same time. Humans don’t send the same identical message to me to different addresses published on my site and from different senders, so my spam filter picks up on this rightaway.
  • As a fringe benefit, this helps me determine the topic on an email where it’s unclear. E.g. I’ve had humans email me to say “I tried to follow the guide on your page but it didn’t work for me” and I wouldn’t have had a clue which page had they not reached out via a page-specific email alias.
  • I enjoy the potential offered by rotating the email address generation mechanism and later treating all previously-exposed addresses as email honeypots.
An email spam inbox. A significant number of detected spam messages have the subject line "PAY OR BE EXPOSED" but have different senders.
They’ve all got different “sender” addresses, but that fact that this series of emails were identical except for the different recipient aliases meant that catching them was very easy for my spam filters.

Works For Me!

This strategy works for me: I get virtually no comment/contact form spam (though I do occasionally get a false positive and a human gets blocked as-if they were a robot), and very little email spam (after my regular email filters have done their job, although again I sometimes get false positives, often where humans choose their subject lines poorly).

It might sound like my approach is complicated, but it’s really not. Adding a contact form honeypot is not significantly more-difficult than exposing automatically-rotating email aliases, and for me it’s worth it: I love the convenience and ease-of-use of a good contact/comments form, and want to make that available to my visitors too!

(I also allow one-click reactions with emoji: did you see? Scroll down and send me a bumblebee! Nobody seems to have found a way to spam me with these, yet: it’s not a very expressive medium, I guess!)

× ×

Note #27196

Today, for the first time ever, I simultaneously published a piece of content across five different media: a Weblog post, a video essay, a podcast episode, a Gemlog post, and a Spartanlog post.

Must be about something important, right?

Nope, it’s a meandering journey to coming up with a design for a £5 coin that will never exist. Delightfully pointless. Being the Internet I want to see in the world.

🔗 https://danq.me/the-beer-token

Work Slippers

Duration

Podcast Version

This post is also available as a podcast. Listen here, download for later, or subscribe wherever you consume podcasts.

Last month my pest of a dog destroyed my slippers, and it was more-disruptive to my life than I would have anticipated.

A French Bulldog looks-on guiltily at a hand holding the remains of a pair of slippers that have been thoroughly shredded.
Look what you did, you troublemaker.

Sure, they were just a pair of slippers1, but they’d become part of my routine, and their absence had an impact.

Routines are important, and that’s especially true when you work from home. After I first moved to Oxford and started doing entirely remote work for the first time, I found the transition challenging2. To feel more “normal”, I introduced an artificial “commute” into my day: going out of my front door and walking around the block in the morning, and then doing the same thing in reverse in the evening.

A mixture of flatscreen and CRT monitors, plus a laptop and a webcam, on a desk. The laptop screen shots a view of an office at the "other end" of a webcam connection.
My original remote working office, circa 2010.

It turns out that in the 2020s my slippers had come to serve a similar purpose – “bookending” my day – as my artificial commute had over a decade earlier. I’d slip them on when I was at my desk and working, and slide them off when my workday was done. With my “work” desk being literally the same space as my “not work” desk, the slippers were a psychological reminder of which “mode” I was in. People talk about putting on “hats” as a metaphor for different roles and personas they hold, but for me… the distinction was literal footwear.

And so after a furry little monster (who for various reasons hadn’t had her customary walk yet that day and was probably feeling a little frustrated) destroyed my slippers… it actually tripped me up3. I’d be doing something work-related and my feet would go wandering, of their own accord, to try to find their comfortable slip-ons, and when they failed, my brain would be briefly tricked into glancing down to look for them, momentarily breaking my flow. Or I’d be distracted by something non-work-related and fail to get back into the zone without the warm, toe-hugging reminder of what I should be doing.

It wasn’t a huge impact. But it wasn’t nothing either.

A pair of brown slippers, being worn, in front of a French bulldog asleep in her basket, her tongue sticking out.
The bleppy little beast hasn’t expressed an interest in my replacement slippers, yet. Probably because they’re still acquiring the smell of my feet, which I’m guessing is what interested her in the first place.

So I got myself a new pair of slippers. They’re a different design, and I’m not so keen on the lack of an enclosed heel, but they solved the productivity and focus problem I was facing. It’s strange how such a little thing can have such a big impact.

Oh! And d’ya know what? This is my hundredth blog post of the year so far! Coming on only the 73rd day of the year, this is my fastest run at #100DaysToOffload yet (my previous best was last year, when I managed the same on 22 April). 73 is exactly a fifth of 365, so… I guess I’m on track for a mammoth 500 posts this year? Which would be my second-busiest blogging year ever, after 2018. Let’s see how I get on…4

Footnotes

1 They were actually quite a nice pair of slippers. JTA got them for me as a gift a few years back, and they lived either on my feet or under my desk ever since.

2 I was working remotely for a company where everybody else was working in-person. That kind of hybrid setup is a lot harder to do “right”, as many companies in this post-Covid-lockdowns age have discovered, and it’s understandable that I found it somewhat isolating. I’m glad to say that the experience of working for my current employer – who are entirely distributed – is much more-supportive.

3 Figuratively, not literally. Although I would probably have literally tripped over had I tried to wear the tattered remains of my shredded slippers!

4 Back when I did the Blog Questions Challenge I looked at my trajectory and estimated I wouldn’t hit a hundred this year until a week later than now, so maybe I’m… accelerating?

× × ×

Engagement

I’ve been trying to comment more on other people’s blogs. It’s tough, because comment forms continue to wane in popularity, and it’s not always clear who’ll accept Webmentions, but there’s often the option of a good old-fashioned email or a fediverse ping.

It occurred to me that I follow a significant number of personal blogs, and my privacy systems mean I’m a bit of a ghost to most analytics systems they might use, so the only way they’d ever know I was there would be if I said so.

Plus, the Internet is better when it’s social. There are some great people out there, and I’m enjoying meeting them!

(You’re welcome to throw comments, Webmentions, or emails my way, of course, too!)

Reply to Blogging for traffic not design

This is a reply to a post published elsewhere. Its content might be duplicated as a traditional comment at the original source.

Andy Hawthorne said:

When you’re writing online, being unique doesn’t matter nearly as much as being found.

I’m not sure I could disagree more. But I’ve jumped in half way through his post. Let’s backtrack a bit.

Andy begins:

A blogger showed me his website the other day.

But no one was reading it.

Firstly: let’s just observe that you were shown a website… and now you’re talking about it… but you haven’t linked to it? You’re complaining about its lack of discoverability, while simultaneously being part of the problem.

Hyperlinks remain, as they have been since the mid-to-late 1990s, a primary mechanism in helping search engines’ spiders to discover new sites, and nowadays they’re doubly-important because they help establish legitimacy.

When you search for, say, “history of web search” and this Wikipedia article is at the top, a significant reason for that is that people link to that page when talking about the history of web search! A secondary reason is that lots of people link to Wikipedia in general.

DuckDuckGo search for 'history of web search', showing 'Timeline of web search engines - Wikipedia' as the top result.
Your mileage may vary depending on your preferred search engine and other factors.

Berating somebody for an unindexed site… but not linking to that site… feels awfully-close to victim-blaming!

(Especially recently, as still-dominant search engine Google continues to make it harder and harder for “new” sites to get onto the ladder.)

When I asked him why he didn’t just use WordPress or Bear Blog, he looked offended.

“Those are so basic. Everyone uses those. I wanted something unique.”

I’m not sure I understand the logic of the person whose argument against e.g. WordPress is that it’s not “unique”. There are lots of great reasons that you might use WordPress. There are lots of great reasons that you might not. The right choice of CMS should be based on a variety of factors.

It’s possible that the person being referred to meant “customisable”. They’d still be wrong (in the case of WordPress, at least: Bear Blog offers significantly less customisation options, which is fine if the other features are what you’re looking for), but anyway: the short of it is that I briefly agreed, here, until:

WordPress powers about 43% of all websites. That means search engines know exactly how to read WordPress sites.

They know where to look for the content, the metadata, the tags.

Let’s correct the points here:

  • Search engines know exactly how to read HTML. WordPress outputs HTML. (If you’re outputting HTML, your site can be indexed. Hell, even that isn’t a firm requirement: my plaintext-only blog shows up in search engines!)
  • Web standards dictate how content, metadata, and tags should be laid out. A search engine’s spider doesn’t look at your site and go “hey, it’s WordPress, so I need to look for this“. Instead, it’ll generally look for content and metadata based on established standards. Titles, headings, <meta> tags, semantic elements: these are the things a search engine looks for.
  • Sure, WordPress gets those things right. But they’re not hard to get right. You shouldn’t use WordPress (or Bear, or anything else) based just on the fact that it exposes metadata correctly. Any site can do this. And because what’s eventually exposed to the search engine – and to the user – is HTML code… which is independent of the CMS that generated it… it doesn’t have to matter what the underlying CMS is.

Then there’s some more confusion:

Here’s what matters: WordPress and other major platforms have spent years optimising for search engines and social sharing.

They’ve spent millions making sure posts load fast.

This sounds like it’s conflating WordPress (the open-source CMS) with one or more of several WordPress hosting providers (probably WordPress.com). That’s a common mistake, but it is a mistake.

WordPress can do terrible SEO. WordPress can be really slow. Trust me: in a previous life I’ve made a part of my living out of fixing and improving people’s WordPress-powered websites! A large part of this comes from WordPress’s flexibility: the theme you choose, for example, can completely change the functionality of your site. Inspired by my plain text blog, Terence Eden made a WordPress theme that does the same thing! That WordPress theme completely upends the way that most people would use WordPress, but it’s still fundamentally WordPress, even though it exposes to search engines no HTML code, no metadata, and no tags.

WordPress can also do great SEO, and it can be really fast. A properly-configured WordPress site can be a well-oiled machine. But if you conflate WordPress itself with its output, you’re arguing against a straw man.

Don’t get me wrong: I love WordPress! But I dislike people making the false claim that if you’re not using it (or another popular blogging tool), you’re destined to fail at SEO. There’s nothing “magical” about WordPress. It just takes content and renders HTML, in the end!

But all of this is moot, perhaps, when we get back to that first point:

When you’re writing online, being unique doesn’t matter nearly as much as being found.

This entire statement presupposes the purpose of “writing online”.

It’s 100% okay to write for yourself, first and foremost. It’s also okay to write for a small target audience, like for your friends or family. It’s okay to write content that isn’t exposed to search engines (consider all of the wonderful content that my fellow RSS Club members put out, sometimes!). It’s okay to write just for the joy of making things.

A website doesn’t have to be “professional”, as Andy’s post goes on to imply. A website doesn’t have to be anything in particular. A website can just… be. And that’s enough.

×

WordPress to ClassicPress

As I mentioned in my recent Blog Questions Challenge, I recently switched my blog from WordPress, which it had been running on for over 20 years of its 26 year history, to ClassicPress.1 I’m aware that I’m not the only person for whom ClassicPress might be a better fit than WordPress2, so I figured I should share the process by which I undertook the change.

Switching from WordPress to ClassicPress

Switching from WordPress to ClassicPress should be a non-destructive, 100% reversible process, but (even though I’ve got solid backups) I wasn’t ready to trust that, so I decided to operate on a copy of my site. I’m glad I did, because there were a couple of teething issues I needed to tackle before I could launch.

1. Duplicating the site

I took a simple approach to duplicating the site: (1) I copied the site directory, and (2) I copied the database, and (3) I set up a new subdomain to use for testing. Here’s how I did each step:

1.1. Copying the site directory

This should’ve been simple, but a du -sh revealed that my /wp-content/uploads directory is massive (I should look into that) and I didn’t want to clone it. And I didn’t want r need to clone my /wp-content/cache directory either. So I ran:

  1. rsync -av --exclude=wp-content ./old-site-directory/ ./new-site-directory/ to copy everything except wp-content, and then
  2. rsync -av --exclude=uploads --exclude=cache ./old-site-directory/wp-content/ ./new-site-directory/wp-content/ to copy wp-content except the uploads and cache subdirectories, and then finally
  3. ln -s ./old-site-directory/wp-content/uploads ./new-site-directory/wp-content/uploads to symlink the uploads directory, sharing it between the two sites

1.2. Copying the database

I just piped mysqldump into mysql to clone from one database to the other:

mysqldump -uUSERNAME -p --lock-tables=false old-site-database | mysql -uUSERNAME -p new-site-database

I edited DB_NAME in wp-config.php in the new site’s directory to point it at the new database.

Screenshot from nano, editing wp-config.php. The constant defintions for DB_NAME, DB_USER, and DB_PASSWORD are highlighted with the text 'change these!'.
If you’re going to clone your WordPress site before converting to ClassicPress, you’ll want to be comfortable editing your wp-config.php.

1.3. Setting up a new subdomain

My DNS is already configured with a wildcard to point (almost) all *.danq.me subdomains to this server already. I decided to use the name classicpress-testing.danq.me as my temporary/test domain name. To keep any “changes” to my cloned site to a minimum, I overrode the domain name in my wp-config.php rather than in my database, by adding the following lines:

define('WP_HOME','https://classicpress-testing.danq.me');
define('WP_SITEURL','https://classicpress-testing.danq.me');

Because I use Caddy/FrankenPHP as my webserver3, configuration was really easy: I just copied the relevant part of my Caddyfile (actually an include), changed the domain name and the root, and it just worked, even provisioning me out a LetsEncrypt SSL certificate. Magical4.

2. Switching the duplicate to ClassicPress

Now that I had a duplicate copy of my blog running at https://classicpress-testing.danq.me/, it was time to switch it to ClassicPress. I started by switching my wp-admin colour scheme to a different one in my cloned site, so it’d be immediately visually-obvious to me if I’d accidentally switched and was editing the “wrong” site (I also made sure I was logged-out of my primary, live site, so I was confident I wouldn’t break anything while I was experimenting!).

ClassicPress provides a migration plugin which checks for common problems and then switches your site from WordPress to ClassicPress, so I installed it and ran it. It said that everything was okay except for my (custom) theme and a my self-built plugins, which it understandably couldn’t check compatibility of. It recommended that I install Twenty Seventeen – the last WordPress default theme to not require the block editor – but I didn’t do so: I was confident that my theme would work anyway… and if it didn’t, I’d want to fix it rather than switch theme!

ClassicPress migration plugin showing a series of green checks: everything's good to go.
I failed to take a screenshot of the actual process, but it looked broadly like this.

And then… it all broke.

3. Fixing what broke

After swiftly doing a safety-check that my live site was still intact, I started trying to work out why my site wasn’t broken. Debugging a ClassicPress PHP issue is functionally identical to debugging a similar WordPress issue, for obvious reasons: check the logs, work out what’s broken, realise it’s a plugin, disable that plugin while you investigate further, etc.

ClassicPress reporting 'There has been a critical error on this website.'
Yeah, I should have expected this. And I did.

In my case, the “blocking” issues were:

  • Jetpack: this plugin does not play nicely with ClassicPress, presumably because it fails if it’s unable to register a block. Fortunately, I wasn’t actually using Jetpack for anything other than for VaultPress (which has saved my butt on at least one occasion and whose praises I sing), so I uninstalled Jetpack and installed the standalone plugin version of VaultPress instead, which worked fine.
  • EWWW Image Optimizer: I use this plugin to pregenerate WebP variants of my images, which I then serve using webserver rules. It’s not a complex job, and I should probably integrate the feature into my theme at some point, but for now I use this plugin. Version 8.0.0 of the plugin doesn’t work on ClassicPress 2.3.1, so I used WP-CLI to downgrade to the last version that does (7.7.0), and then it worked fine.
  • Dan’s Geocaching Log Reposter: a self-made plugin that copies my logs from geocaching websites stopped working properly, which I think is because ClassicPress is doing a more-aggressive job than WordPress at nonce validation on admin REST endpoints? I put a quick hack into my plugin to work around it, but I’ll need to look into this properly at some point.
  • Some other bits of my stack, e.g. CapsulePress (my Gemini/Spartan/Nex server), have their own copies of my database credentials, because I’ve been too lazy to centralise them into environment variables, and needed updating (but not until live switchover time).

Everything else worked fine, as far as I’ve determined in the weeks that have followed. My other plugins, which include Advanced Editor Tools (I should probably look into Enriched Editor), Antispam Bee, Google Authenticator, IndieAuth, Post Kinds, Post Snippets, Regenerate Thumbnails, Syndication Links, Webmention, WebSub, and WP-SCSS all “just worked”.

4. Completing the switchover

I ran the two sites in-parallel for a couple of weeks, with the ClassicPress one as a “read only” version (so I didn’t pollute my uploads directory!), but it was pretty unnecessary because it all worked pretty seamlessly, despite my complex stack of custom code. When I wanted to switch for-real, all I needed to do was swap the domain names over in my Caddyfile and edit the wp-config.php of my ClassicPress installation: step 1.3, but in reverse!

If you hadn’t been told5, you probably wouldn’t have even known I’d made a change: I suppress basically all infrastructure-identifying headers from my server output as a matter of course, and ClassicPress and WordPress are functionally-interchangeable from a front-end perspective6.

So what’s difference?

From my experience, here are the differences I’ve discovered since switching from WordPress to ClassicPress:

The good stuff

  • 😅 ClassicPress has no Gutenberg/block editor. This would absolutely be a showstopper for many people, and that’s fine: I have nothing against the block editor (I use it basically every day elsewhere!), but I’ve never really used it on danq.me and don’t feel the need to change that! My theme, my workflow, and my custom plugins are all geared around the perfectly-good “classic” editor, and so getting a more-lightweight CMS by removing a feature I wasn’t using anyway falls somewhere between neutral and a blessing.
  • The backend is fast again! One of the changes the ClassicPress team have been working on applying to WordPress is to strip out jQuery and other redundancies from the backend, and I love how much faster and lighter my editor interface is as a result. (With caveat; see below!)
  • 🔌Virtually everything “just works”. With the few exceptions described above, everything works exactly as it does under WordPress. Which is what you’d hope for a fork that’s mostly “WordPress, but without the block editor”, right, but it’s still reassuring (and, for me, an essential feature). There are a few “new” features to do with paging through posts and the media library and they’re fine, I suppose, but not by themselves worth switching for (though it might be nice to backport them into WordPress!).

The bad stuff

  • 🏷️ Adding tags to posts takes a step backwards. A side-effect of dropping jQuery is the partial loss of the autocomplete feature when selecting tags to add to a post. You still get a partial autocomplete, but not after typing a comma: you need to press enter to submit the tag you were writing and then start typing them next, which frankly sucks. This is because they’re relying on a <datalist>, which isn’t as full-featured as the Javascript solution WordPress employs. This bugs me almost enough to be a showstopper, but I gather it’s getting fixed in a near-future version.
  • 🗺️ You’re in uncharted territory when things go wrong. One great benefit of WordPress is the side-effects of its ubiquity. If you have a query or a problem you can throw a stone at your favourite search engine and get a million answers… and some of them will even be right! If you have a problem in ClassicPress and it’s not shared with (or you’re not sure if it’s shared with) WordPress… you’re mostly on your own. The forums are good and friendly, but if you want a quick answer to something, you’re likely to have to roll your sleeves up and open some source code. I don’t mind this at all – when I first started using WordPress, this was the case, too! – but it might be a showstopper for some folks.

In summary: I’m enjoying using ClassicPress, even where there are rough edges. For me, 99% of my experience with it is identical to how I used WordPress anyway, it’s relatively lightweight and fast, and it’s easy enough to switch back if I change my mind.

Footnotes

1 It saddens me that I have to keep clarifying this, but I feel like I do: my switch from WordPress to ClassicPress is absolutely nothing to do with any drama in the WordPress space that’s going on right now: in fact, I’d been planning to try it out since before any of the drama appeared. I appreciate that some people making a similar switch, including folks who use this blog post as a guide, might have different motivations to me, and that’s fine too. Personally, I think that ditching an installation of open-source WordPress based on your interpretation of what’s going on in the ecosystem is… short-sighted? But hey: the joy of open source is you can – and should! – do what you want. Anyway: the short of it is – the desire to change from WordPress to ClassicPress was, for me, 100% a technical decision and 0% a political one. And I’ll thank you for leaving any of your drama at the door if you slide into my comments, ta!

2 Matt recently described ClassicPress as “the last decent fork attempt for WordPress”, and I absolutely agree. There’s been a spate of forks and reimplementations recently. I’ve looked into many of them and been… very much underwhelmed. Want my hot take? Sure, here you go: AspirePress is all lofty ideas and no deliverables. FreeWP seems to be the same, but somehow without the lofty ideas. ForkPress is a ghost. Speaking of ghosts, Ghost isn’t a WordPress fork; they have got some cool ideas though. b2evolution is even less a WordPress fork but it’s pretty cool in its own right. I’m not sure what clamPress is trying to achieve but I’ve not given it a serious look. So yeah: ClassicPress is, in my mind, the only WordPress fork even worth consideration at this point, and as I describe in this blog post: it’s not for everybody.

3 I switched from Nginx over the winter and it’s been just magical: I really love Caddy’s minimal approach to production configuration. The only thing I’ve been able to fault it on is that it’s not capable of setting up client-side SSL certificate authentication on a path, only on an entire domain, which meant I needed to reimplement the authentication mechanism I use on a small part of my (non-blog) internal infrastructure.

4 To be fair, it wouldn’t have been hard if I’d still be using Nginx, because I’d set up Certbot to use DNS-based vertification to issue me wildcard SSL certificates. But doing this in Caddy still felt magical.

5 And assuming you don’t religiously check my colophon page.

6 Indeed, I wouldn’t have considered a switch to ClassicPress in the first place if it wasn’t a closely-aligned-enough fork that I retained the ability to flip-flop between the two to my heart’s content! I’ve loved WordPress for over two decades; that’s not going to change any time soon… and if e.g. ClassicPress ceased tracking WordPress releases and the fork diverged too far for my comfort, I’d probably switch back to regular old WordPress!

× × ×

Reply to Ed Catmull on Change

This is a reply to a post published elsewhere. Its content might be duplicated as a traditional comment at the original source.

Matt Mullenweg said:

[a quote from Ed Catmull’s book Creativity Inc.] made me think a lot about the early days of Gutenberg and the huge resistance it had in the community, including causing the fork of ClassicPress. Now that we’re much further along there’s a pretty widespread acceptance of Gutenberg, and it’s responsible for the vast majority of all WP posts and pages made, however if we had taken a vote for whether it should happen or not, it probably wouldn’t have ever gotten off the ground.

What’s funny is if you go back even further, using a visual WYSIWYG editor in the first place was very controversial, and many people didn’t want the classic editor brought into WordPress.

Long-term WordPresser here; I remember when 2.0 integrated TinyMCE and it was absolutely necessary to ensure that raw HTML editing remained an option, clear and up-front. Which I’m glad of: I probably hit raw HTML about once a month when I’m blogging, to this day!

I was among those who strongly resisted Gutenberg. Nowadays I use it every day! But my primary personal blog, which was already almost six years old when it migrated to WordPress 1.2 back in 2004, still uses the classic editor. I enjoy that I have the freedom to do that.

When we talk about open source meaning freedom, this is the kind of thing we mean. Years ago, I was in charge of the CMS for a major academic institution when the company behind that CMS made a gradual and concerted effort to become less-open-source. That CMS didn’t have the ecosystem and community around it that WordPress has, and so no forks took off, and so my employer got locked-in to upgrading to a new version that was mostly-closed-source and was in some ways inferior. Ugh.

(Incidentally, I got them off that CMS: they’re now using a mixture of WordPress and Drupal for most of their systems. Open source won.)

Change isn’t always good. But open source provides the freedom to embrace change in the way that suits you best.

The Continuum

Last week, I discovered Geneveive Raine‘s “The Continuum”, a super-compressed image comprised of 1-pixel-tall versions of her home page’s daily banners, stitched together1.

I thought it was a beautiful idea, so I stole adapted it to produce an illustration based on the featured images of my blog posts:

Extremely tall diagram consisting of 2,062 horizontal lines in a variety of different colours, each representing a different blog post.
Only about 38% of my 5,445 blog posts have featured images suitable for use in this diagram. But here they are!

I generated a horizontal version too, but I’ve used the vertical version above because it’s more-suitable for use with a HTML imagemap2.

Here’s the code I used to generate the images (and the imagemap), if you want to run it against your own WordPress-ish blog.

Footnotes

1 Which was in-turn inspired by Movie Iris, a tool that visualises the frames of a movie as a radial graphic.

2 What’s a HTML imagemap, you ask? You don’t need to ask: you shouldn’t be using it anyway. Relying on it means you’re setting yourself up for an accessibility nightmare. Anyway: I used one above: you can click on any “stripe” of the image to jump to the corresponding post. It needed some fighting-with because imagemaps can’t work with rescaled images, so I’ve forced the height of the image even as it resizes horizontally. Not that you’re going to click on the stripes anyway: it’s just about the worst way imaginable to navigate a blog.

Blog Questions Challenge

Since Kev Quirk made an adaptation of Ava‘s Blog Questions Challenge I’ve been seeing it everywhere in my blogosphere circle. I’ve gotta be the last person left on Earth to do it, but it has that old-school pass-it-along meme feel, like that 2006 one about describing your friends. I’ve not been tagged by name, but both Jeremy and Garrett did a broad “you” tag, so I’m taking it.

Why did you start blogging in the first place?

It felt like a natural evolution of my second vanity-site. It was 1998, and my site – Castle of the Four Winds – was home to a selection of the same kinds of random crap that everybody put on their homepages at the time. I figured I’d start keeping an online diary: the word “blog” hadn’t been coined yet, and its predecessor “weblog” had only been around for a year and I hadn’t come across it.

So I experimentally started posting a few times a week.

Castle of the Four Winds in early 1999: a very-90s website of white and red text on black, with Times New Roman text, a flaming hit counter, and a blue ribbon campaign button.
I don’t have many of my posts from 1998, but I know from other records that my first deliberately “blog”-like post was on 27 September. But the posts shown in this screenshot, from January 1999, survive and can still be read here1.
By the way, if you liked how my site looked back in the 1990s, you can wind the clock back! Give it a go!

What platform are you using to manage your blog and why did you choose it? Have you blogged on other platforms before?

1998: Static HTML and a bit of Perl

When I started blogging my site was almost entirely plain HTML2. So my original “platform” was probably Emacs.

2000: Static files indexed by PHP

In the Summer of 2000 I registered avangel.com and moved my diary there. I was still storing posts in static files, but used PHP wrappers to share the structure and menus across the pages. It was a massive improvement.

Later, I moved everything to the (ill-advised?) domain name scatmania.org and reimplemented in pretty-much the same way. Until…

2003: Flip

The first real “blogging engine” I used was Flip.

The first version of Scatmania.org: a Flip-powered weblog.
Flip was a bit of a pain to theme, which is why my Flip-powered blog looked quite a bit like most other Flip-powered blogs.

I liked Flip3: it had a raw simplicity that I’d later come to love in young versions of WordPress. And being able to edit from the Web was a huge improvement over having to edit files, especially when I was out and about: I managed to post from my dad’s BlackBerry while cycling across the Outer Hebrides, for example.

2004: WordPress

I’d have outgrown Flip eventually, but I got a nudge in that direction in July 2004. At the time, I was sharing a server with some friends and operated by Gareth, and something went wrong and the server went completely offline. The co-located server disappeared back to Gareth’s house, eventually, and while I’d recovered many of the posts from my own backups, 61 posts remain partially-incomplete to this day (if you happen to have a copy of any of them I’d love to see it!).

I brought my blog back online using WordPress, whose then-new release version 1.2 included an RSS-powered importer: this allowed me to write a little code to convert my entire previous archive into a fat RSS file and then import it wholesale. WordPress was, as remains, pretty magical – a universal blogging platform that evolved into a universal CMS – and I back in the day I occasionally argued online with Matt about technical aspects of the future direction of the project4.

Scatmana.org version 2 - now with actual web design
Those drop-shadows! Those gradients! Those naked hyperlinks differentiated only by being a slightly different colour! That aggressive use of sans-serif fonts with expanded line-heights! Those RSS links, front-and-centre! The only thing that could make this more-obviously “Web 2.0” would be the addition of a wonky “beta” star in the corner.

Incidentally, if you’d like to see more of my blog’s design history over the last 26+ years, I shared a lot of screenshots back in 2018.

If you didn’t know better, you might well not know I’m running WordPress. My theme and custom plugins are… well, they’re an ecosystem all by themselves. And that’s before you even get to things like CapsulePress, my WordPress-to-Gopher/Gemini/Spartan/Nex bridge, the pile of scripts I use to sync-up with the Fediverse, the PWA I use to post notes while I’m on the move, and so on.

2025: ClassicPress

Earlier this year I experimentally switched to ClassicPress; a fork of WordPress. There’ll doubtless be lots more to say about that, down the line5, but here’s the skinny: I don’t use Gutenberg on my blog anyway6, I appreciate having my backend be almost as high-performance as I’ve worked to make my frontend, and I enjoy most of the feature differences7.

How do you write your posts? For example, in a local editing tool, or in a panel/dashboard that’s part of your blog?

With the exception of notes (most of which are written in a tool of my own creation and then pushed to one or both of my Mastodon and my blog simultaneously), I mostly write right into the WordPress/ClassicPress post editor.

I often write ideas, concepts, and first drafts into my Obsidian notebook and then copy/paste out when the time comes.

When do you feel most inspired to write?

There’s no particular pattern, though it feels like I’m most-inspired to write exactly when I should be prioritising something else! That’s why it’s so helpful to be able to write three sentences into Obsidian and then come back to it later!

I’ve been on a bit of a blogging kick these last few years, though. Last year I wrote a massive 436 posts, although that admittedly includes PESOS‘d checkins from geocaching and geohashing expeditions. I’m a fan of Kev’s #100DaysToOffload challenge, and I’m on course to achieve it earlier than ever before, this year (my sixth consecutive year: I do the challenge strictly by calendar years!), as this post is already by 48th… all within the first 38 days of this year8.

Do you publish immediately after writing, or do you let it simmer a bit as a draft?

A mixture of both. Probably most of my posts are written in a single sitting… or, at least, are written in a tab that stays open for the entire time during which it’s written.

But others spend a long time in-progress. You remember how almost a year ago I gave a talk about why Oxford’s area code is 01865? And I promised that there’d be a blog/vlog/maybe-podcast version of that talk later? Yeah: that’s been 90%-there and sitting in a draft pretty-much since then, just waiting for me to make the finishing touches (and record the vlog/podcast variants, if that’s the direction I decide to go in).

And I’ve dusted off drafts that’ve been much older than that, before, too. So it really is a mixture.

What’s your favourite post on your blog?

I couldn’t pick out a favourite that I wouldn’t change my mind about five minutes later. But a recent favourite might have been last Spring’s “Let Your Players Lead The Way”, which aimed to impart some of the things I’ve learned about gamemastering (especially) while being the dungeon master for The Levellers these last few years9.

Not only was it a post that had been a long time coming, and based on months of drafts and re-drafts, but also I really enjoyed writing some post-specific CSS to give it just a slightly more-magical feel.

Screenshot from Let Your Players Lead The Way, showcasing its design in the style of the D&D Players' Handbook.
The downside is that I’ve now got one more thing to try not to break the next time I re-write my blog’s stylesheet.

Any future plans for your blog? Maybe a redesign, a move to another platform, or adding a new feature?

I want to redesign the homepage to be simpler, less-graphical, and more-informational. I’m not sure how that’s going to look, yet.

I’ve been wondering about integrating some of my personal-geotracking into the design (Aaron Parecki does an amazing job of this with his dynamic site background image, for example).

I’m playing with the idea of adding a guestbook, like it’s 1998 again or something.

I’d like to tidy up my tagging taxonomy, and I’m not convinced AI is up to the task.

I need to decide how I feel about the emoji reactions feature I added in 2023. I’m still undecided. What do you think? 👍? 👎?

And as I mentioned: I’m experimenting with ClassicPress. It’s working out mostly-okay so far, but that’s a story for another post.

Next?

I feel like I’m the last person in the universe to do this quiz. But if you haven’t – and you have anything approximating a blog – then you should go next.

Footnotes

1 I wouldn’t recommend actually reading my older posts, though. I was a teenager, and it shows.

2 I had a slightly-fancier kind of hosting, by this point, that gave me a cgi-bin directory into which I could compile binaries (in C) or write scripts (in Perl). My hit counter? That was a Perl script I adapted from Matt Wright’s counter.pl and “enhanced” with some flaming text using Corel Photo-Paint.

3 While writing this post, I hunted down the original developer of Flip. He seems cool.

4 A year later he launched WordPress.com, which then evolved into the foundation of Automattic, and there soon came a point where I thought “I should work there, someday!” It took me a further 14 years before I applied for such a job, though.

5 Right off the bat, though, let me stress that trying ClassicPress is absolutely nothing to do with the drama in the WordPress space right now: in fact I’ve been planning to give it a try ever since the project got its shit together, re-forked WordPress, and released ClassicPress 2.0 a year ago.

6 I don’t have anything against Gutenberg – I use it on other blogs, and every day at work! – and Block Themes are magical… but I’ve never found any benefit to them here: I’ve no need for it, and I’ve got plugins I’ve written for my own use that I’ve never bothered to make Gutenberg-compatible.

7 My biggest gripe with ClassicPress so far is that in removing the jQuery dependency on the post editor’s tag selector they’ve only replaced it with a <datalist>, which is neat and all but kills the ability to autocomplete multiple comma-separated tags at once. But it looks like that’s getting fixed, so I’m going to hang in there for a bit before I decide whether I’m sticking with ClassicPress or not.

8 I’ll save you from doing the maths: if I complete 48 posts in 38 days, I’d expect to complete 100 posts on my 80th day: as it’s not a leap year, that would be Friday 21 March 2025. Let’s see how I get on!

9 Although I’ve been horribly neglecting them for the last couple of months, for various reasons.

× × ×

Can AI retroactively fix WordPress tags?

I’ve a notion that during 2025 I might put some effort into tidying up the tagging taxonomy on my blog. There’s a few tags that are duplicates (e.g. ai and artificial intelligence) or that exhibit significant overlap (e.g. dog and dogs), or that were clearly created when I speculated I’d write more on the topic than I eventually did (e.g. homa night, escalators1, or nintendo) or that are just confusing and weird (e.g. not that bacon sandwich picture).

Cloud-shaped wordcloud of tags used on DanQ.me, sized by frequency. The words "geocaching" and "cache log" dominate the centre of the picture, with terms like "fun", "funny", "geeky", "news", "video games", "technology", "video", and "review" a close second.
Not an entirely surprising word cloud of my tag frequency, given that all of my cache logs are tagged “cache log” and “geocaching”, and relatively-generic terms like “technology”, “fun”, and “funny” appear all over the place on my blog.

Retro-tagging with AI

One part of such an effort might be to go back and retroactively add tags where they ought to be. For about the first decade of my blog, i.e. prior to around 2008, I rarely used tags to categorise posts. And as more tags have been added it’s apparent that many old posts even after that point might be lacking tags that perhaps they ought to have2.

I remain sceptical about many uses of (what we’re today calling) “AI”, but one thing at which LLMs seem to do moderately well is summarisation3. And isn’t tagging and categorisation only a stone’s throw away from summarisation? So maybe, I figured, AI could help me to tidy up my tagging. Here’s what I was thinking:

  1. Tell an LLM what tags I use, along with an explanation of some of the quirkier ones.
  2. Train the LLM with examples of recent posts and lists of the tags that were (correctly, one assumes) applied.
  3. Give it the content of blog posts and ask what tags should be applied to it from that list.
  4. Script the extraction of the content from old posts with few tags and run it through the above, presenting to me a report of what tags are recommended (which could then be coupled with a basic UI that showed me the post and suggested tags, and “approve”/”reject” buttons or similar.

Extracting training data

First, I needed to extract and curate my tag list, for which I used the following SQL4:

SELECT COUNT(wp_term_relationships.object_id) num, wp_terms.slug FROM wp_term_taxonomy
LEFT JOIN wp_terms ON wp_term_taxonomy.term_id = wp_terms.term_id
LEFT JOIN wp_term_relationships ON wp_term_taxonomy.term_taxonomy_id = wp_term_relationships.term_taxonomy_id
WHERE wp_term_taxonomy.taxonomy = 'post_tag'
AND wp_terms.slug NOT IN (
  -- filter out e.g. 'rss-club', 'published-on-gemini', 'dancast' etc.
  -- these are tags that have internal meaning only or are already accurately applied
  'long', 'list', 'of', 'tags', 'the', 'ai', 'should', 'never', 'apply'
)
GROUP BY wp_terms.slug
HAVING num > 2 -- filter down to tags I actually routinely use
ORDER BY wp_terms.slug
Many of my tags are used for internal purposes; e.g. I tag posts published on gemini if they’re to appear on gemini://danq.me/ and dancast if they embed an episode of my podcast. I filtered these out because I never want the AI to suggest applying them.

I took my output and dumped it into a list, and skimmed through to add some clarity to some tags whose purpose might be considered ambiguous, writing my explanation of each in parentheses afterwards. Here’s a part of the list, for example:

Prompt derivation

I used that list as the basis for the system message of my initial prompt:

Suggest topical tags from a predefined list that appropriately apply to the content of a given blog post.

# Steps

1. **Read the Blog Post**: Carefully read through the provided content of the blog post to identify its main themes and topics.
2. **Analyse Key Aspects**: Identify key topics, themes, or subjects discussed in the blog post.
3. **Match with Tags**: Compare these identified topics against the list of available tags.
4. **Select Appropriate Tags**: Choose tags that best represent the main topics and themes of the blog post.

# Output Format

Provide a list of suggested tags. Each tag should be presented as a single string. Multiple tags should be separated by commas.

# Allowed Tags

Tags that can be suggested are as follows. Text in parentheses are not part of the tag but are a description of the kinds of content to which the tag ought to be applied:

- aberdyfi
- aberystwyth
- ...
- youtube
- zoos

# Examples

**Input:**
The rapid advancement of AI technology has had a significant impact on my industry, even on the ways in which I write my blog posts. This post, for example, used AI to help with tagging.

**Output:**
ai, technology, blogging, meta, work

...(other examples)...

# Notes

- Ensure that all suggested tags are relevant to the key themes of the blog post.
- Tags should be selected based on their contextual relevance and not just keyword matching.

This system prompt is somewhat truncated, but you get the idea.

Now I was ready to give it a go with some real data. As an initial simple and short (and therefore also computationally cheap) experiment, I tried feeding it a note I wrote last week about the interrobang’s place in the Spanish language, and in Unicode.

That post already has the following tags (but this wasn’t disclosed to the AI in its training set; it had to work from scratch): , , (a bit of a redundancy there!), , and .

Testing it out

Let’s see what the AI suggests:

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_TOKEN" \
  -d '{
  "model": "gpt-4o-mini",
  "messages": [
    {
      "role": "system",
      "content": [
        {
          "type": "text",
          "text": "[PROMPT AS DESCRIBED ABOVE]"
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "My 8-year-old asked me \"In Spanish, I need to use an upside-down interrobang at the start of the sentence‽\" I assume the answer is yes A little while later, I thought to check whether Unicode defines a codepoint for an inverted interrobang. Yup: ‽ = U+203D, ⸘ = U+2E18. Nice. And yet we dont have codepoints to differentiate between single-bar and double-bar \"cifrão\" dollar signs..."
        }
      ]
    }
  ],
  "response_format": {
    "type": "text"
  },
  "temperature": 1,
  "max_completion_tokens": 2048,
  "top_p": 1,
  "frequency_penalty": 0,
  "presence_penalty": 0
}'
Running this via command-line curl meant I quickly ran up against some Bash escaping issues, but set +H and a little massaging of the blog post content seemed to fix it.

GPT-4o-mini

When I ran this query against the gpt-4o-mini model, I got back: unicode, language, education, children, symbols.

That’s… not ideal. I agree with the tags unicode, language, and children, but this isn’t really about education. If I tagged everything vaguely educational on my blog with education, it’d be an even-more-predominant tag than geocaching is! I reserve that tag for things that relate specifically to formal education: but that’s possibly something I could correct for with a parenthetical in my approved tags list.

symbols, though, is way out. Sure, the post could be argued to be something to do with symbols… but symbols isn’t on the approved tag list in the first place! This is a clear hallucination, and that’s pretty suboptimal!

Maybe a beefier model will fare better…

GPT-4o

I switched gpt-4o-mini for gpt-4o in the command above and ran it again. It didn’t take noticeably longer to run, which was pleasing.

The model returned: children, language, unicode, typography. That’s a big improvement. It no longer suggests education, which was off-base, nor symbols, which was a hallucination. But it did suggest typography, which is a… not-unreasonable suggestion.

Neither model suggested spain, and strictly-speaking they were probably right not to. My post isn’t about Spain so much as it’s about Spanish. I don’t have a specific tag for the latter, but I’ve subbed in the former to “connect” the post to ones which are about Spain, but that might not be ideal. Either way: if this is how I’m using the tag then I probably ought to clarify as such in my tag list, or else add a note to the system prompt to explain that I use place names as the tags for posts about the language of those places. (Or else maybe I need to be more-consistent in my tagging).

I experimented with a handful of other well-tagged posts and was moderately-satisfied with the results. Time for a more-challenging trial.

This time, with feeling…

Next, I decided to run the code against a few blog posts that are in need of tags. At this point, I wasn’t quite ready to implement a UI, so I just adapted my little hacky Bash script and copy-pasted HTML-stripped post contents directly into it.

Hand-drawn wireframe application with a blog post shown on the left (with 'previous' and 'next' buttons) and proposed tags on the right (with 'accept' and 'reject' buttons), alongside conventional tag management tools.
If it worked, I decided, I could make a UI. Until then, the command line was plenty sufficient.

I ran against three old posts:

Hospitals (June 2006)

In this post, I shared that my grandmother and my coworker had (independently) been taken into hospital. It had no tags whatsoever.

The AI suggested the tags hospital, family, injury, work, weddings, pub, humour. Which at a glance, is probably a superset of the tags that I’d have considered, but there’s a clear logic to them all.

It clearly picked out weddings based on a throwaway comment I made about a cousin’s wedding, so I disagree with that one: the post isn’t strictly about weddings just because it mentions one.

pub could go either way. It turns out my coworker’s injury occurred at or after a trip to the pub the previous night, and so its relevance is somewhat unknowable from this post in isolation. I think that’s a reasonable suggestion, and a great example of why I’d want any such auto-tagging system to be a human assistant (suggesting candidate tags) and not a fully-automated system. Interesting!

Finally, you might think of humour as being a little bit sarcastic, or maybe overly-laden with schadenfreude. But the blog post explicitly states that my coworker “carefully avoided saying how he’d managed to hurt himself, which implies that it’s something particularly stupid or embarrassing”, before encouraging my friends to speculate on it. However, it turns out that humour isn’t one of my existing tags at all! Boo, hallucinating AI!

I ended up applying all of the AI’s suggestions except weddings and humour. I also applied smartdata, because that’s where I worked (the AI couldn’t have been expected to guess that without context, though!).

Catch-Up: Concerts (June 2005)

This post talked about Ash and I’s travels around the UK to see REM and Green Day in concert5 and to the National Science Museum in London where I discovered that Ash was prejudiced towards… carrot cake.

The AI suggested: concerts, travel, music, preston, london, science museum, blogging.

Those all seemed pretty good at a first glance. Personally, I’d forgotten that we swung by Preston during that particular grand tour until the AI suggested the tag, and then I had to look back at the post more-carefully to double-check! blogging initially seemed like a stretch given that I was only blogging about not having blogged much, but on reflection I think I agree with the robot on this one, because I did explicitly link to a 2002 page that fell off the Internet only a few years ago about the pointlessness of blogging. So I think it counts.

Dan, in his 20s, crouches awkwardly in front of a TV and a Nintendo Wii in a wood-panelled room as he attempts to headbutt a falling blue balloon.
I was able to verify that I’d been in Preston with thanks to this contemporaneous photo. I have no further explanation for the content of the photo, though.

science museum is a big fail though. I don’t use that tag, but I do use the tag museum. So close, but not quite there, AI!

I applied all of its suggestions, after switching museum in place of science museum.

Geeky Winnage With Bluetooth (September 2004)

I wrote this blog post in celebration of having managed to hack together some stuff to help me remote-control my PC from my phone via Bluetooth, which back then used to be a challenge, in the hope that this would streamline pausing, playing, etc. at pizza-distribution-time at Troma Night, a weekly film night I hosted back then.

Four young people, smiling in laughing, sit in a cluttered and messy flat.
If you were sat on that sofa, fighting your way past other people and a mango-chutney-barrel-cum-table to get to a keyboard was genuinely challenging!

It already had the tag technology, which it inherited from a pre-tagging evolution of my blog which used something akin to categories (of which only one could be assigned to a post). In addition to suggesting this, the AI also picked out the following options: bluetooth, geeky, mobile, troma night, dvd, technology, and software.

The big failure here was dvd, which isn’t remotely one of my tags (and probably wouldn’t apply here if it were: this post isn’t about DVDs; it barely even mentions them). Possibly some prompt engineering is required to help ensure that the AI doesn’t make a habit of this “include one tag not from the approved list, every time” trend.

Apart from that it’s a pretty solid list. Annoyingly the AI suggested mobile, which isn’t an approved tag, instead of mobiles, which is. That’s probably a tokenisation fault, but it’s still annoying and a reminder of why even a semi-automated “human-checked” system would need a safety-check to ensure that no absent tags are allowed through to the final stage of approval.

This post!

As a bonus experiment, I tried running my code against a version of this post, but with the information about the AI’s own prompt and the examples removed (to reduce the risk of confusion). It came up with: ai, wordpress, blogging, tags, technology, automation.

All reasonable-sounding choices, and among those I’d made myself… except for tags and automation which, yet again, aren’t among tags that I use. Unless this tendency to hallucinate can be reined-in, I’m guessing that this tool’s going to continue to have some challenges when used on longer posts like this one.

Conclusion and next steps

The bottom line is: yes, this is a job that an AI can assist with, but no, it’s not one that it can do without supervision. The laser-focus with which gpt-4o was able to pick out taggable concepts, faster than I’d have been able to do for the same quantity of text, shows that there’s potential here, but it’s not yet proven itself enough of a time-saver to justify me writing a fluffy UI for it.

However, I might expand on the command-line tools I’ve been using in order to produce a non-interactive list of tagging suggestions, and use that to help inform my work as I tidy up the tags throughout my blog.

You still won’t see any “AI-authored” content on this site (except where it’s for the purpose of talking about AI-generated content, and it’ll always be clearly labelled), and I can’t see that changing any time soon. But I’ll admit that there might be some value in AI-assisted curation and administration, so long as there’s an informed human in the loop at all times.

Footnotes

1 Based on my tagging, I’ve apparently only written about escalators once, while playing Pub Jenga at Robin‘s 21st birthday party. I can’t imagine why I thought it deserved a tag.

2 There are, of course, various other people trying similar approaches to this and similar problems. I might have tried one of them, were it not for the fact that I’m not quite as interested in solving the problem as I am in understanding how one might use an AI to solve the problem. It’s similar to how I don’t enjoy doing puzzles like e.g. sudoku as much as I enjoy writing software that optimises for solving such puzzles. See also, for example, how I beat my children at Mastermind or what the hardest word in Hangman is or my various attempts to avoid doing online jigsaws.

3 Let’s ignore for a moment the farce that was Apple’s attempt to summarise news headlines, shall we?

4 Essentially the same SQL, plus WordClouds.com, was used to produce the word cloud grapic!

5 Two separate concerts, but can you imagine‽ 🤣

× × × ×

Caddy

I’m pretty impressed with running WordPress on Caddy so far.

It took a little jiggerypokery to configure it with an equivalent of the Nginx configuration I use for DanQ.me. But off the back of it I get the capability for HTTP/3, 103 Early Hints, and built-in “batteries included” infrastructure for things like certificate renewal and log rotation.

Browser network debugger showing danq.me being served over protocol 'h3' (HTTP/3) and an 'Early Hints Headers' section loading a WOFF2 font and a JavaScript file.

(why yes, I am celebrating my birthday by doing selfhosting server configuration, why do you ask? 😅)

×

99 Days of Blogging

With this post, for the first time ever1, I’ve blogged for 99 consecutive days!2

Calendar showing Sunday 25 August through Sunday 1 December inclusive, with a coloured "spot" on each day with a size corresponding to the number of posts made on that day, broken into a pie chart showing the proportion of different post kinds.
The dots are sized based on the number of posts and broken-down by post kind: articles are blue, notes are green, checkins are orange, reposts are purple, and replies are red3.
I didn’t set out with the aim of getting to a hundred4, as I might well manage tomorrow, but after a while I began to think it a real possibility. In particular, when a few different factors came together:

Previous long streaks have sometimes been aided by pre-writing posts in bulk and then scheduling them to come out one-a-day6. I mostly don’t do that any more: when a post is “ready”, it gets published.

I didn’t want to make a “this is my 100th day of consecutive blogging” on the 100th day. That attaches too much weight to the nice round number. But I wanted to post to acknowledge that I’m going to make it to 100 days of consecutive blogging… so long as I can think of something worth saying tomorrow. I guess we’ll all have to wait and see.

Footnotes

1 Given that I’ve been blogging for over 26 years, that I’m still finding noteworthy blogging “firsts” is pretty cool, I think

2 My previous record “streak” was only 37 days, so there’s quite a leap there.

3 A massive 219 posts are represented over the last 99 days: that’s an average of over 2 a day!

4 This isn’t an attempt at #100DaysToOffload; I already achieved that this year as it does not require consecutive days. But it’s a cool challenge anyway.

5 My site’s backed by WordPress, but the mobile wp-admin isn’t the best and my site’s so-customised that apps like Jetpack mangle my metadata.

6 As you might now, I consider myself to be the primary audience for my blog: everybody else comes second. That’s why I don’t collect any webstats! When I used to collect webstats, I would sometimes pre-write and “schedule” posts, but without them it just feels pointless to do so!

×