Beating Children at Mastermind

This blog post is also available as a video. Would you prefer to watch/listen to me tell you about how I’ve implemented a tool to help me beat the kids when we play Mastermind?

I swear that I used to be good at Mastermind when I was a kid. But now, when it’s my turn to break the code that one of our kids has chosen, I fail more often than I succeed. That’s no good!

Black, white, brown, blue, green, orange and yellow Mastermind pegs in a disordered heap.
If you didn’t have me pegged as a board gamer… where the hell have you been?

Mastermind and me

Maybe it’s because I’m distracted; multitasking doesn’t help problem-solving. Or it’s because we play “Super” Mastermind, which differs from the one I had as a child in that eight (not six) peg colours are available and secret codes are permitted to have duplicate peg colours. These changes increase the number of possible codes from 360 to 4,096, but the number of guesses allowed only goes up from 8 to 10. That’s hard.

A plastic Mastermind board in brown and green; it has twelve spots for guessing and shows six coloured pegs. The game has been won on the sixth guess.
The set I had as a kid was like this, I think. Photo courtesy ZeroOne; CC-BY-SA license.

Or maybe it’s just that I’ve gotten lazy and I’m now more-likely to try to “solve” a puzzle using a computer than to crack a code using my brain alone. See for example my efforts to determine the hardest hangman words and make an adversarial hangman game, to generate solvable puzzles for my lock puzzle game, to cheat at online jigsaws, or to balance my D&D-themed Wordle clone.

Hey, that’s an idea. Let’s crack the code… by writing some code!

Screenshot showing Mastermind game from WebGamesOnline.com. Seven guesses have been made, each using only one colour for each of the four pegs, and no guesses are correct; only red pegs have never been guessed.
This online edition plays a lot like the version our kids play, although the peg colours are different. Next guess should be an easy solve!

Representing a search space

The search space for Super Mastermind isn’t enormous, and it lends itself to some highly-efficient computerised storage.

There are 8 different colours of peg. We can express these colours as a number between 0 and 7, in three bits of binary, like this:

Decimal Binary Colour
0 000 Red
1 001 Orange
2 010 Yellow
3 011 Green
4 100 Blue
5 101 Pink
6 110 Purple
7 111 White

There are four pegs in a row, so we can express any given combination of coloured pegs as a 12-bit binary number. E.g. 100 110 111 010 would represent the permutation blue (100), purple (110), white (111), yellow (010). The total search space, therefore, is the range of numbers from 000000000000 through 111111111111… that is: decimal 0 through 4,095:

Decimal Binary Colours
0 000000000000 Red, red, red, red
1 000000000001 Red, red, red, orange
2 000000000010 Red, red, red, yellow
… … …
4092 111111111100 White, white, white, blue
4093 111111111101 White, white, white, pink
4094 111111111110 White, white, white, purple
4095 111111111111 White, white, white, white

Whenever we make a guess, we get feedback in the form of two variables: each peg that is in the right place is a bull; each that represents a peg in the secret code but isn’t in the right place is a cow (the names come from Mastermind’s precursor, Bulls & Cows). Four bulls would be an immediate win (lucky!), any other combination of bulls and cows is still valuable information. Even a zero-score guess is valuable – potentially very valuable! – because it tells the player that none of the pegs they’ve guessed appear in the secret code.

A plastic Mastermind board in blue and yellow with ten guess spaces and eight pegs. The sixth guess is unscored but looks likely to be the valid solution.
If one of Wordle‘s parents was Scrabble, then this was the other. Just ask its Auntie Twitter.

Solving with Javascript

Modern Javascript supports binary literals (and has always had bitwise operations), so we can encode and decode between arrays of four coloured pegs (numbers 0-7) and the number 0-4,095 representing the guess as shown below. Decoding uses an AND bitmask to filter to the requisite digits then divides by the order of magnitude. Encoding is just a reduce function that bitshift-concatenates the numbers together.

/**
 * Decode a candidate into four peg values by using binary bitwise operations.
 */
function decodeCandidate(candidate){
  return [
    (candidate & 0b111000000000) / 0b001000000000,
    (candidate & 0b000111000000) / 0b000001000000,
    (candidate & 0b000000111000) / 0b000000001000,
    (candidate & 0b000000000111) / 0b000000000001
  ];
}

/**
 * Given an array of four integers (0-7) to represent the pegs, in order, returns a single-number
 * candidate representation.
 */
function encodeCandidate(pegs) {
  return pegs.reduce((a, b)=>(a << 3) + b);
}
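
For example, the blue/purple/white/yellow permutation from earlier round-trips like this:

encodeCandidate([4, 6, 7, 2]); // => 0b100110111010, i.e. decimal 2490
decodeCandidate(2490);         // => [4, 6, 7, 2]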

With this, we can simply:

  1. Produce a list of candidate solutions (an array containing numbers 0 through 4,095).
  2. Choose one candidate, use it as a guess, and ask the code-maker how it scores.
  3. Eliminate from the candidate solutions list all solutions that would not score the same number of bulls and cows for the guess that was made.
  4. Repeat from step #2 until you win.

Step 3’s the most important one there. Given a function getScore( solution, guess ) which returns an array of [ bulls, cows ] a given guess would score if faced with a specific solution, that code would look like this (I’m convinced there must be a more-performant way to eliminate candidates from the list with XOR bitmasks, but I haven’t worked out what it is yet):

/**
 * Given a guess (array of four integers from 0-7 to represent the pegs, in order) and the number
 * of bulls (number of pegs in the guess that are in the right place) and cows (number of pegs in the
 * guess that are correct but in the wrong place), eliminates from the candidates array all guesses
 * invalidated by this result. Return true if successful, false otherwise.
 */
function eliminateCandidates(guess, bulls, cows){
  const newCandidatesList = data.candidates.filter(candidate=>{
    const score = getScore(candidate, guess);
    return (score[0] == bulls) && (score[1] == cows);
  });
  if(newCandidatesList.length == 0) {
    alert('That response would reduce the candidate list to zero.');
    return false;
  }
  data.candidates = newCandidatesList;
  chooseNextGuess();
  return true;
}
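
getScore itself isn’t shown above, but for completeness here’s roughly how such a function might work – a minimal sketch rather than the exact code from my solver – reusing decodeCandidate from earlier:

/**
 * Sketch: scores a guess (array of four pegs, 0-7) against a candidate
 * solution (an encoded 12-bit number), returning [ bulls, cows ].
 */
function getScore(solution, guess){
  const solutionPegs = decodeCandidate(solution);
  // Bulls: right colour, right position.
  const bulls = solutionPegs.filter((peg, i) => peg === guess[i]).length;
  // For each colour, the number of matches regardless of position is capped
  // by the smaller of that colour's counts in the solution and the guess...
  let matches = 0;
  for(let colour = 0; colour < 8; colour++){
    const inSolution = solutionPegs.filter(p => p === colour).length;
    const inGuess    = guess.filter(p => p === colour).length;
    matches += Math.min(inSolution, inGuess);
  }
  // ...and cows are the matches that aren't already counted as bulls.
  return [ bulls, matches - bulls ];
}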

I continued in this fashion to write a full solution (source code). It uses ReefJS for component rendering and state management, and you can try it for yourself right in your web browser. If you play against the online version I mentioned you’ll need to transpose the colours in your head: the physical version I play with the kids has pink and purple pegs, but the online one replaces these with brown and black.

Testing the solution

Let’s try it out against the online version:

As expected, my code works well-enough to win the game every time I’ve tried, both against computerised and in-person opponents. So – unless you’ve been actively thinking about the specifics of the algorithm I’ve employed – it might surprise you to discover that… my solution is very-much a suboptimal one!

A young boy sits cross-legged on the floor, grinning excitedly at a Mastermind board (from the code-maker's side).
My code has only failed to win a single game… and that turned out to be because my opponent, playing overexcitedly, cheated in the third turn. To be fair, my code didn’t lose either, though: it identified that a mistake must have been made and we declared the round void once we’d spotted the problem.

My solution is suboptimal

A couple of games in, the suboptimality of my solution became pretty visible. Sure, it still won every game, but it was a blunt instrument, and anybody who’s seriously thought about games like this can tell you why. You know how when you play e.g. Wordle (but not in “hard mode”) you sometimes want to type in a word that can’t possibly be the solution because it’s the best way to rule in (or out) certain key letters? This kind of strategic search space bisection reduces the mean number of guesses you need to solve the puzzle, and the same’s true in Mastermind. But because my solver will only propose guesses from the list of candidate solutions, it can’t make this kind of improvement.

Animation showing how three clues alone are sufficient to derive a unique answer from the search space of the original "break into us" lock puzzle.
My blog post about Break Into Us used a series of visual metaphors to show search space dissection, including this one. If you missed it, it might be worth reading.

Search space bisection is also used in my adversarial hangman game, but in this case the aim is to split the search space in such a way that no matter what guess a player makes, they always find themselves in the larger remaining portion of the search space, to maximise the number of guesses they have to make. Y’know, because it’s evil.

Screenshot showing a single guess row from Online Mastermind, with the guess Red, Red, Green, Green.
A great first guess, assuming you’re playing against a random code and your rules permit the code to have repeated colours, is a “1122” pattern.

There are mathematically-derived heuristics to optimise Mastermind strategy. The first of these came from none other than Donald Knuth (legend of computer science, mathematics, and pipe organs) back in 1977. His solution, published at probably the height of the game’s popularity in the amazingly-named Journal of Recreational Mathematics, guarantees a solution to the six-colour version of the game within five guesses. Ville [2013] found an optimal solution for a seven-colour variant, but demonstrated how rapidly the tree of possible moves grows and how early pruning is needed – even with powerful modern computers – to conserve memory. It’s a very enjoyable and readable paper.
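
To give a flavour of the approach (this is a sketch of the minimax idea, not what my solver actually does): a Knuth-style chooser buckets the remaining candidates by the [ bulls, cows ] response each possible guess would elicit, then picks the guess whose worst-case bucket is smallest:

/**
 * Sketch of Knuth-style minimax guess selection. Note that the guess is drawn
 * from ALL 4,096 codes - not just the remaining candidates - which is exactly
 * the improvement my own solver lacks.
 */
function chooseMinimaxGuess(candidates){
  let bestGuess = null, bestWorstCase = Infinity;
  for(let guess = 0; guess <= 0b111111111111; guess++){
    const guessPegs = decodeCandidate(guess);
    // Bucket the candidates by the response this guess would produce.
    const buckets = {};
    for(const candidate of candidates){
      const key = getScore(candidate, guessPegs).join(',');
      buckets[key] = (buckets[key] || 0) + 1;
    }
    const worstCase = Math.max(...Object.values(buckets));
    if(worstCase < bestWorstCase){
      bestWorstCase = worstCase;
      bestGuess = guess;
    }
  }
  return bestGuess;
}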

But for my purposes, it’s unnecessary. My solver routinely wins within six, maybe seven guesses, and by nonchalantly glancing at my phone in-between my guesses I can now reliably guess our children’s codes quickly and easily. In the end, that’s what this was all about.


Milk and Mail Notifications with Flic 2 Buttons

I’ve been playing with a Flic Hub LR and some Flic 2 buttons. They’re “smart home” buttons, but for me they’ve got a killer selling point: rather than locking you in to any particular cloud provider (although you can do this if you want), you can directly program the hub. This means you can produce smart integrations that run completely within the walls of your house.

Here are some of the things I’ve been building:

Prerequisite: Flic Hub to Huginn connection

Screenshot showing the location of the enabled "Hub SDK web access open" setting in the Flic Hub settings page of the Flic app.
Step 1. Enable SDK access. Check!

I run a Huginn instance on our household NAS. If you’ve not come across it before, Huginn is a bit like an open-source IFTTT: it’s got a steep learning curve, but it’s incredibly powerful for automation tasks. The first step, then, was to set up my Flic Hub LR to talk to Huginn.

Screenshot showing the Flic Hub SDK open in Firefox. Three modules are loaded: "IR Recorder", "UDP to IR Blaster", and "The Green", the latter of which is open. "The Green" shows JavaScript code to listen for 'buttonSingleOrDoubleClickOrHold' events then transmits them as HTTP POST requests to a 'webHook' URL.
Checking ‘Restart after crash’ seems to help ensure that the script re-launches after e.g. a power cut. Need the script?

This was pretty simple: all I had to do was switch on “Hub SDK web access open” for the hub using the Flic app, then use the web SDK to add this script to the hub. Now whenever a button was clicked, double-clicked, or held down, my Huginn installation would receive a webhook ping.
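
I link to the exact script above, but if you just want the gist, a minimal version might look something like this (the buttons and http module names come from the Flic Hub SDK, but treat the precise makeRequest options – and the webhook URL, obviously – as assumptions):

// Sketch: forward every button event from the hub to a Huginn webhook.
var buttons = require('buttons');
var http    = require('http');

// Hypothetical URL: substitute your own Huginn webhook address.
var WEBHOOK_URL = 'https://huginn.example.local/users/1/web_requests/99/secret';

buttons.on('buttonSingleOrDoubleClickOrHold', function(event){
  // The event includes e.g. bdaddr (the button's MAC address), isSingleClick,
  // isDoubleClick and isHold - the fields my Huginn Trigger Agents filter on.
  http.makeRequest({
    url: WEBHOOK_URL,
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    content: JSON.stringify(event)
  }, function(err, result){
    // Fire-and-forget: nothing more to do here.
  });
});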

Flow chart showing a Flic 2 button sending a Bluetooth 5 LE message to a Flic Hub LR, which sends a Webook notification to Huginn (depicted as a raven wearing a headset), which sends a message to an unidentified Internet Of Things device, "probably" over HTTPS.
Depending on what you have Huginn do “next”, this kind of set-up works completely independently of “the cloud”. (Your raven can fly into the clouds if you really want.)

For convenience, I have all button-presses sent to the same Webhook, and use Trigger Agents to differentiate between buttons and press-types. This means I can re-use functionality within Huginn, e.g. having both a button press and some other input trigger a particular action.

You’ve Got Mail!

By our front door, we have “in trays” for each of Ruth, JTA and me, as well as one for the bits of Three Rings‘ post that come to our house. Sometimes post sits in the in-trays for a long time because people don’t think to check them, or don’t know that something new’s been added.

I configured Huginn with a Trigger Agent to receive events from my webhook and filter down to just single clicks on specific buttons. The events emitted by these triggers are used to notify in-tray owners.

Annotated screenshot showing a Huginn Trigger Agent called "Flic Button C (Double) Details". Annotations show that: (1) "C" is the button name and that I label my buttons with letters. (2) "Double" is the kind of click I'm filtering for. (3) The event source for the trigger is a webhook called "Flic Buttons" whose URL I gave to my Flic Hub. (4) The event receiver for my Trigger Agent is called "Dan's In-Tray (Double) to Slack", which is a Slack Agent, but could easily be something more-sophisticated. (5) The first filter rule uses path: bdaddr, type: field==value, and a value equal to the MAC address of the button; this filters to events from only the specified button. (6) The second filter rule uses path: isDoubleClick, type: field==value, and value: true; this filters to events of type isDoubleClick only and not of types isSingleClick or isHold.
Once you’ve made three events for your first button, you can copy-paste from then on.

In my case, I’ve got pings being sent to mail recipients via Slack, but I could equally well be integrating to other (or additional) endpoints or even performing some conditional logic: e.g. if it’s during normal waking hours, send a Pushbullet notification to the recipient’s phone, otherwise send a message to an Arduino to turn on an LED strip along the top of the recipient’s in-tray.

I’m keeping it simple for now. I track three kinds of events (click = “post in your in-tray”, double-click = “I’ve cleared my in-tray”, hold = “parcel wouldn’t fit in your in-tray: look elsewhere for it”) and don’t do anything smarter than send notifications. But I think it’d be interesting to e.g. have a counter running so I could get a daily reminder (“There are 4 items in your in-tray.”) if I don’t touch them for a while, or something?

Remember the Milk!

Following the same principle, and with the hope that the Flic buttons are weatherproof enough to work in a covered outdoor area, I’ve fitted one… to the top of the box our milkman delivers our milk into!

Top of a reinforced polystyrene doorstep milk storage box, showing the round-topped handle. A metal file sits atop the box, about to be used to file down the handle.
The handle on the box was almost exactly the right size to stick a Flic button to! But it wasn’t flat enough until I took a file to it.

Most mornings, our milkman arrives by 7am, three times a week. But some mornings he’s later – sometimes as late as 10:30am, in extreme cases. If he comes during the school run the milk often gets forgotten until much later in the day, and with the current weather that puts it at risk of spoiling. Ironically, the box we use to help keep the milk cooler for longer on the doorstep works against us because it makes the freshly-delivered bottles less-visible.

Milk container, with a Flic 2 button attached to the handle of the lid and a laminated notice attached, reading: "Left milk? Press the button on the Milk Minder. It'll remind us to bring in the milk!"
Now that I had the technical infrastructure already in place, honestly the hardest part of this project was matching the font used in Milk & More‘s logo.

I’m yet to see if the milkman will play along and press the button when he drops off the milk, but if he does: we’re set! A second possible bonus is that the kids love doing anything that allows them to press a button at the end of it, so I’m optimistic they’ll be more-willing to add “bring in the milk” to their chore lists if they get to double-click the button to say it’s been done!

Future Plans

I’m still playing with ideas for the next round of buttons. Could I set something up to streamline my work status, so my colleagues know when I’m not to be disturbed, away from my desk, or similar? Is there anything I can do to simplify online tabletop roleplaying games, e.g. by giving myself a desktop “next combat turn” button?

Flic Infrared Transceiver on the side of a bookcase, alongside an (only slightly smaller than it) 20p piece, for scale.
My Flic Hub is mounted behind a bookshelf in the living room, with only its infrared transceiver exposed. 20p for scale: we don’t keep a 20p piece stuck to the side of the bookcase all the time.

I’m quite excited by the fact that the Flic Hub can interact with an infrared transceiver, allowing it to control televisions and similar devices: I’d love to be able to use the volume controls on our media centre PC’s keyboard to control our TV’s soundbar; and because the Flic Hub can listen for UDP packets, I’m hopeful that something as simple as AutoHotkey can make this possible.

Or perhaps I could make a “universal remote” for our house, accessible as a mobile web app on our internal Intranet, for those occasions when you can’t even be bothered to stand up to pick up the remote from the other sofa. Or something that switched the TV back to the media centre’s AV input when consoles were powered-down, detected by their network activity? (Right now the TV automatically switches to the consoles when they’re powered-on, but not back again afterwards, and it bugs me!)

It feels like the only limit with these buttons is my imagination, and that’s awesome.


Note #20099

Adapted version of XKCD comic #927. Titled: How WP plugins proliferate (see: authentication, crossposting, galleries, etc.). Situation: there are 14 competing WordPress plugins. Engineers in conversation agree that 14 is ridiculous and commit to developing a unified plugin that covers everybody's use cases. Result: there are now 15 competing WordPress plugins.

Almost nerdsniped myself when I discovered several #WordPress plugins that didn’t quite do what I needed. Considered writing an overarching one to “solve” the problem. Then I remembered @xkcd comic 927


Will swapping out electric car batteries catch on?

This article is a repost promoting content originally published elsewhere. See more things Dan's reposted.

Without even a touch of the steering wheel, the electric car reverses autonomously into the recharging station

Underside of a car with a removable battery.

I won’t be plugging it in though, instead, the battery will be swapped for a fresh one, at this facility in Norway belonging to Chinese electric carmaker, Nio.

The technology is already widespread in China, but the new Power Swap Station, just south of Oslo, is Europe’s first.

This is what I’ve been saying for years would be a better strategy for electric vehicles. Instead of charging them (the time needed to charge is their single biggest weakness compared to fuelled vehicles) we should be doing battery swaps. A decade or two ago I hoped we’d see some kind of standardised connector and removal interface, probably below the vehicle, through which battery cells could be swapped-out by robots operating in a pit. Recovered batteries could be recharged and reconditioned by the robots at their own pace. People could still charge their cars in a plug-in manner at their homes or elsewhere.

You’d pay for the difference in charge between the old and replacement battery, plus a service charge for being part of the battery-swap network, and you’d be set. Car manufacturers could standardise on battery designs, much like the shipping industry long-ago standardised on container dimensions and whatnot, to take advantage of compatibility with the wider network.

Rather than having different sizes of battery, vehicles could be differentiated by the number of serial battery units installed. A lorry might need four or five units; a large car two; a small car one, etc. If the interface is standardised then all the robots need to be able to do is install and remove them, however many there are.

This is far from an unprecedented concept: stagecoaches (and, later, mail coaches) worked the same way centuries ago, except that it was the horses being changed at coaching inns rather than the batteries. Did you know that the “stage” in stagecoach refers to the fact that their journey would be broken into stages by these quick stops?

Anyway: I was a little dismayed when I saw every EV manufacturer come up with their own battery standards, co-operating only as far as the plug-in charging interfaces (and then, only gradually and not completely!). But I’m given fresh hope by this discovery that China’s trying to make it work, and Nio‘s movement in Norway is exciting too. Maybe we’ll get there someday.

Incidentally: here’s a great video about how AC charging works (with a US/type-1 centric focus), which briefly touches upon why battery swaps aren’t necessarily an easy problem to solve.


Wonder Syndrome

Ruth wrote an excellent post this month entitled Wonder Syndrome. It attempts to reframe imposter syndrome (which is strongly, perhaps disproportionately, present in tech fields) as a positive indicator that there’s still more to learn:

Being aware of the boundaries of our knowledge doesn’t make us imposters, it makes us explorers. I’m going to start calling mine “Wonder Syndrome”, and allowing myself to be awed by how much I still have to learn, and then focusing in and carrying on with what I’m doing because although I may not reach the stars, I’ve come a long way up the mountain. I can learn these things, I can solve these problems, and I will.

This really resonated with me, and not just because I’ve totally bought into the Automattic creed, which literally opens with the assertion that “I will never stop learning”. (Other parts of the creed feel like they parallel Ruth’s post, too…)

Dan and Jacob look at a piece of code together; Dan is smiling but Jacob looks disgusted.
I don’t recall exactly what I’m advising a fellow Three Rings developer to do, here, but I don’t think he’s happy about it.

I just spent a week at a Three Rings DCamp (a “hackathon”, kinda), and for the umpteenth time had the experience of feeling like everybody thinks I know everything, while on the inside I feel like I’m still guessing a third of the time (and on StackOverflow for another third!).

The same’s true at work: people ask me questions about things that I suppose, objectively, are my “specialist subjects” – web standards, application security, progressive enhancement, VAT for some reason – and even where I’m able to help, I often get that nagging feeling like there must be somebody better than me they could have gone to?

Pair of Venn diagrams. The first, titled "In my head", shows "things Dan is good at" as a subset of "things others are good at". The second, titled "Reality", shows an intersection between "things others are good at" and "things Dan is good at" but plenty of unshared space in each.
You’ve probably seen diagrams like this before. After all: I’m not smart or talented enough to invent anything like this and I don’t know why you’d listen to anything I have to say on the subject anyway. 😂

You might assume that I love Ruth’s post principally because it plays to my vanity. The post describes two kinds of knowledgeable developers, differentiated primarily by their attitude to learning. One is satisfied with the niche they’ve carved out for themselves and the status that comes with it, and is content to rest on their laurels; the other is driven to keep pushing and learning more, always hungry for the next opportunity to grow. And the latter category… Ruth’s named after me.

Woman on laptop, looking concerned towards camera, captioned "are you even good enough to have imposter syndrome?"
Wait, what if I’m not… Have I been faking it this entire blog post?

And while I love the post, my gut reaction to being named after such an ideal is actually one of slight discomfort. The specific sentence that gets me is (emphasis mine):

Dans have no interest in being better than other people, they just want to know more than they did yesterday.

I wish that were me, but I’m actually moderately-strongly motivated by a desire to feel like I’m the smartest person in the room! I’m getting this urge under control (I’m pretty sure I was intolerable as a child and have been improving by instalments since then!): firstly because it’s an antisocial pattern to foster, but also because needing to go through the awkward, mistake-filled “I’m a complete amateur at this!” phase limits my ability to learn new things. But even as I work on this I still get that niggling urge, more often than I’d like, to “show off”.

Of course, it could well be that what I’m doing right now is catastrophising. I’m taking a nice thing somebody’s said about me, picking the one part of it that I find hardest to feel represents me, and deciding that I must be a fraud. Soo… imposter syndrome, I guess. Damn.

Or to put it a better way: Wonder Syndrome. I guess this is another area for self-improvement.

(I’m definitely adopting Wonder Syndrome into my vocabulary, as an exercise in mitigating imposter syndrome. If you’ve not read Ruth’s post in full, you should go and do that next.)


Taking a Jackbox Zoom Party to the Next Level

I love a good Jackbox Game. There’s nothing quite like sitting around the living room playing Drawful, Champ’d Up, Job Job, Trivia Murder Party, or Patently Stupid. But nowadays getting together in the same place isn’t as easy as it used to be, and as often as not I find my Jackbox gaming with friends or coworkers takes place over Zoom, Around, Google Meet or Discord.

There are lots of guides to doing this – even an official one! – but they all miss a few pro tips that I think can turn a good party into a great one. Get all of this set up before your guests are due to arrive to make yourself look like a super-prepared digital party master.

1. Use two computers!

Two laptops: one showing a full-screen Zoom chat with Dan and "Jackbox Games"; the second showing a windowed copy of Jackbox Party Pack 8.
You can use more than two, but two should be considered the minimum for the host.

Using one computer for your video call and a second one to host the game (in addition to the device you’re using to play the games, which could be your phone) is really helpful for several reasons:

  • You can keep your video chat full-screen without the game window getting in the way, letting you spend more time focussed on your friends.
  • Your view of the main screen can be through the same screen-share that everybody else sees, helping you diagnose problems. It also means you experience similar video lag to everybody else, keeping things fair!
  • You can shunt the second computer into a breakout room, giving your guests the freedom to hop in and out of a “social” space and a “gaming” space at will. (You can even set up further computers and have multiple different “game rooms” running at the same time!)

2. Check the volume

3.5mm adapter plugged into the headphone port on a laptop.
Plugging an adapter into the headphone port tricks the computer into thinking some headphones are plugged in without actually needing the headphones quietly buzzing away on your desk.

Connect some headphones to the computer that’s running the game (or set up a virtual audio output device if you’re feeling more technical). This means you can still have the game play sounds and transmit them over Zoom, but you’ll only hear the sounds that come through the screen share, not the sounds that come through the second computer too.

That’s helpful, because (a) it means you don’t get feedback or have to put up with an echo at your end, and (b) it means you’ll be hearing the game exactly the same as your guests hear it, allowing you to easily tweak the volume to a level that allows for conversation over it.

3. Optimise the game settings

Jackbox games were designed first and foremost for sofa gaming, and playing with friends over the Internet benefits from a couple of changes to the default settings.

Sometimes the settings can be found in the main menu of a party pack, and sometimes they’re buried in the game itself, so do your research and know your way around before your party starts.

Jackbox settings screen showing Master Volume at 20%, Music Volume at 50%, and Full-screen Mode disabled

Turn the volume down, especially the volume of the music, so you can have a conversation over the game. I’d also recommend disabling Full-screen Mode: this reduces the resolution of the game, meaning there’s less data for your video-conferencing software to stream, and makes it easier to set up screen sharing without switching back and forth between your applications (see below).

Jackbox accessibility settings: Subtitles, Motion Sensitivity, and Extended Timers are turned on.
Turning on the Motion Sensitivity or Reduce Background Animations option if your game has it means there’ll be less movement in the background of the game. This can really help with the video compression used in videoconferencing software, meaning players on lower-speed connections are less-likely to experience lag or “blockiness” in busy scenes.

It’s worth considering turning Subtitles on so that guests can work out what word they missed (which for the trivia games can be a big deal). Depending on your group, Extended Timers is worth considering too: the lag introduced by videoconferencing can frustrate players who submit answers at the last second only to discover that – after transmission delays – they missed the window! Extended Timers don’t solve that, but they do mean that such players are less-likely to end up waiting to the last second in the first place.

Jackbox game content settings; "Filter US-centric content" is switched on.
Finally: unless the vast majority or all of your guests are in the USA, you might like to flip the Filter US-Centric Content switch so that you don’t get a bunch of people scratching their heads over a cultural reference that they just don’t get.

By the way, you can use your cursor keys and enter to operate Jackbox games menus, which is usually easier than fiddling with a mouse.

4. Optimise Zoom’s settings

MacOS desktop showing a Jackbox game running and Zoom being configured to show a "portion of screen".
A few quick tweaks to your settings can make all the difference to how great the game looks.

Whatever videoconferencing platform you’re using, the settings for screen sharing are usually broadly similar. I suggest:

  • Make sure you’ve ticked “Share sound” or a similar setting that broadcasts the game’s audio: in some games, this is crucial; in others, it’s nice-to-have. Use your other computer to test how it sounds and tweak the volume accordingly.
  • Check “Optimize for video clip”; this hints to your videoconferencing software that all parts of the content could be moving at once so it can use the same kind of codec it would for sending video of your face. The alternative assumes that most of the screen will stay static (because it’s the desktop, the background of your slides, or whatever), which works better with a different kind of codec.
  • Use “Portion of Screen” sharing rather than selecting the application. This ensures that you can select just the parts of the application that have content in, and not “black bars”, window chrome and the like, which looks more-professional as well as sending less data over the connection.
  • If your platform allows it, consider making the mouse cursor invisible in the shared content: this means that you won’t end up with an annoying cursor sitting in the middle of the screen and getting in the way of text, and makes menu operation look slicker if you end up using the mouse instead of the keyboard for some reason.

Don’t forget to shut down any software that might “pop up” notifications: chat applications, your email client, etc.: the last thing you want is somebody to send you a naughty picture over WhatsApp and the desktop client to show it to everybody else in your party!


Note #19700

How did it take me years of working-from-home before I thought to install one of these in my desk? Brilliant.


Making an RSS feed of YOURLS shortlinks

As you might know if you were paying close attention in Summer 2019, I run a “URL shortener” for my personal use. You may be familiar with public URL shorteners like TinyURL and Bit.ly: my personal URL shortener is basically the same thing, except that only I am able to make short-links with it. Compared to public ones, this means I’ve got a larger corpus of especially-short (e.g. 2/3 letter) codes available for my personal use. It also means that I’m not dependent on the goodwill of a free siloed service and I can add exactly the features I want to it.

Diagram showing the relationships of the DanQ.me ecosystem. Highlighted is the injection of links into the "S.2" link shortener and the export of these shortened links by RSS into FreshRSS.
Little wonder then that my link shortener sat so close to me on my ecosystem diagram the other year.

For the last nine years my link shortener has been S.2, a tool I threw together in Ruby. It stores URLs in a sequentially-numbered database table and then uses the Base62-encoding of the primary key as the “code” part of the short URL. Aside from the fact that when I create a short link it shows me a QR code so I can easily “push” a page to my phone, it doesn’t really have any “special” features. It replaced S.1, from which it primarily differed by putting the code at the end of the URL rather than as part of the domain name, e.g. s.danq.me/a0 rather than a0.s.danq.me: I made the switch because S.1 made HTTPS a real pain as well as only supporting Base36 (owing to the case-insensitivity of domain names).
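
(If you’ve not met Base62 before: it’s just positional numbering with the digits 0-9, A-Z and a-z. A sketch of the encoding – mine, for illustration, not S.2’s actual Ruby source, and the alphabet ordering is an assumption – looks like this:)

const ALPHABET = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';

// Convert a sequential primary key into a short Base62 code.
function base62Encode(id){
  let code = '';
  do {
    code = ALPHABET[id % 62] + code;
    id = Math.floor(id / 62);
  } while(id > 0);
  return code;
}

// e.g. base62Encode(0) === '0'; base62Encode(62) === '10'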

But S.2’s gotten a little long in the tooth and as I’ve gotten busier/lazier, I’ve leant into using or adapting open source tools more-often than writing my own from scratch. So this week I switched my URL shortener from S.2 to YOURLS.

Screenshot of YOURLS interface showing Dan Q's list of shortened links. Six are shown of 1,939 total.
YOURLS isn’t the prettiest tool in the world, but then it doesn’t have to be: only I ever see the interface pictured above!

One of the things that attracted me to YOURLS was that it had a ready-to-go Docker image. I’m not the biggest fan of Docker in general, but I do love the convenience of being able to deploy applications super-quickly to my household NAS. This makes installing and maintaining my personal URL shortener much easier than it used to be (and it was pretty easy before!).

Another thing I liked about YOURLS is that it, like S.2, uses Base62 encoding. This meant that migrating my links from S.2 into YOURLS could be done with a simple cross-database INSERT... SELECT statement:

INSERT INTO yourls.yourls_url(keyword, url, title, `timestamp`, clicks)
  SELECT shortcode, url, title, created_at, 0 FROM danq_short.links

But do you know what’s a bigger deal for my lifestack than my URL shortener? My RSS reader! I’ve written about it a lot, but I use RSS for just about everything and my feed reader is my first, last, and sometimes only point of contact with the Web! I’m so hooked-in to my RSS ecosystem that I’ll use my own middleware to add feeds to sites that don’t have them, or for which I’m not happy with the feed they provide, e.g. stripping sports out of BBC News, subscribing to webcomics that don’t provide such an option (sometimes accidentally hacking into sites on the way), and generating “complete” archives of series’ of posts so I can use my reader to track my progress.

One of S.1/S.2’s features was that it exposed an RSS feed at a secret URL for my reader to ingest. This was great, because it meant I could “push” something to my RSS reader to read or repost to my blog later. YOURLS doesn’t have such a feature, and I couldn’t find anything in the (extensive) list of plugins that would do it for me. I needed to write my own.

Partial list of Dan's RSS feed subscriptions, including Jeremy Keith, Jim Nielson, Natalie Lawhead, Bruce Schneier, Scott O'Hara, "Yahtzee", BBC News, and several podcasts, as well as (highlighted) "Dan's Short Links", which has 5 unread items.
In some ways, subscribing “to yourself” is a strange thing to do. In other ways… shut up, I’ll do what I like.

I could have written a YOURLS plugin. Or I could have written a stack of code in Ruby, PHP, Javascript or some other language to bridge these systems. But as I switched over my shortlink subdomain s.danq.me to its new home at danq.link, another idea came to me. I have direct database access to YOURLS (and the table schema is super simple) and the command-line MariaDB client can output XML… could I simply write an XML Transformation to convert database output directly into a valid RSS feed? Let’s give it a go!

I wrote a script like this and put it in my crontab:

mysql --xml yourls -e                                                                                                                     \
      "SELECT keyword, url, title, DATE_FORMAT(timestamp, '%a, %d %b %Y %T') AS pubdate FROM yourls_url ORDER BY timestamp DESC LIMIT 30" \
    | xsltproc template.xslt -                                                                                                            \
    | xmllint --format -                                                                                                                  \
    > output.rss.xml

The first part of that command connects to the yourls database, sets the output format to XML, and executes an SQL statement to extract the most-recent 30 shortlinks. The DATE_FORMAT function is used to mould the datetime into something approximating the RFC-822 standard for datetimes as required by RSS. The output produced looks something like this:

<?xml version="1.0"?>
<resultset statement="SELECT keyword, url, title, DATE_FORMAT(timestamp, '%a, %d %b %Y %T') AS pubdate FROM yourls_url ORDER BY timestamp DESC LIMIT 30" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <row>
        <field name="keyword">VV</field>
        <field name="url">https://webdevbev.co.uk/blog/06-2021/perfect-is-the-enemy-of-good.html</field>
        <field name="title"> Perfect is the enemy of good || Web Dev Bev</field>
        <field name="pubdate">Sun, 26 Sep 2021 17:38:32</field>
  </row>
  <row>
        <field name="keyword">VU</field>
        <field name="url">https://webdevlaw.uk/2021/01/30/why-generation-x-will-save-the-web/</field>
        <field name="title">Why Generation X will save the web – Hi, I’m Heather Burns</field>
        <field name="pubdate">Sun, 26 Sep 2021 17:38:26</field>
  </row>

  <!-- ... etc. ... -->
  
</resultset>

We don’t see this, though. It’s piped directly into the second part of the command, which uses xsltproc to apply an XSLT to it. I was concerned that my XSLT experience would be super rusty as I haven’t actually written any since working for my former employer SmartData back in around 2005! Back then, my coworker Alex and I spent many hours doing XML backflips to implement a system that converted complex data outputs into PDF files via an XSL-FO intermediary.

I needn’t have worried, though. Firstly: it turns out I remember a lot more than I thought from that project a decade and a half ago! But secondly, this conversion from MySQL/MariaDB XML output to RSS turned out to be pretty painless. Here’s the template.xslt I ended up making:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="resultset">
    <rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
      <channel>
        <title>Dan's Short Links</title>
        <description>Links shortened by Dan using danq.link</description>
        <link> [ MY RSS FEED URL ] </link>
        <atom:link href=" [ MY RSS FEED URL ] " rel="self" type="application/rss+xml" />
        <lastBuildDate><xsl:value-of select="row/field[@name='pubdate']" /> UTC</lastBuildDate>
        <pubDate><xsl:value-of select="row/field[@name='pubdate']" /> UTC</pubDate>
        <ttl>1800</ttl>
        <xsl:for-each select="row">
          <item>
            <title><xsl:value-of select="field[@name='title']" /></title>
            <link><xsl:value-of select="field[@name='url']" /></link>
            <guid>https://danq.link/<xsl:value-of select="field[@name='keyword']" /></guid>
            <pubDate><xsl:value-of select="field[@name='pubdate']" /> UTC</pubDate>
          </item>
        </xsl:for-each>
      </channel>
    </rss>
  </xsl:template>
</xsl:stylesheet>

That uses the first (i.e. most-recent) shortlink’s timestamp as the feed’s pubDate, which makes sense: unless you’re going back and modifying links there’s no more-recent changes than the creation date of the most-recent shortlink. Then it loops through the returned rows and creates an <item> for each; simple!

The final step in my command runs the output through xmllint to prettify it. That’s not strictly necessary, but it was useful while debugging and as the whole command takes milliseconds to run once every quarter hour or so I’m not concerned about the overhead. Using these native binaries (plus a little configuration), chained together with pipes, had already resulted in way faster performance (with less code) than if I’d implemented something using a scripting language, and the result is a reasonably elegant “scratch your own itch”-type solution to the only outstanding barrier that was keeping me on S.2.

All that remained for me to do was set up a symlink so that the resulting output.rss.xml was accessible, over the web, to my RSS reader. I hope that next time I’m tempted to write a script to solve a problem like this I’ll remember that sometimes a chain of piped *nix utilities can provide me a slicker, cleaner, and faster solution.

Update: Right as I finished writing this blog post I discovered that somebody had already solved this problem using PHP code added to YOURLS; it’s just not packaged as a plugin so I didn’t see it earlier! Whether I use this alternate approach or stick to what I’ve got, the process of implementing this YOURLS database ➡ XML ➡ XSLT ➡ RSS chain was fun and informative.


Can I use HTTP Basic Auth in URLs?

Web standards sometimes disappear

Sometimes a web standard disappears quickly at the whim of some company, perhaps to a great deal of complaint (and at least one joke).

But sometimes, they disappear slowly, like this kind of web address:

http://username:password@example.com/somewhere

If you’ve not seen a URL like that before, that’s fine, because the answer to the question “Can I still use HTTP Basic Auth in URLs?” is, I’m afraid: no, you probably can’t.

But by way of a history lesson, let’s go back and look at what these URLs were, why they died out, and how web browsers handle them today. Thanks to Ruth who asked the original question that inspired this post.

Basic authentication

The early Web wasn’t built for authentication. A resource on the Web was theoretically accessible to all of humankind: if you didn’t want it in the public eye, you didn’t put it on the Web! A reliable method wouldn’t become available until the concept of state was provided by Netscape’s invention of HTTP cookies in 1994, and even that wouldn’t see widespread use for several years, not least because implementing a CGI (or similar) program to perform authentication was a complex and computationally-expensive option for all but the biggest websites.

Comic showing a conversation between a web browser and server. Browser: "Show me that page. (GET /)" Server: "No, take a ticket and fill this form. (Redirect, Set-Cookie)" Browser: "I've filled your form and here's your ticket (POST request with Cookie set)" Server: "Okay, Keep hold of your ticket. (Redirect, Set-Cookie)" Browser: "Show me that page, here's my ticket. (GET /, Cookie:)"
A simplified view of the form-and-cookie based authentication system used by virtually every website today, but which was too computationally-expensive for many sites in the 1990s.

1996’s HTTP/1.0 specification tried to simplify things, though, with the introduction of the WWW-Authenticate header. The idea was that when a browser tried to access something that required authentication, the server would send a 401 Unauthorized response along with a WWW-Authenticate header explaining how the browser could authenticate itself. Then, the browser would send a fresh request, this time with an Authorization: header attached providing the required credentials. Initially, only “basic authentication” was available, which basically involved sending a username and password in-the-clear unless SSL (HTTPS) was in use, but later, digest authentication and a host of others would appear.
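
(Composing that Authorization: header couldn’t be simpler – this snippet is my illustration, not anything from the spec itself: it’s just the username and password, colon-separated and Base64-encoded. Base64 is an encoding, not encryption, hence “in-the-clear”:)

const username = 'alice';
const password = 's3cret';

// btoa() Base64-encodes; the result is trivially reversible, so it's only as
// private as the connection that carries it.
const headerValue = 'Basic ' + btoa(`${username}:${password}`);

console.log(`Authorization: ${headerValue}`);
// => Authorization: Basic YWxpY2U6czNjcmV0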

Comic showing conversation between web browser and server. Browser: "Show me that page (GET /)" Server: "No. Send me credentials. (HTTP 401, WWW-Authenticate)" Browser: "Show me that page. I enclose credentials (Authorization)" Server: "Okay (HTTP 200)"
For all its faults, HTTP Basic Authentication (and its near cousins) are certainly elegant.

Webserver software quickly added support for this new feature and as a result web authors who lacked the technical know-how (or permission from the server administrator) to implement more-sophisticated authentication systems could quickly implement HTTP Basic Authentication, often simply by adding a .htaccess file to the relevant directory. .htaccess files would later go on to serve many other purposes, but their original and perhaps best-known purpose – and the one that gives them their name – was access control.
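
A typical access-control .htaccess of the era looked something like this (a hypothetical example – the realm name and file path are made up, though the directives are Apache’s real ones):

AuthType Basic
AuthName "Members Only"
AuthUserFile /home/example/.htpasswd
Require valid-user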

Credentials in the URL

A separate specification, not specific to the Web (but one of Tim Berners-Lee’s most important contributions to it), described the general structure of URLs as follows:

<scheme>://<username>:<password>@<host>:<port>/<url-path>#<fragment>

At the time that specification was written, the Web didn’t have a mechanism for passing usernames and passwords: this general case was intended only to apply to protocols that did have these credentials. An example is given in the specification, and clarified with “An optional user name. Some schemes (e.g., ftp) allow the specification of a user name.”

But once web browsers had WWW-Authenticate, virtually all of them added support for including the username and password in the web address too. This allowed for e.g. hyperlinks with credentials embedded in them, which made for very convenient bookmarks, or partial credentials (e.g. just the username) to be included in a link, with the user being prompted for the password on arrival at the destination. So far, so good.

Comic showing conversation between web browser and server. Browser asks for a page, providing an Authorization: header outright; server responds with the page immediately.
Encoding authentication into the URL provided an incredible shortcut at a time when Web round-trip times were much longer owing to higher latencies and no keep-alives.

This is why we can’t have nice things

The technique fell out of favour as soon as it started being used for nefarious purposes. It didn’t take long for scammers to realise that they could create links like this:

https://YourBank.com@HackersSite.com/

Everything we were teaching users about checking for “https://” followed by the domain name of their bank… was undermined by this user interface choice. The poor victim would actually be connecting to e.g. HackersSite.com, but a quick glance at their address bar would leave them convinced that they were talking to YourBank.com!

Theoretically: widespread adoption of EV certificates coupled with sensible user interface choices (that were never made) could have solved this problem, but a far simpler solution was just to not show usernames in the address bar. Web developers were by now far more excited about forms and cookies for authentication anyway, so browsers started curtailing the “credentials in addresses” feature.

Internet Explorer window showing https://YourBank.com@786590867/ in the address bar.
Users trained to look for “https://” followed by the site they wanted would often fall for scams like this one: the real domain name is after the @-sign. (This attacker is also using dword notation to obfuscate their IP address; this dated technique wasn’t often employed alongside this kind of scam, but it’s another historical oddity I enjoy so I’m shoehorning it in.)

(There are other reasons this particular implementation of HTTP Basic Authentication was less-than-ideal, but this reason is the big one that explains why things had to change.)

One by one, browsers made the change. But here’s the interesting bit: the browsers didn’t always make the change in the same way.

How different browsers handle basic authentication in URLs

Let’s examine some popular browsers. To run these tests I threw together a tiny web application that outputs the Authorization: header passed to it, if present, and can optionally send a 401 Unauthorized response along with a WWW-Authenticate: Basic realm="Test Site" header in order to trigger basic authentication. Why both? So that I can test not only how browsers handle URLs containing credentials when an authentication request is received, but how they handle them when one is not. This is relevant because some addresses – often API endpoints – have optional HTTP authentication, and it’s sometimes important for a user agent (albeit typically a library or command-line one) to pass credentials without first being prompted.
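
(I haven’t included my test app’s source here, but an equivalent harness only takes a few lines; here’s a minimal sketch assuming Node.js with Express:)

const express = require('express');
const app = express();

// Authentication optional: report whatever Authorization: header arrived.
app.get('/optional', (req, res) => {
  res.send(`Authorization: ${req.headers.authorization || '(none)'}`);
});

// Authentication mandatory: challenge with a 401 until credentials appear.
app.get('/mandatory', (req, res) => {
  if(!req.headers.authorization){
    return res.status(401)
              .set('WWW-Authenticate', 'Basic realm="Test Site"')
              .send('Unauthorized');
  }
  res.send(`Authorization: ${req.headers.authorization}`);
});

app.listen(8080);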

In each case, I tried each of the following tests in a fresh browser instance:

  1. Go to http://<username>:<password>@<domain>/optional (authentication is optional).
  2. Go to http://<username>:<password>@<domain>/mandatory (authentication is mandatory).
  3. Experiment 1, then follow relative hyperlinks (which should correctly retain the credentials) to /mandatory.
  4. Experiment 2, then follow relative hyperlinks to /optional.

I’m only testing over the http scheme, because I’ve no reason to believe that any of the browsers under test treat the https scheme differently.

Chromium desktop family

Chrome at an "Auth Optional" page, showing no header sent.
Chrome 93 and Edge 93 both immediately suppressed the username and password from the address bar, along with the “http://” as we’ve come to expect of them. Like the “http://”, though, the plaintext username and password are still there. You can retrieve them by copy-pasting the entire address.

Opera 78 similarly suppressed the username, password, and scheme, but didn’t retain the username and password in a way that could be copy-pasted out.

Authentication was passed only when landing on a “mandatory” page; never when landing on an “optional” page. Refreshing the page or re-entering the address with its credentials did not change this.

Navigating from the “optional” page to the “mandatory” page using only relative links retained the username and password and submitted them to the server when authentication became mandatory – even in Opera, which didn’t initially appear to retain the credentials at all.

Navigating from the “mandatory” to the “optional” page using only relative links, or even entering the “optional” page’s address with credentials after visiting the “mandatory” page, did not result in authentication being passed to the “optional” page. Interestingly, though, once authentication had occurred on a mandatory page, pressing enter at the end of the address bar while on the optional page, with credentials in the address (whether visible or hidden from the user), did result in the credentials being passed to the optional page! They continued to be passed on each subsequent load of the “optional” page until the browsing session was ended.

Firefox desktop

Firefox window with popup reading "You are about to log in to the site 192.168.0.11 with the username alpha, but the web site does not require authentication. This may be an attempt to trick you."
Firefox 91 does a clever thing, very much in-line with its image as a browser that puts decision-making authority into the hands of its user. When going to the “optional” page first, it presents a dialog warning the user that they’re going to a site that does not specifically request a username, but that they’re providing one anyway. If the user selects “no”, navigation ceases (the GET request for the page takes place either way; it happens before the dialog appears). Strangely, regardless of whether the user selects “yes” or “no”, the credentials are not passed to the “optional” page. The credentials (although not the “http://”) appear in the address bar while the user makes their decision.

Similar to Opera, the credentials do not appear in the address bar thereafter, but they’re clearly still being stored: if the refresh button is pressed the dialog appears again. It does not appear if the user selects the address bar and presses enter.

Firefox dialog reading "You are about to log in to the site 192.168.0.11 with the username alpha".
Similarly, going to the “mandatory” page in Firefox results in an informative dialog warning the user that credentials are being passed. I like this approach: not only does it help protect the user from the use of authentication as a tracking technique (an old trick that I’ve not seen used in well over a decade, mind), it also helps the user be sure that they’re logging in using the account they mean to when following a link for that purpose. Again, clicking cancel stops navigation, although the initial request (with no credentials) and the 401 response have already occurred.

Visiting any page within the scope of the realm of the authentication after visiting the “mandatory” page results in credentials being sent, whether or not they’re included in the address. This is probably the implementation truest to the expectations of the standard that I’ve found in a modern graphical browser.

Safari desktop

Safari showing a dialog "Log in" / "Your password will be sent unencrypted."
Safari 14 never displays or uses credentials provided via the web address, whether or not authentication is mandatory. Mandatory authentication is always met by a pop-up dialog, even if credentials were provided in the address bar. Boo!

Once passed, credentials are later provided automatically to other addresses within the same realm (i.e. optional pages).

Older browsers

Let’s try some older browsers.

Internet Explorer 8 showing the error message "Windows cannot find http://alpha:beta@10.0.2.2/optional. Check the spelling and try again."
From version 7 onwards – right up to the final version 11 – Internet Explorer fails to even recognise addresses with authentication credentials in them as legitimate web addresses, regardless of whether or not authentication is requested by the server. It’s easy to assume that this is yet another missing feature in the browser we all love to hate, but it’s interesting to note that credentials-in-addresses are permitted for ftp:// URLs…

Internet Explorer 5 showing credentials in the address bar being passed to the server.
…and if you go back a little way, Internet Explorer 6 and below supported credentials in the address bar pretty much as you’d expect based on the standard. The error message seen in IE7 and above is a deliberate design decision, albeit a somewhat knee-jerk reaction to the security issues posed by the feature (compare to the more-careful approach of other browsers).

These older versions of IE even (correctly) retain the credentials through relative hyperlinks, allowing them to be passed when they become mandatory. They’re not passed on optional pages unless a mandatory page within the same realm has already been encountered.

Netscape Communicator 4.7 showing credentials in a URL, passed to a server.
Pre-Mozilla Netscape behaved the same way. Truly this was the de facto standard for a long period on the Web, and the varied approaches we see today are the anomaly. That’s a strange observation to make, considering how much the Web of the 1990s was dominated by incompatible implementations of different Web features (I’ve written about the <blink> and <marquee> tags before, which were perhaps the most-visible division between the Microsoft and Netscape camps, but there were many, many more).

Screenshot showing Netscape 7.2, with a popup saying "You are about to log in to the site 192.168.0.11 with the username alpha, but the website does not require authentication. This may be an attempt to trick you." The username and password are visible in the address bar.
Interestingly: by Netscape 7.2 the browser’s behaviour had evolved to be the same as modern Firefox’s, except that it still displayed the credentials in the address bar for all to see.

Screenshot of Opera 5 showing credentials in a web address with the password masked, being passed to the server on an optional page.
Now here’s a real gem: pre-Chromium Opera. It would send credentials to “mandatory” pages and remember them for the duration of the browsing session, which is great. But it would also send credentials when passed in a web address to “optional” pages. However, it wouldn’t remember them on optional pages unless they remained in the address bar: this feels to me like an optimum balance of features for power users. Plus, it’s one of very few browsers that permitted you to change credentials mid-session: just by changing them in the address bar! Most other browsers, even to this day, ignore changes to HTTP Authentication credentials, which could sometimes be a source of frustration back in the day.

Finally, classic Opera was the only browser I’ve seen to mask the password in the address bar, turning it into a series of asterisks. This ensures the user knows that a password was used, but doesn’t leak any sensitive information to shoulder-surfers (the mask was always the same length, too, so it didn’t even leak the length of the password). Altogether a spectacular design and a great example of why classic Opera was way ahead of its time.

The Command-Line

Most people using web addresses with credentials embedded within them nowadays are probably working with code, APIs, or the command line, so it’s unsurprising to see that this is where the most “traditional” standards-compliance is found.

I was unsurprised to discover that giving curl a username and password in the URL meant that the username and password were sent to the server (using Basic authentication, of course, if no authentication was requested):

$ curl http://alpha:beta@localhost/optional
Header: Basic YWxwaGE6YmV0YQ==
$ curl http://alpha:beta@localhost/mandatory
Header: Basic YWxwaGE6YmV0YQ==

However, wget did catch me out. Hitting the same addresses with wget didn’t result in the credentials being sent except where it was mandatory (i.e. where an HTTP 401 response and a WWW-Authenticate: header were received on the initial attempt). To force wget to send credentials when they haven’t been asked-for requires the use of the --http-user and --http-password switches:

$ wget http://alpha:beta@localhost/optional -qO-
Header:
$ wget http://alpha:beta@localhost/mandatory -qO-
Header: Basic YWxwaGE6YmV0YQ==

lynx does a cute and clever thing. Like most modern browsers, it does not submit credentials unless specifically requested, but if they’re in the address when authentication becomes mandatory (e.g. after following relative hyperlinks, or hyperlinks containing credentials) it prompts for the username and password, pre-filling the form with the details from the URL. Nice.

Lynx browser following a link from an optional-authentication to a mandatory-authentication page. The browser prompts for a username but it's pre-filled with the one provided by the URL.

What’s the status of HTTP (Basic) Authentication?

HTTP Basic Authentication and its close cousin Digest Authentication (which overcomes some of the security limitations of running Basic Authentication over an unencrypted connection) are very much alive, but their use in hyperlinks can’t be relied upon: some browsers (e.g. IE, Safari) completely munge such links while others don’t behave as you might expect. Other mechanisms like Bearer see widespread use in APIs, but nowhere else.

The WWW-Authenticate: and Authorization: headers are, in some ways, an example of the best possible way to implement authentication on the Web: as an underlying standard independent of support for forms (and, increasingly, Javascript), cookies, and complex multi-part conversations. It’s easy to imagine an alternative timeline where these standards continued to be collaboratively developed and maintained and their shortfalls – e.g. not being able to easily log out when using most graphical browsers! – were overcome. A timeline in which one might write a login form like this, knowing that its “authenticate” attributes would instruct the browser to send credentials using an Authorization: header:

<form method="get" action="/" authenticate="Basic">
  <label for="username">Username:</label> <input type="text" id="username" authenticate="username">
  <label for="password">Password:</label> <input type="password" id="password" authenticate="password">
  <input type="submit" value="Log In">
</form>

In such a world, more-complex authentication strategies (e.g. multi-factor authentication) could involve encoding forms as JSON. And single-sign-on systems would simply involve the browser collecting a token from the authentication provider and passing it on to the third-party service, directly through browser headers, with no need for backwards-and-forwards redirects with stacks of information in GET parameters as is the case today. Client-side certificates – long a powerful but neglected authentication mechanism in their own right – could act as first class citizens directly alongside such a system, providing transparent second-factor authentication wherever it was required. You wouldn’t have to accept a tracking cookie from a site in order to log in (or stay logged in), and if your browser-integrated password safe supported it you could log on and off from any site simply by toggling that account’s “switch”, without even visiting the site: all you’d be changing is whether or not your credentials would be sent when the time came.
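
Back in our timeline, of course, the nearest equivalent is to construct the header yourself in script. A minimal sketch using the standard fetch() and btoa() APIs (endpoint and credentials purely illustrative):

// "Basic" credentials are just "username:password", base64-encoded, so a
// page can send them itself in an Authorization: header:
async function logIn(username, password) {
  const response = await fetch("/", {
    headers: { "Authorization": "Basic " + btoa(username + ":" + password) }
  });
  return response.ok;
}

// logIn("alpha", "beta") sends "Authorization: Basic YWxwaGE6YmV0YQ=="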

The Web has long been on a constant push for the next new shiny thing, and that’s sometimes meant that established standards have been neglected prematurely or have failed to evolve for longer than we’d have liked. Consider how long it took us to get the <video> and <audio> elements because the “new shiny” Flash came to dominate, how the Web Payments API is only just beginning to mature despite over 25 years of ecommerce on the Web, or how we still can’t use Link: headers for all the things we can use <link> elements for despite them being semantically-equivalent!

The new model for Web features seems to be that they first appear as a popular JavaScript implementation and only later evolve into a native browser feature: HTML form validation, for example, could for the longest time only be done client-side, using scripting languages. I’d love to see somebody re-think HTTP Authentication in this way, but sadly we’ll never get a 100% solution in JavaScript alone (distributed SSO is almost certainly off the table, for example, owing to cross-domain limitations).

Or maybe it’s just a problem that’s waiting for somebody cleverer than I to come and solve it. Want to give it a go?


The Cursed Computer Iceberg Meme

This article is a repost promoting content originally published elsewhere. See more things Dan's reposted.

More awesome from Blackle Mori, whose praises I sang recently over The Basilisk Collection. This time we’re treated to a curated list of 182 articles demonstrating the “peculiarities and weirdness” of computers. Starting from relatively well-known memes like little Bobby Tables, the year 2038 problem, and how all web browsers pretend to be each other, we descend through the fast inverse square root (made famous by Quake III), falsehoods programmers believe about time (personally I’m more of a fan of …names, but then you might expect that), the EICAR test file, the “thank you for playing Wing Commander” EMM386 in-memory hack, The Basilisk Collection itself, and the GIF MD5 hashquine (which I’ve shared previously) before eventually reaching the esoteric depths of posuto and the nightmare that is Japanese postcodes.

Plus many, many things that were new to me and that I’ve loved learning about these last few days.

It’s definitely not a competition; it’s a learning opportunity wrapped up in the weirdest bits of the field. Have an explore and feed your inner computer science geek.

The Coolest Thing About GPS

I’m currently doing a course, through work, delivered by BetterOn Video. The aim of the course is to improve my video presentation skills, in particular my engagement with the camera and the audience.

I made this video based on the week 2 prompt “make a video 60-90 seconds long about something you’re an expert on”. The idea came from a talk I used to give at the University of Oxford.

Watching Films Together… Apart

This weekend I announced and then hosted Homa Night II, an effort to use technology to help bridge the chasms that’ve formed between my diaspora of friends, mostly as a result of COVID. To a lesser extent we’ve been made to feel distant from one another for a while as a result of our very diverse locations and lifestyles, but the resulting isolation was certainly compounded by lockdowns and quarantines.

Mark, Sian, Alec, Paul, Kit, Adam, Dan and Claire at Troma Night V.
Long gone are the days when I could put up a blog post to say “Troma Night tonight?” and expect half a dozen friends to turn up at my house.

Back in the day we used to have a regular weekly film night called Troma Night, named after the studio who dominated our early events and whose… genre… influenced many of our choices thereafter. We had over 300 such film nights, by my count, before I eventually left our shared hometown of Aberystwyth ten years ago. I wasn’t the last one of the Troma Night regulars to leave town, but more left before me than after.

Sour Grapes: participants share "hearts" with Ruth
Observant readers will spot a previous effort I made this year at hosting a party online.

Earlier this year I hosted Sour Grapes, a murder mystery party (an irregular highlight of our Aberystwyth social calendar, with thanks to Ruth) run entirely online using a mixture of video chat and “second screen” technologies. In some ways that could be seen as the predecessor to Homa Night, although I’d come up with most of the underlying technology to make Homa Night possible on a whim much earlier in the year!

WhatsApp chat: Dan proposes "Troma Night Remote"; Matt suggests calling it "Troma at Homa"; Dan settles on "Homa Night".
The idea spun out of a few conversations on WhatsApp but the final name – Homa Night – wasn’t agreed until early in November.

How best to make such a thing happen? When I first started thinking about it, during the first of the UK’s lockdowns, I considered a few options:

  • Streaming video over a telemeeting service (Zoom, Google Meet, etc.)
    Very simple to set up, but the quality – as anybody who’s tried this before will attest – is appalling. Being optimised for speech rather than music and sound effects gives the audio a flat, scratchy sound; video compression artefacts that are tolerable when you’re chatting to your boss are really annoying when they stop you reading a crucial subtitle; audio and video often get desynchronised in a way that’s frankly infuriating; and everybody’s download speed is limited by the upload speed of the host, among other issues. The major benefit of these platforms – full-duplex audio – is destroyed by feedback, so everybody needs to stay muted while watching anyway. No thanks!
  • Teleparty or a similar tool
    Teleparty (formerly Netflix Party, but it now supports more services) is a pretty clever way to get almost exactly what I want: synchronised video streaming plus chat alongside. But it only works on Chrome (and some related browsers) and doesn’t work on tablets, web-enabled TVs, etc., which would exclude some of my friends. Everybody requires an account on the service you’re streaming from, potentially further limiting usability, and that also means you’re strictly limited to the media available on those platforms (and further limited again if your party spans multiple geographic distribution regions for that service). There are definitely things I can learn from Teleparty, but it’s not the right tool for Homa Night.
  • “Press play… now!”
    The relatively low-tech solution might have been to distribute video files in advance, have people download them, and get everybody to press “play” at the same time! That’s at least slightly less-convenient because people can’t just “turn up”: they have to plan their attendance and set up in advance. But it would certainly have worked, and I seriously considered it. There are other downsides, though: if anybody has a technical issue and needs to e.g. restart their player then they’re basically doomed in any attempt to get back in-sync again. We can do better…
  • A custom-made synchronised streaming service…?
Homa Night architecture: S3 delivers static content to browsers, browsers exchange real-time information via Firebase.
A custom solution that leveraged existing infrastructure for the “hard bits” proved to be the right answer.

So obviously I ended up implementing my own streaming service. It wasn’t even that hard. In case you want to try your own, here’s how I did it:

Media preparation

First, I used Adobe Premiere to create a video file containing both of the night’s films, bookended and separated by “filler” content to provide an introduction/lobby, an intermission, and a closing “you should have stopped watching by now” message. I made sure that the “intro” was a nice round duration (90s) and suitable for looping because I planned to hold people there until we were all ready to start the film. Thanks to Boris & Oliver for the background music!

Dan uses a green screen to add to the intermission.
Honestly, the intermission was just an excuse to keep my chroma key gear out following its most-recent use.

Next, I ran the output through Handbrake to produce “web optimized” versions in 1080p and 720p output sizes. “Web optimized” in this case means that metadata gets added to the start of the file to allow it to start playing without downloading the entire file (streaming) and to allow the calculation of what-part-of-the-file corresponds to what-part-of-the-timeline: the latter, when coupled with a suitable webserver, allows browsers to “skip” to any point in the video without having to watch the intervening part. Naturally I’m encoding with H.264 for the widest possible compatibility.
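
(Handbrake isn’t doing anything magic there, incidentally: if you’ve already got an encoded file, ffmpeg can perform the same trick, relocating the metadata to the front of the file without re-encoding. Filenames illustrative:)

$ ffmpeg -i homa-night.mp4 -c copy -movflags +faststart homa-night-web.mp4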

Handbrake preparing to transcode Premiere's output.
Even using my multi-GPU computer for the transcoding I had time to get up and walk around a bit.

Real-Time Synchronisation

To keep everybody’s viewing experience in-sync, I set up a Firebase account for the application: Firebase provides an easy-to-use Websockets platform with built-in data synchronisation. Ignoring the authentication and chat features, there wasn’t much shared here: just the currentTime of the video in seconds, whether or not introMode was engaged (i.e. everybody should loop the first 90 seconds, for now), and whether or not the video was paused:

Firebase database showing shared currentTime, introMode, and paused values.
Firebase makes schemaless real-time databases pretty easy.

To save development effort, I didn’t bother implementing an administrative front-end: I just went into the Firebase database by hand and marked “my” computer as an administrator after I’d connected to it, then ran a little JavaScript in my browser’s debugger to tell it to start pushing my video’s currentTime to the server every few seconds. Anything else I needed to change, I edited directly through the Firebase interface.
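
That bit of JavaScript needn’t be anything fancier than something like this (database path illustrative):

// The "administrator" browser periodically publishes its playback position;
// every other client treats this as the authoritative currentTime.
setInterval(() => {
  firebase.database().ref("state/currentTime").set(video.currentTime);
}, 3000);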

Other web clients had JavaScript instructing them to monitor these variables in the Firebase database and, if they were desynchronised by more than 5 seconds, “jump” to the correct point in the video file. The hard part of the code… wasn’t really that hard:

// Rewind if we're past the end of the intro loop
function introModeLoopCheck() {
  if (!introMode) return;
  if (video.currentTime > introDuration) video.currentTime = 0;
}

function fixPlayStatus() {
  // Handle "intro loop" mode
  if (remotelyControlled && introMode) {
    if (video.paused) video.play(); // always play
    introModeLoopCheck();
    return; // don't look at the rest
  }

  // Fix current time
  const desync = Math.abs(lastCurrentTime - video.currentTime);
  if (
    (video.paused && desync > DESYNC_TOLERANCE_WHEN_PAUSED) ||
    (!video.paused && desync > DESYNC_TOLERANCE_WHEN_PLAYING)
  ) {
    video.currentTime = lastCurrentTime;
  }
  // Fix play status
  if (remotelyControlled) {
    if (lastPaused && !video.paused) {
      video.pause();
    } else if (!lastPaused && video.paused) {
      video.play();
    }
  }
  // Show/hide paused notification
  updatePausedNotification();
}
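
The lastCurrentTime, lastPaused, and introMode variables above are fed by Firebase listeners; a sketch of that wiring (paths illustrative; introDuration, remotelyControlled, and the DESYNC_TOLERANCE_* constants are set elsewhere):

// Mirror the shared state locally; Firebase fires these callbacks whenever
// the administrator changes a value.
let lastCurrentTime = 0, lastPaused = true, introMode = true;
const state = firebase.database().ref("state");
state.child("currentTime").on("value", snap => lastCurrentTime = snap.val());
state.child("paused").on("value", snap => lastPaused = snap.val());
state.child("introMode").on("value", snap => introMode = snap.val());

// Re-check the local video element against the shared state every second:
setInterval(fixPlayStatus, 1000);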

Web front-end

Finally, there needed to be a web page everybody could visit to get access to all of this. As I was hosting the video on S3+CloudFront anyway, I put the HTML/CSS/JS there too.

Configuring a Homa Night video player.
I decided to carry the background theme of the video through to the web interface too.

I tested in Firefox, Edge, Chrome, and Safari on desktop, and (slightly less thoroughly) in Firefox, Chrome and Safari on mobile. There were a few quirks to work around, mostly to do with browsers not letting videos make sound until the page has been interacted with after the video element has been rendered, which I worked around by putting a popup “over” the video to “enable sync”, but mostly it “just worked”.
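
That workaround amounts to little more than this (element names illustrative):

// Browsers only permit un-muted playback after a user gesture, so an
// "enable sync" overlay captures a click before playback begins.
enableSyncButton.addEventListener("click", () => {
  video.muted = false;
  video.play();         // permitted now that it's gesture-initiated
  syncOverlay.remove(); // get the popup out of the way of the film
});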

Delivery

On the night I shared the web address and we kicked off! There were a few hiccups: some people’s browsers got disconnected early on and tried to start playing the film before it was time, and one of them, even once fixed, ran about a minute behind the others, leading to minor spoilers as the rest of us riffed about scenes they hadn’t reached yet! But on the whole, it worked. I’ve had lots of useful feedback to improve on it for the next version, and I might even try to tidy up my code a bit and open-source the results in case this kind of thing might be useful to anybody else.


Endpoint Encabulator

(This video is also available on YouTube.)

I’ve been part of the team working on a new application framework called the Endpoint Encabulator, and wanted to share with you what I think makes our project so exciting: I promise it’ll make for two minutes of your time you won’t soon forget!

Naturally, this project wouldn’t have been possible without the pioneering work that preceded it by John Hellins Quick, Bud Haggart, and others. Nothing’s invented in a vacuum. However, my fellow developers and I think that our work is the first viable encabulator implementation to provide inverse reactive data binding suitable for deployment in front of a blockchain-driven backend cache. I’m not saying that all digital content will one day be delivered through Endpoint Encabulator, but… well; maybe it will.

If the technical aspects go over your head, pass it on to a geeky friend who might be able to make use of my work. Sharing is caring!