Blog

Dan Q performed maintenance for GC8W7QW Forgotten Bridge

This checkin to GC8W7QW Forgotten Bridge reflects a geocaching.com log entry. See more of Dan's cache logs.

Took a late hike out here for a maintenance checkup before winter: make sure the waterproof seal is good etc. Really creepy to walk out here alone in the night fog, silent except for the occasional startling loud bellow of a rutting muntjack!

All is good here, and I was delighted to find in the logbook perhaps my favourite ever log entry in a geocache I own… it’s from the Oxfordshire County Council Countryside Access Team!

Geocache logbook entry reading: "2/11/21 - Found during bridge inspection. Oxfordshire County Council Countryside Access Team."

×

Dan Q found GC54MFQ (Mac)Donald Where’s Your Troosers?

This checkin to GC54MFQ (Mac)Donald Where's Your Troosers? reflects a geocaching.com log entry. See more of Dan's cache logs.

Staying at the nearby hotel I came out last night to try and find this but quickly gave up rather than poke around in the gloomy night. I’d brought a torch for exactly this kind of purpose but accidentally left it in the car… and the car key in the hotel room!

This morning, though, was much easier. The hint object – which I hadn’t even been able to see last night! – was a great clue and I was about to find a root… I mean route… to it through the undergrowth. TFTC!

Note #19642

Note for Future Dan: if you want Firefox’s picture-in-picture (popout video) mode to be available for videos of less than 45 seconds, the setting you need is media.videocontrols.picture-in-picture.video-toggle.min-video-secs. This is useful if you’ve got a playlist of multiple short clips (which reuse the same <video> element) that you want to treat as a single long video for the purpose of picture-in-picture.

Hot Shots, Part Dream

I’ve a long history of blogging about dreams I’ve had, and though I’ve not done so recently I don’t want you to think it’s because my dreams have gotten any less trippy-as-fuck. Take last night for example…

I plough every penny and spare minute I can into a side-project that in my head at least qualifies as “art”. The result will be fake opening credits animation for the (non-existent) pilot episode of an imagined 80s-style children’s television show. But it gets weirder.

Do you remember Hot Shots!? There’s this scene near the end where Topper Harley, played by Charlie Sheen, returns to the Native American tribe he’s been living with since before the film (in sort of a clash between the “proud warrior race” trope and a parody of Dances With Wolves, which came out the previous year). Returning to his teepee, Topper meets tribal elder Owatonna (Rino Thunder), who asks him about the battle Topper had gone to fight in and, in a callback to an earlier joke, receives the four AA-cell batteries he’d asked Topper to pick up for him “while he was out”.

Still from Hot Shots!. Owatonna, an older Native American man, sits surrounded by animal skins. An English subtitle reads "So, who won?" A Japanese subtitle reads "誰が戦争に勝ったのですか?", which translates as "Who won the war?"
There are very few occasions where a parody film is objectively better than it’s source material, but I maintain that Hot Shots! beats Top Gun hands-down.

I take the dialogue from this scene (which in reality is nonsense, only the subtitles give it any meaning), mangle it slightly, and translate it into Japanese using an automated translation service. I find some Japanese-speaking colleagues to help verify that each line broadly makes sense, at least in isolation.

I commission the soundtrack for my credits sequence. A bit of synth-pop about a minute long. I recruit some voice actors to read each of my Japanese lines, as if they’re characters in an animated kids TV show. I mix it together, putting bits of Japanese dialogue in the right places so that if anybody were to sync-up my soundtrack with the correct scene in Hot Shots!, the Japanese dialogue would closely mirror the conversation that the characters in that film were having. The scene, though, is slow-paced enough that, re-recorded, the voices in my new soundtrack don’t sound like they’re part of the same conversation as one another. This is deliberate.

Meanwhile, I’ve had some artists put together some concept character art for me, based on some descriptions. There’s the usual eclectic mix of characters that you’d expect from 80s cartoons: one character’s a friendly bear-like thing, another’s a cowardly robot, there’s a talking flying unicorn… you know the kind of shit. I give them descriptions, they give me art.

Next, I send the concept art and the soundtrack to an animation team and ask them to produce a credits sequence for it, and I indicate which of the characters depicted should be saying which lines.

Framegrabs from four 80s childrens television programmes showing: marching robots, a cat scratching its ear, a unicorn with a knight's shield behind it, and a pastel-coloured creature using its huge ears to fly.
Identifying the shows I lifted images from to make this sample is left as an exercise for the reader.

Finally, I dump the credits sequence around the Internet, wait a bit, and then start asking on forums “hey, what show is this?” to see what kind of response it gets.

The thing goes viral. It scratches the itch of people who love to try to find the provenance of old TV clips, but of course there’s no payoff because the show doesn’t exist. It doesn’t take too long before somebody translates the dialogue and notices some of the unusual phasing and suggests a connection to Hot Shots! That seems to help date the show as post-1991, but it’s still a mystery. By the time somebody get around to posting a video where the soundtrack overlays the scene from Hot Shots!, conspiracy theories are already all over: the dominant hypothesis is that the clips are from a series of different shows (still to be identified) but only the soundtrack is new… but that still doesn’t answer what the different shows are!

As the phenomenon begins to expand into mainstream media I become aware that even the most meme-averse folks I know are going to hear about it, at some point. And as I ‘m likely to be “found out” as the creator of this weird thing, sooner or later, I decide to come clean about it to people I know sooner, rather than later. I’m hanging out with Ruth and her brothers Robin and Owen and I bring it up:

“Do you remember Hot Shots!? There’s this scene near the end where Topper Harley, played by Charlie Sheen…”, I begin, hoping that the explanation of my process might somehow justify the weird shit I’ve brought to the world. Or at least, that one of this group has already come across this latest Internet trend and will interject and give me an “in”.

Ruth interrupts: “I don’t think I’ve seen Hot Shots!”

“Really?” Realising that this’ll take some background explanation, I begin by referring to Top Gun and the tropes Hot Shots! plays into and work from there.

Some time later, I’m involved with a team who are making a documentary about the whole phenomenon and my part in it. They’re proposing to release a special edition disc with a chapter that uses DVD video’s “multi angle” and “audio format switch” features to allow you to watch your choice of either the scene from Hot Shots! or from my trailer with your choice of either the original audio, my soundtrack, or a commentary by me, but they’re having difficulty negotiating the relevant rights.

After I woke, I tried to tell Ruth about this most-bizarre dream, but soon got stuck in an “am I still dreaming” moment after the following exchange:

“Do you remember Hot Shots!?” I asked.

“I haven’t seen Hot Shots!” she replied.

Maybe I’m still dreaming now.

× ×

Automattic Growth

It just passed two years since I started working at Automattic, and I just made a startling discovery: I’ve now been with the company for longer than 50% of the staff.

When you hear that from a 2-year employee at a tech company, it’s easy to assume that they have a high staff turnover, but Automattic’s churn rate is relatively low, especially for our sector: 86% of developers stay longer than 5 years. So what’s happening? Let’s visualise it:

Graph showing all 1.802 current Automatticians as coloured squares, ordered by their start date. Dan Q's square is squarely (LOL) in the middle.
Everything in this graph, in which each current Automattician is a square, explains how I feel right now: still sometimes like a new fish, but in an increasingly big sea.

All that “red” at the bottom of the graph? That’s recent growth. Automattic’s expanding really rapidly right now, taking on new talent at a never-before-seen speed.

Since before I joined it’s been the case that our goals have demanded an influx of new engineers at a faster rate than we’ve been able to recruit, but it looks like things are improving. Recent refinements to our recruitment process (of which I’ve written about my experience) have helped, but I wonder how much we’ve also been aided by pandemic-related changes to working patterns? Many people, and especially in tech fields, have now discovered that working-from-home works for them, and a company like Automattic that’s been built for the last decade and a half on a “distributed” model is an ideal place to see that approach work at it’s best.

We’re rolling out new induction programmes to support this growth. Because I care about our corporate culture, I’ve volunteered myself as a Culture Buddy, so I’m going to spend some of this winter helping Newmatticians integrate into our (sometimes quirky, often chaotic) ways of working. I’m quite excited to be at a point where I’m in the “older 50%” of the organisation and so have a responsibility for supporting the “younger 50%”, even though I’m surprised that it came around so quickly.

Screenshot from officetoday.wordpress.com, where Automatticians share photos of their current work environment.
Automattic… culture? Can’t we just show them Office Today and be done with it?

I wonder how that graph will look in another two years.

×

Minification vs the GPL

A not-entirely-theoretical question about open source software licensing came up at work the other day. I thought it was interesting enough to warrant a quick dive into the philosophy of minification, and how it relates to copyleft open source licenses. Specifically: does distributing (only) minified source code violate the GPL?

If you’ve come here looking for a legally-justifiable answer to that question, you’re out of luck. But what I can give you is a (fictional) story:

TheseusJS is slow

TheseusJS is a (fictional) Javascript library designed to be run in a browser. It’s released under the GPLv3 license. This license allows you to download and use TheseusJS for any purpose you like, including making money off it, modifying it, or redistributing it to others… but it requires that if you redistribute it you have to do so under the same license and include the source code. As such, it forces you to share with others the same freedoms you enjoy for yourself, which is highly representative of some schools of open-source thinking.

Screenshot showing TheseusJS's GitHub page. The project hasn't been updated in a year, and that was just to add a license: no code has been changed in 12 years.
It’s a cool project, but it really needs some maintenance this side of 2010.

It’s a great library and it’s used on many websites, but its performance isn’t great. It’s become infamous for the impact it has on the speed of the websites it’s used on, and it’s often the butt of jokes by developers: “Man, this website’s slow. Must be running Theseus!”

The original developer has moved onto his new project, Moralia, and seems uninterested in handling the growing number of requests for improvements. So I’ve decided to fork it and make my own version, FastTheseusJS and work on improving its speed.

FastTheseusJS is fast

I do some analysis and discover the single biggest problem with TheseusJS is that the Javascript file itself is enormous. The original developer kept all of the copious documentation in comments in the file itself, and for some reason it doesn’t even compress well. When you use TheseusJS on a website it takes a painfully long time for a browser to download it, if it’s not precached.

Screenshot showing a website for the TheseusJS API. It's pretty labyrinthine (groan).
Nobody even uses the documentation in the comments: there’s a website with a fully-documented API.

My first release of FastTheseusJS, then, removes virtually of the comments, replacing them with a single comment at the top pointing developers to a website where the API is fully documented. While I’m in there anyway, I also fix a minor bug that’s been annoying me for a while.

v1.1.0 changes

  • Forked from TheseusJS v1.0.4
  • Fixed issue #1071 (running mazeSolver() without first connecting <String> component results in endless loop)
  • Removed all comments: improves performance considerably

I discover another interesting fact: the developer of TheseusJS used a really random mixture of tabs and spaces for indentation, sometimes in the same line! It looks… okay if you set your editor up just right, but it’s pretty hideous otherwise. That whitespace is unnecessary anyway: the codebase is sprawling but it seldom goes more than two levels deep, so indentation levels don’t add much readability. For my second release of FastTheseusJS, then, I remove this extraneous whitespace, as well as removing the in-line whitespace inside parameter lists and the components of for loops. Every little helps, right?

v1.1.1 changes

  • Standardised whitespace usage
  • Removed unnecessary whitespace

Some of the simpler functions now fit onto just a single line, and it doesn’t even inconvenience me to see them this way: I know the codebase well enough by now that it’s no disadvantage for me to edit it in this condensed format.

Screenshot of a block of Javascript code intended using semicolons rather than tabs or spaces.
Personally, I’ve given up on the tabs-vs-spaces debate and now I indent my code using semicolons. (That’s clearly a joke. Don’t flame me.)

In the next version, I shorten the names of variables and functions in the code.

For some reason, the original developer used epic rambling strings for function names, like the well-known function dedicateIslandTempleToTheImageOfAGodBeforeOrAfterMakingASacrificeWithOrWithoutDancing( boolBeforeMakingASacrifice, objectImageOfGodToDedicateIslandTempleTo, stringNmeOfPersonMakingDedication, stringOrNullNameOfLocalIslanderDancedWith). That one gets called all the time internally and isn’t exposed via the external API so it might as well be shortened to d=(i,j,k,l,m)=>. Now all the internal workings of the library are each represented with just one or two letters.

v1.1.2 changes

  • Shortened/standarised non-API variable and function names – improves performance

I’ve shaved several kilobytes off the monstrous size of TheseusJS and I’m very proud. The original developer says nice things about my fork on social media, resulting in a torrent of downloads and attention. Within a certain archipelago of developers, I’m slightly famous.

But did I violate the license?

But then a developer says to me: you’re violating the license of the original project because you’re not making the source code available!

A man in a suit sits outdoors with a laptop and a cup of coffee. He is angry and frustrated, and a bubble shows that he is thinking:"why can't people respect the fucking license?!"
This happens every day. Probably not to this same guy every time though, but you never know. Original photo by Andrea Piacquadio.

They claim that my bugfix in the first version of FastTheseusJS represents a material change to the software, and that the changes I’ve made since then are obfuscation: efforts short of binary compilation that aim to reduce the accessibility of the source code. This fails to meet the GPL‘s definition of source code as “the preferred form of the work for making modifications to it”. I counter that this condensed view of the source code is my “preferred” way of working with it, and moreover that my output is not the result of some build step that makes the code harder to read, the code is just hard to read as a result of the optimisations I’ve made. In ambiguous cases, whose “preference” wins?

Did I violate the license? My gut feeling is that no, all of my changes were within the spirit and the letter of the GPL (they’re a terrible way to write code, but that’s not what’s in question here). Because I manually condensed the code, did so with the intention that this condensing was a feature, and continue to work directly with the code after condensing it because I prefer it that way… that feels like it’s “okay”.

But if I’d just run the code through a minification tool, my opinion changes. Suppose I’d run minify --output fasttheseus.js theseus.js and then deleted my copy of theseus.js. Then, making changes to fasttheseus.js and redistributing it feels like a violation to me… even if the resulting code is the same as I’d have gotten via the “manual” method!

I don’t know the answer (IANAL), but I’ll tell you this: I feel hypocritical for saying one piece of code would not violate the license but another identical piece of code would, based only on the process the developer followed to produce it. If I replace one piece of code at a time with less-readable versions the license remains intact, but if I replace them all at once it doesn’t? That doesn’t feel concrete nor satisfying.

Screenshot showing highly-minified HTML code (for this page) which is still reasonably readable.
Sure, I can write a blog post in just one line of code. It’ll just be a really, really, really long line… (Still perfectly readable, though!)

This isn’t an entirely contrived example

This example might seem highly contrived, and that’s because it is. But the grey area between the extremes is where the real questions are. If you agree that redistribution of (only) minified source code violates the GPL, you’re left asking: at what point does the change occur? Code isn’t necessarily minified or not-minified: there are many intermediate steps.

If I use a correcting linter to standardise indentation and whitespace – switching multiple spaces for the appropriate number of tabs, removing excess line breaks etc. (or do the same tasks manually) I’m sure you’d agree that’s fine. If I have it replace whole-function if-blocks with hoisted return statements, that’s probably fine too. If I replace if blocks with ternery operators or remove or shorten comments… that might be fine, but probably depends upon context. At some point though, some way along the process, minification goes “too far” and feels like it’s no longer within the limitations of the license. And I can’t tell you where that point is!

This issue’s even more-complicated with some other licenses, e.g. the AGPL, which extends the requirement to share source code to hosted applications. Suppose I implement a web application that uses an AGPL-licensed library. The person who redistributed it to me only gave me the minified version, but they gave me a web address from which to acquire the full source code, so they’re in the clear. I need to make a small patch to the library to support my service, so I edit it right into the minified version I’ve already got. A user of my hosted application asks for a copy of the source code, so I provide it, including the edited minified library… am I violating the license for not providing the full, unminified version, even though I’ve never even seen it? It seems absurd to say that I would be, but it could still be argued to be the case.

Diagram showing how permissive software licenses are generally compatible for use in LGPL or MPL licensed software, which are then compatible for use (except MPL) in GPL licensed software, which are in turn compatible for use (except GPL 2) with AGPL licensed software.
I love diagrams like this, which show license compatibility of different open source licenses. Adapted from a diagram by Carlo Daffara, in turn adapted from a diagram by David E. Wheeler, used under a CC-BY-SA license.

99% of the time, though, the answer’s clear, and the ambiguities shown above shouldn’t stop anybody from choosing to open-source their work under GPL, AGPL (or any other open source license depending on their preference and their community). Perhaps the question of whether minification violates the letter of a copyleft license is one of those Potter Stewart “I know it when I see it” things. It certainly goes against the spirit of the thing to do so deliberately or unnecessarily, though, and perhaps it’s that softer, more-altruistic goal we should be aiming for.

× × × × × ×

Dan Q performed maintenance for GC9GKJA A Fine Pair # 1625 ~ Eynsham

This checkin to GC9GKJA A Fine Pair # 1625 ~ Eynsham reflects a geocaching.com log entry. See more of Dan's cache logs.

Dropped by for a maintenance visit. All is well except for a discrepancy between the paper and online logs, which I’m following up with the cachers in question.

Update: I’ve verified that everybody who’s previously logged this cache online has found the cache and held the physical logbook, even if they haven’t signed the physical logbook.

From Synergy to Barrier

I’ve been using Synergy for a long, long time. By the time I wrote about my admiration of its notification icon back in 2010 I’d already been using it for some years. But this long love affair ended this week when I made the switch to its competitor, Barrier.

Screenshot showing some pre-1.3 version of Synergy running on Windows Vista.
I’m not certain exactly when I took this screenshot (which I shared with Kit while praising Synergy), but it’s clearly a pre-1.4 version and those look distinctly like Windows Vista’s ugly rounded corners, so I’m thinking no later than 2009?

If you’ve not come across it before: Synergy was possibly the first multiplatform tool to provide seamless “edge-to-edge” sharing of a keyboard and mouse between multiple computers. Right now, for example, I’m sitting in front of Cornet, a Debian 11 desktop, Idiophone, a Macbook Pro docked to a desktop monitor, and Renegade, a Windows desktop. And I can move my mouse cursor from one, to the other, to the next, interacting with them all as if I were connected directly to it.

There have long been similar technologies. KVM switches can do this, as can some modern wireless mice (I own at least two such mice!). But none of them are as seamless as what Synergy does: moving from computer to computer as fast as you can move your mouse and sharing a clipboard between multiple devices. I also love that I can configure my set-up around how I work, e.g. when I undock my Macbook it switches from ethernet to wifi, this gets detected and it’s automatically removed from the cluster. So when I pick up my laptop, it magically stops being controlled by my Windows PC’s mouse and keyboard until I dock it again.

Illustration showing a Debian desktop called Cornet, a Mac laptop with attached monitor called Idiophone, and a Windows desktop called Renegade. All three share a single keyboard and mouse using Barrier.

Synergy’s published under a hybrid model: open-source components, with paid-for extra features. It used to provide more in the open-source offering: you could download a fully-working copy of the software and use it without limitation, losing out only on a handful of features that for many users were unnecessary. Nontheless, early on I wanted to support the development of this tool that I used so much, and so I donated money towards funding its development. In exchange, I gained access to Synergy Premium, and then when their business model changed I got grandfathered-in to a lifetime subscription to Synergy Pro.

I continued using Synergy all the while. When their problem-stricken 2.x branch went into beta, I was among the testers: despite the stability issues and limitations, I loved the fact that I could have what was functionally multiple co-equal “host” computers, and – when it worked – I liked the slick new configuration interface it sported. I’ve been following with bated breath announcements about the next generation – Synergy 3 – and I’ve registered as an alpha tester for when the time comes.

If it sounds like I’m a fanboy… that’d probably be an accurate assessment of the situation. So why, after all these years, have I jumped ship?

Email from Symless to Dan, reading: "Thank you for contacting Synergy Support. My name is Kim and I am happy to assist you. We do not have a download option for the 32 bit version of Debian 10. We currently only have the options available in the members area. Feel free to reach out if you have any further questions or concerns."
Dear Future Dan. If you ever need a practical example of where open-source thinking provides a better user experience than arbritrarily closed-source products, please see above. Yours, Past Dan.

I’ve been aware of Barrier since the project started, as a fork of the last open-source version of the core Synergy program. Initially, I didn’t consider Barrier to be a suitable alternative for me, because it lacked features I cared about that were only available in the premium version of Synergy. As time went on and these features were implemented, I continued to stick with Synergy and didn’t bother to try out Barrier… mostly out of inertia: Synergy worked fine, and the only thing Barrier seemed to offer would be a simpler set-up (because I wouldn’t need to insert my registration details!).

This week, though, as part of a side project, I needed to add an extra computer to my cluster. For reasons that are boring and irrelevant and so I’ll spare you the details, the new computer’s running the 32-bit version of Debian 11.

I went to the Symless download pages and discovered… there isn’t a Debian 11 package. Ah well, I think: the Debian 10 one can probably be made to work. But then I discover… there’s only a 64-bit version of the Debian 10 binary. I’ll note that this isn’t a fundamental limitation – there are 32-bit versions of Synergy available for Windows and for ARMhf Raspberry Pi devices – but a decision by the developers not to support that platform. In order to protect their business model, Synergy is only available as closed-source binaries, and that means that it’s only available for the platforms for which the developers choose to make it available.

So I thought: well, I’ll try Barrier then. Now’s as good a time as any.

Screenshot showing Mac computer "Idiophone" being configured in Barrier to connect to server "Renegade".
Setting up Barrier in place of Synergy was pretty familiar and painless.

Barrier and Synergy aren’t cross-compatible, so first I had to disable Synergy on each machine in my cluster. Then I installed Barrier. Like most popular open-source software, this was trivially easy compared to Synergy: I just used an appropriate package manager by running choco install barrier, brew install barrier, and apt install barrier to install on each of the Windows, Mac, and Debian computers, respectively.

Configuring Barrier was basically identical to configuring Synergy: set up the machine names, nominate one the server, and tell the server what the relative positions are of each of the others’ screens. I usually bind the “scroll lock” key to the “lock my cursor to the current screen” function but I wasn’t permitted to do this in Barrier for some reason, so I remapped my scroll lock key to some random high unicode character and bound that instead.

Getting Barrier to auto-run on MacOS was a little bit of a drag – in the end I had to use Automator to set up a shortcut that ran it and loaded the configuration, and set that to run on login. These little touches are mostly solved in Synergy, but given its technical audience I don’t imagine that anybody is hugely inconvenienced by them. Nonetheless, Synergy clearly retains a slightly more-polished experience.

Altogether, switching from Synergy to Barrier took me under 15 minutes and has so far offered me a functionally-identical experience, except that it works on more devices, can be installed via my favourite package managers, and doesn’t ask me for registration details before it functions. Synergy 3’s going to have to be a big leap forward to beat that!

× × × ×

Note #19590

Called @Tesco Abingdon for a #flujab but fell down a black hole in their menu system. Had to choose the “continue to hold” option several times… and then nobody answered anyway…

Xday Dinner on a Yday???

For most of 2013/2014 and intermittently thereafter my sister ran a weekly-ish “Family Vlog” on YouTube, and I (even more-intermittently) did an ocasional tonge-in-cheek review and analysis of them.

Today, a friend reported that they had eaten “Sunday dinner on a Wednesday”, and I found myself reminded of a running gag in this old, old vlog… and threw together a quick compilation reel of some of its instances.

Note #19581

Today is “superhero day” for nursery/reception, so I continued my effort to straddle the line between being a fun #parent and an embarassing parent line by dropping the kids off like this:

Dan dressed as The Flash

×

Dan Q performed maintenance for GC9GKJA A Fine Pair # 1625 ~ Eynsham

This checkin to GC9GKJA A Fine Pair # 1625 ~ Eynsham reflects a geocaching.com log entry. See more of Dan's cache logs.

I like to check in on my new caches after about a week in the field to ensure there are no teething troubles with their hiding place/weatherproofing etc. All looks good here!

Making an RSS feed of YOURLS shortlinks

As you might know if you were paying close attention in Summer 2019, I run a “URL shortener” for my personal use. You may be familiar with public URL shorteners like TinyURL and Bit.ly: my personal URL shortener is basically the same thing, except that only I am able to make short-links with it. Compared to public ones, this means I’ve got a larger corpus of especially-short (e.g. 2/3 letter) codes available for my personal use. It also means that I’m not dependent on the goodwill of a free siloed service and I can add exactly the features I want to it.

Diagram showing the relationships of the DanQ.me ecosystem. Highlighted is the injection of links into the "S.2" link shortener and the export of these shortened links by RSS into FreshRSS.
Little wonder then that my link shortener sat so close to me on my ecosystem diagram the other year.

For the last nine years my link shortener has been S.2, a tool I threw together in Ruby. It stores URLs in a sequentially-numbered database table and then uses the Base62-encoding of the primary key as the “code” part of the short URL. Aside from the fact that when I create a short link it shows me a QR code to I can easily “push” a page to my phone, it doesn’t really have any “special” features. It replaced S.1, from which it primarily differed by putting the code at the end of the URL rather than as part of the domain name, e.g. s.danq.me/a0 rather than a0.s.danq.me: I made the switch because S.1 made HTTPS a real pain as well as only supporting Base36 (owing to the case-insensitivity of domain names).

But S.2’s gotten a little long in the tooth and as I’ve gotten busier/lazier, I’ve leant into using or adapting open source tools more-often than writing my own from scratch. So this week I switched my URL shortener from S.2 to YOURLS.

Screenshot of YOURLS interface showing Dan Q's list of shortened links. Six are shown of 1,939 total.
YOURLs isn’t the prettiest tool in the world, but then it doesn’t have to be: only I ever see the interface pictured above!

One of the things that attracted to me to YOURLS was that it had a ready-to-go Docker image. I’m not the biggest fan of Docker in general, but I do love the convenience of being able to deploy applications super-quickly to my household NAS. This makes installing and maintaining my personal URL shortener much easier than it used to be (and it was pretty easy before!).

Another thing I liked about YOURLS is that it, like S.2, uses Base62 encoding. This meant that migrating my links from S.2 into YOURLS could be done with a simple cross-database INSERT... SELECT statement:

INSERT INTO yourls.yourls_url(keyword, url, title, `timestamp`, clicks)
  SELECT shortcode, url, title, created_at, 0 FROM danq_short.links

But do you know what’s a bigger deal for my lifestack than my URL shortener? My RSS reader! I’ve written about it a lot, but I use RSS for just about everything and my feed reader is my first, last, and sometimes only point of contact with the Web! I’m so hooked-in to my RSS ecosystem that I’ll use my own middleware to add feeds to sites that don’t have them, or for which I’m not happy with the feed they provide, e.g. stripping sports out of BBC News, subscribing to webcomics that don’t provide such an option (sometimes accidentally hacking into sites on the way), and generating “complete” archives of series’ of posts so I can use my reader to track my progress.

One of S.1/S.2’s features was that it exposed an RSS feed at a secret URL for my reader to ingest. This was great, because it meant I could “push” something to my RSS reader to read or repost to my blog later. YOURLS doesn’t have such a feature, and I couldn’t find anything in the (extensive) list of plugins that would do it for me. I needed to write my own.

Partial list of Dan's RSS feed subscriptions, including Jeremy Keith, Jim Nielson, Natalie Lawhead, Bruce Schneier, Scott O'Hara, "Yahtzee", BBC News, and several podcasts, as well as (highlighted) "Dan's Short Links", which has 5 unread items.
In some ways, subscribing “to yourself” is a strange thing to do. In other ways… shut up, I’ll do what I like.

I could have written a YOURLS plugin. Or I could have written a stack of code in Ruby, PHP, Javascript or some other language to bridge these systems. But as I switched over my shortlink subdomain s.danq.me to its new home at danq.link, another idea came to me. I have direct database access to YOURLS (and the table schema is super simple) and the command-line MariaDB client can output XML… could I simply write an XML Transformation to convert database output directly into a valid RSS feed? Let’s give it a go!

I wrote a script like this and put it in my crontab:

mysql --xml yourls -e                                                                                                                     \
      "SELECT keyword, url, title, DATE_FORMAT(timestamp, '%a, %d %b %Y %T') AS pubdate FROM yourls_url ORDER BY timestamp DESC LIMIT 30" \
    | xsltproc template.xslt -                                                                                                            \
    | xmllint --format -                                                                                                                  \
    > output.rss.xml

The first part of that command connects to the yourls database, sets the output format to XML, and executes an SQL statement to extract the most-recent 30 shortlinks. The DATE_FORMAT function is used to mould the datetime into something approximating the RFC-822 standard for datetimes as required by RSS. The output produced looks something like this:

<?xml version="1.0"?>
<resultset statement="SELECT keyword, url, title, timestamp FROM yourls_url ORDER BY timestamp DESC LIMIT 30" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <row>
        <field name="keyword">VV</field>
        <field name="url">https://webdevbev.co.uk/blog/06-2021/perfect-is-the-enemy-of-good.html</field>
        <field name="title"> Perfect is the enemy of good || Web Dev Bev</field>
        <field name="timestamp">2021-09-26 17:38:32</field>
  </row>
  <row>
        <field name="keyword">VU</field>
        <field name="url">https://webdevlaw.uk/2021/01/30/why-generation-x-will-save-the-web/</field>
        <field name="title">Why Generation X will save the web  Hi, Im Heather Burns</field>
        <field name="timestamp">2021-09-26 17:38:26</field>
  </row>

  <!-- ... etc. ... -->
  
</resultset>

We don’t see this, though. It’s piped directly into the second part of the command, which  uses xsltproc to apply an XSLT to it. I was concerned that my XSLT experience would be super rusty as I haven’t actually written any since working for my former employer SmartData back in around 2005! Back then, my coworker Alex and I spent many hours doing XML backflips to implement a system that converted complex data outputs into PDF files via an XSL-FO intermediary.

I needn’t have worried, though. Firstly: it turns out I remember a lot more than I thought from that project a decade and a half ago! But secondly, this conversion from MySQL/MariaDB XML output to RSS turned out to be pretty painless. Here’s the template.xslt I ended up making:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="resultset">
    <rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
      <channel>
        <title>Dan's Short Links</title>
        <description>Links shortened by Dan using danq.link</description>
        <link> [ MY RSS FEED URL ] </link>
        <atom:link href=" [ MY RSS FEED URL ] " rel="self" type="application/rss+xml" />
        <lastBuildDate><xsl:value-of select="row/field[@name='pubdate']" /> UTC</lastBuildDate>
        <pubDate><xsl:value-of select="row/field[@name='pubdate']" /> UTC</pubDate>
        <ttl>1800</ttl>
        <xsl:for-each select="row">
          <item>
            <title><xsl:value-of select="field[@name='title']" /></title>
            <link><xsl:value-of select="field[@name='url']" /></link>
            <guid>https://danq.link/<xsl:value-of select="field[@name='keyword']" /></guid>
            <pubDate><xsl:value-of select="field[@name='pubdate']" /> UTC</pubDate>
          </item>
        </xsl:for-each>
      </channel>
    </rss>
  </xsl:template>
</xsl:stylesheet>

That uses the first (i.e. most-recent) shortlink’s timestamp as the feed’s pubDate, which makes sense: unless you’re going back and modifying links there’s no more-recent changes than the creation date of the most-recent shortlink. Then it loops through the returned rows and creates an <item> for each; simple!

The final step in my command runs the output through xmllint to prettify it. That’s not strictly necessary, but it was useful while debugging and as the whole command takes milliseconds to run once every quarter hour or so I’m not concerned about the overhead. Using these native binaries (plus a little configuration), chained together with pipes, had already resulted in way faster performance (with less code) than if I’d implemented something using a scripting language, and the result is a reasonably elegant “scratch your own itch”-type solution to the only outstanding barrier that was keeping me on S.2.

All that remained for me to do was set up a symlink so that the resulting output.rss.xml was accessible, over the web, to my RSS reader. I hope that next time I’m tempted to write a script to solve a problem like this I’ll remember that sometimes a chain of piped *nix utilities can provide me a slicker, cleaner, and faster solution.

Update: Right as I finished writing this blog post I discovered that somebody had already solved this problem using PHP code added to YOURLS; it’s just not packaged as a plugin so I didn’t see it earlier! Whether or not I use this alternate approach or stick to what I’ve got, the process of implementing this YOURLS-database ➡ XML ➡  XSLTRSS chain was fun and informative.

× × ×