Nowadays if you’re on a railway station and hear an announcement, it’s usually a computer stitching together samples1. But back in the day, there used to be a human
with a Tannoy microphone sitting in the back office, telling you about the platform alterations and
destinations.
I had a friend who did it as a summer job, once. For years afterwards, he had a party trick that I always quite enjoyed: you’d say the name of a terminus station on a direct line from
Preston, e.g. Edinburgh Waverley, and he’d respond in his announcer-voice: “calling at Lancaster, Oxenholme the Lake District, Penrith, Carlisle, Lockerbie, Haymarket, and Edinburgh
Waverley”, listing all of the stops on that route. It was a quirky, beautiful, and unusual talent. Amazingly, when he came to re-apply for his job the next summer he didn’t get it,
which I always thought was a shame because he clearly deserved it: he could do the job blindfold!
There was a strange transitional period during which we had machines to do these announcements, but they weren’t that bright. Years later I found myself on Haymarket station waiting for
the next train after mine had been cancelled, when a robot voice came on to announce a platform alteration: the train to Glasgow would now be departing from platform 2, rather than
platform 1. A crowd of people stood up and shuffled their way over the footbridge to the opposite side of the tracks. A minute or so later, a human announcer apologised for the
inconvenience but explained that the train would be leaving from platform 1, and to disregard the previous announcement. Between then and the train’s arrival the computer tried twice
more to send everybody to the wrong platform, leading to a back-and-forth argument between the machine and the human somewhat reminiscent of the white zone/red zone scene from Airplane! It was funny perhaps only
because I wasn’t among the people whose train was in superposition.
Clearly even by then we’d reached the point where the machine was well-established and it was easier to openly argue with it than to dig out the manual and work out how to turn it off.
Nowadays it’s probably even more so, but hopefully they’re less error-prone.
When people talk about technological unemployment, they focus on the big changes, like how a tipping point with self-driving vehicles might one day revolutionise the haulage
industry… along with the social upheaval that comes along with forcing a career change on millions of drivers.
But in the real world, automation and technological change come in salami slices. Horses and carts were seen alongside the automobile for decades. And you still find stations with
human announcers. Even the most radically-disruptive developments don’t revolutionise the world overnight. Change is inevitable, but with preparation, we can be ready for it.
A few years ago, I wanted to subscribe to The Far Side’s “Daily Dose” via my RSS reader. The Far Side doesn’t have an RSS feed, so I implemented a proxy/middleware to bridge the two.
If you’re looking for a more-general instruction on using XPath scraping in FreshRSS, this isn’t it.
The release of version 1.20.0 of my favourite RSS reader FreshRSS provided a new mechanism for subscribing to content from sites that didn’t provide feeds: XPath scraping. I demonstrated the use of this to subscribe to my friend Beverley’s blog, but this week I figured it was time to have a go at retiring my middleware and subscribing directly to The Far Side from FreshRSS.
It turns out that FreshRSS’s XPath Scraping is almost enough to achieve exactly what I want. The big problem is that the image server on The Far Side website tries to
prevent hotlinking by checking the Referer: header on requests, so we need a proxy to spoof that. I threw together a quick PHP program to act as a proxy (if
you don’t have this, you’ll have to click-through to read each comic), then configured my FreshRSS feed as follows:
Feed URL: https://www.thefarside.com/
The “Daily Dose” gets published to The Far Side’s homepage each day.
XPath for finding new items: //div[@class="card tfs-comic js-comic"]
Finds each comic on the page. This is probably a little over-specific and brittle; I should probably switch to using the contains function at some point. I
subsequently have to use parent:: and ancestor:: selectors which is usually a sign that your screen-scraping is suboptimal, but in this case it’s necessary
because it’s only at this deep level that we start seeing really specific classes.
Item title: concat("Far Side #", parent::div/@data-id)
The comics don’t have titles (“The one with the cow”?), but these seem to have unique IDs in the data-id attribute of the parent <div>, so I’m using
those as a reference.
Item content: descendant::div[@class="card-body"]
Within each item, the <div class="card-body"> contains the comic and its text. The comic itself can’t be loaded this way for two reasons: (1) the <img
src="..."> just points to a placeholder (the site uses JavaScript-powered lazy-loading, ugh – the actual source is in the data-src attribute), and (2) as
mentioned above, there’s anti-hotlink protection we need to work around.
Item link: descendant::input[@data-copy-item]/@value
Each comic does have a unique link which you can access by clicking the “share” button under it. This makes a hidden text <input> appear, which we can
identify by the presence of the data-copy-item attribute. The contents of this textbox is the sharing URL for
the comic.
Item thumbnail: concat("https://example.com/referer-faker.php?pw=YOUR-SECRET-PASSWORD-GOES-HERE&referer=https://www.thefarside.com/&url=",
descendant::div[@class="tfs-comic__image"]/img/@data-src)
Here’s where I hook into my special proxy server, which spoofs the Referer: header to work around the anti-hotlinking code. If you wanted you might be able to come up
with an alternative solution using a custom JavaScript loaded into your FreshRSS instance (there’s a plugin for that!), perhaps to load an iframe of the sharing URL? Or you can
host a copy of my proxy server yourself (you can’t use mine, it’s got a password and that password isn’t YOUR-SECRET-PASSWORD-GOES-HERE!)
Item date: ancestor::div[@class="tfs-page__full tfs-page__full--md"]/descendant::h3
There’s nothing associating each comic with the date it appeared in the Daily Dose, so we have to ascend up to the top level of the page to find the date from the heading.
Item unique ID: parent::div/@data-id
Giving FreshRSS a unique ID can help it stop showing duplicates. We use the unique ID we discovered earlier; this way, if the Daily Dose does a re-run of something it already did
since I subscribed, I won’t be shown it again. Omit this if you want to see reruns.
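To make that mapping a little more concrete, here’s a minimal Python sketch that applies the same selections to a tiny, made-up fragment of markup. Only the class names and attributes are taken from the settings above; the nesting, IDs and URLs in SAMPLE are invented for illustration. Python’s built-in ElementTree only understands a subset of XPath, so the parent:: and ancestor:: steps are emulated with a parent map rather than run as real XPath. It’s an approximation of what FreshRSS does with the settings, not a description of its internals.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the class names and attributes mirror the feed settings,
# but the structure and values are assumptions made for this example.
SAMPLE = """
<div class="tfs-page__full tfs-page__full--md">
  <h3>Friday, November 18, 2022</h3>
  <div data-id="12345">
    <div class="card tfs-comic js-comic">
      <div class="card-body">
        <div class="tfs-comic__image">
          <img src="placeholder.gif" data-src="https://example.com/comic.jpg"/>
        </div>
        <p>Caption goes here.</p>
      </div>
      <input type="text" data-copy-item="" value="https://example.com/share/12345"/>
    </div>
  </div>
</div>
"""

PROXY = ("https://example.com/referer-faker.php?pw=YOUR-SECRET-PASSWORD-GOES-HERE"
         "&referer=https://www.thefarside.com/&url=")
PAGE_CLASS = "tfs-page__full tfs-page__full--md"


def scrape(markup):
    root = ET.fromstring(markup)
    # ElementTree has no parent:: or ancestor:: axes, so map each child to its parent.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    items = []
    for card in root.iter("div"):
        # XPath for finding new items: exact match on the full class string.
        if card.get("class") != "card tfs-comic js-comic":
            continue
        wrapper = parent_of.get(card)  # parent::div
        comic_id = wrapper.get("data-id") if wrapper is not None else None
        # ancestor::div[@class="..."]: walk upwards until the page container.
        page = wrapper
        while page is not None and page.get("class") != PAGE_CLASS:
            page = parent_of.get(page)
        heading = page.find(".//h3") if page is not None else None
        body = card.find(".//div[@class='card-body']")
        share = card.find(".//input[@data-copy-item]")
        image = card.find(".//div[@class='tfs-comic__image']/img")
        items.append({
            "title": f"Far Side #{comic_id}",
            "content": ET.tostring(body, encoding="unicode") if body is not None else "",
            "link": share.get("value") if share is not None else None,
            # The real image URL lives in data-src; src is only a placeholder.
            "thumbnail": PROXY + image.get("data-src") if image is not None else None,
            "date": heading.text if heading is not None else None,
            "unique_id": comic_id,
        })
    return items


for item in scrape(SAMPLE):
    print(item["title"], item["date"], item["link"])
```

Matching on the full class string is as brittle here as it is in the real configuration: if the site adds or reorders a class, nothing will match.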
Hurrah; once again I can laugh at repeats of Gary Larson’s best work alongside my other morning feeds.
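For anybody curious about the Referer-spoofing proxy mentioned above, the core idea is small enough to sketch. Mine is written in PHP; what follows is a Python approximation of the general technique, not a port of that program. The function names, the password check and the timeout are all assumptions made for the example.

```python
import hmac
import urllib.request


def is_authorised(supplied_password, expected_password):
    # Constant-time comparison, so the check doesn't leak how much matched.
    return hmac.compare_digest(supplied_password.encode("utf-8"),
                               expected_password.encode("utf-8"))


def build_spoofed_request(url, referer):
    # The upstream server only sees the headers we send, so we can claim
    # that the request came from a page on its own site.
    return urllib.request.Request(url, headers={"Referer": referer})


def fetch_with_referer(url, referer, timeout=10):
    # Fetch the resource and hand back its content type and body, ready to
    # be relayed to whoever asked the proxy for it.
    request = build_spoofed_request(url, referer)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        content_type = response.headers.get("Content-Type", "application/octet-stream")
        return content_type, response.read()
```

A real deployment would wrap this in a web endpoint that reads the pw, referer and url parameters, refuses requests that fail is_authorised, and echoes back the fetched bytes with the same content type.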
There’s a moral to this story: when you make your website deliberately hard to consume, fewer people will access it in the way you want! The Far Side’s website
is actively hostile to users (JavaScript lazy-loading, anti-right click scripts, hotlink protection, incorrect MIME types, no feeds etc.), and an inevitable consequence of that is that people like me will find and share workarounds to that
hostility.
If you’re ad-supported or collect webstats and want to keep traffic “on your site” on this side of 2004, you should make it as easy as possible for people to subscribe to content.
Consider The Oatmeal or Oglaf, for example, which offer RSS feeds that include only a partial thumbnail of each comic and a link through to the full thing. I don’t feel the need to screen-scrape those sites
because they’ve given me a subscription option that works, and I routinely click-through to both of them to enjoy their latest content!
Conversely, The Far Side’s aggressive anti-subscription technology ultimately means that there are fewer actual visitors to their website… because folks like me work
to circumvent them.
And now you know how I did so.
Update: want the new content that’s being published to The Far Side in FreshRSS, too? I’ve got a recipe for that!
103: Early Hints (“I’m not sure this can last forever.”)
300: Multiple Choices (“There are so many ways I can do better than you.”)
303: See Other (“You should date other people.”)
304: Not Modified (“With you, I feel like I’m stagnating.”)
402: Payment Required (“I am a prostitute.”)
403: Forbidden (“You don’t get this any more.”)
406: Not Acceptable (“I could never introduce you to my parents.”)
408: Request Timeout (“You keep saying you’ll propose but you never do.”)
409: Conflict (“We hate each other.”)
410: Gone (ghosted)
411: Length Required (“Your penis is too small.”)
413: Payload Too Large (“Your penis is too big.”)
416: Range Not Satisfiable (“Our sex life is boring and repetitive.”)
425: Too Early (“Your premature ejaculation is a problem.”)
428: Precondition Required (“You’re still sleeping with your ex-!?”)
429: Too Many Requests (“You’re so demanding!”)
451: Unavailable for Legal Reasons (“I’m married to somebody else.”)
502: Bad Gateway (“Your pussy is awful.”)
508: Loop Detected (“We just keep fighting.”)
With thanks to Ruth for the conversation that inspired these pictures, and apologies to the rest of the Internet for creating them.
I’m off work sick today: it’s just a cold, but it’s had a damn good go at wrecking my lungs and I feel pretty lousy. You know how sometimes you’ve got too much brain-fog to trust
yourself with production systems but you still want to write code (or is that just me?)? Well, this morning I threw together a really, really stupid project which you can play online here.
It’s a board game. Well, the digital edition of one. Also, it’s not very good.
It’s inspired by a toot by Mason “Tailsteak” Williams (whom I’ve mentioned before once or
twice). At first I thought I’d try to calculate
the odds of winning at his proposed game, or how many times one might expect to play before winning, but I haven’t the brainpower for that in my snot-addled brain. So instead I threw
together a terrible, terrible digital implementation.
Go play it if, like me, you’ve got nothing smarter that your brain can be doing today.
Just in time for Robin Sloan to give up on Spring ’83, earlier this month I finally got around to launching STS-6 (named for the first mission of the Space Shuttle Challenger in Spring 1983), my experimental Spring ’83 server. It’s
been a busy year; I had other things to do. But you might have guessed that something like this had been under my belt when I open-sourced a keygenerator for the protocol the other day.
If you’ve not played with Spring ’83, this post isn’t going to make much sense to you. Sorry.
My server is, as far as I can tell, very different from any others in a few key ways:
It does not allow third-party publishing at all. Some might argue that this undermines the aim of the exercise, but I disagree. My IndieWeb inclinations lead me to
favour “self-hosted” content, shared from its owners’ domain. Also: the specification clearly states that a server must implement a denylist… I guess my denylist simply includes all keys that are
not specifically permitted.
It’s geared towards dynamic content. My primary board self-publishes whenever I produce a new blog post, listing the most recent blog posts published. I have
another half-implemented which shows a summary of the most-recent post, and another which would simply use a WordPress page as its basis – yes, this was content
management, but published over Spring ’83.
It provides helpers to streamline content production. It supports internal references to other boards you control using the format {{board:123}} which are
automatically converted to addresses referencing the public key of the “current” keypair for that board. This separates the concept of a board and its content template from that
board’s keypairs, making it easier to link to a board. To put it another way, STS-6 links are self-healing on the server-side (for local boards).
It helps automate content-fitting. Spring ’83 strictly requires a maximum board size of 2,217 bytes. STS-6 can be configured to fit a flexible amount of dynamic
content within a template area while respecting that limit. For my posts list board, the number of posts shown is moderated by the size of the resulting board: STS-6 adds more and
more links to the board until it’s too big, and then removes one!
It provides “hands-off” key management features. You can pregenerate a list of keys with different validity periods and the server will automatically cycle through
them as necessary, implementing and retroactively-modifying <link rel="next"> connections to keep them current.
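The content-fitting behaviour described in the list above is easy to sketch. This isn’t STS-6’s actual code: the %LINKS% placeholder, the function name and the template are hypothetical, and only the 2,217-byte limit comes from the description. The idea is simply to keep adding links until the rendered board would be too big, and keep the last version that fitted.

```python
MAX_BOARD_BYTES = 2217   # maximum board size, as stated above
PLACEHOLDER = "%LINKS%"  # hypothetical marker for the flexible template area


def fit_links(template, links, limit=MAX_BOARD_BYTES):
    """Render the template with as many leading links as fit within the limit."""
    best = template.replace(PLACEHOLDER, "")
    if len(best.encode("utf-8")) > limit:
        raise ValueError("template is too big even without any links")
    for count in range(1, len(links) + 1):
        candidate = template.replace(PLACEHOLDER, "".join(links[:count]))
        # The limit is in bytes, not characters, so measure the encoded form.
        if len(candidate.encode("utf-8")) > limit:
            break  # one link too many: keep the previous rendering
        best = candidate
    return best


template = "<h1>Recent posts</h1><ul>%LINKS%</ul>"
links = [f'<li><a href="https://example.com/{n}">Post {n}</a></li>' for n in range(200)]
board = fit_links(template, links)
print(board.count("<li>"), "links in", len(board.encode("utf-8")), "bytes")
```

Measuring the encoded length matters because a board containing non-ASCII characters takes more bytes than it has characters.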
I’m sure that there are those who would see this as automating something that was beautiful because it was handcrafted; I don’t know whether or not I agree, but had Spring ’83
taken off in a bigger way, it would always only have been a matter of time before somebody tried my approach.
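The {{board:123}} helper from the list above can be approximated in a few lines. This is a sketch under assumptions rather than the server’s real implementation: the lookup table of “current” keys, the base URL and the shortened key are all hypothetical, and unknown board numbers are left untouched here, which may not match what STS-6 does.

```python
import re

BOARD_REF = re.compile(r"\{\{board:(\d+)\}\}")


def resolve_board_refs(content, current_keys, base_url):
    """Swap {{board:N}} references for the address of board N's current key."""
    def replace(match):
        key = current_keys.get(int(match.group(1)))
        # Leave the reference as written if we don't know the board.
        return match.group(0) if key is None else base_url + key
    return BOARD_REF.sub(replace, content)


# Hypothetical data: real keys are much longer than this placeholder.
current_keys = {123: "0123abcd"}
html = '<a href="{{board:123}}">my other board</a> and {{board:999}}'
print(resolve_board_refs(html, current_keys, "https://example.com/"))
```

Because the template stores only the board number, rotating to a new keypair just means updating the lookup table; every link is regenerated the next time the board is published.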
From a design perspective, I enjoyed optimising an SVG image of my header so it could meaningfully fit into the board. It’s
pretty, and it’s tolerably lightweight.
If you want to see my server in action, patch this into your favourite Spring ’83 client:
https://s83.danq.dev/10c3ff2e8336307b0ac7673b34737b242b80e8aa63ce4ccba182469ea83e0623
A dead end?
Without Robin’s active participation, I feel that Spring ’83 is probably coming to a dead end. It’s been a lot of fun to play with and I’d love to see what ideas the experience of it
goes on to inspire next, but in its current form it’s one of those things that’s an interesting toy, but not something that’ll make serious waves.
In his last lab essay Robin already identified many of the key issues with the system (too complicated, no interpersonal-mentions, the challenge of keys-as-identifiers, etc.) and while
they’re all solvable without breaking the underlying mechanisms (mentions might be handled by Webmention, perhaps, etc.), I
understand the urge to take what was learned from this experiment and use it to help inform the decisions of the next one. Just as Jon Postel’s Quote of the Day protocol doesn’t see much use any more (although maybe if my
finger server could support QotD?) but went on to inspire the direction of many subsequent “call-and-response” protocols,
including HTTP, it’s okay if Spring ’83 disappears into obscurity, so long as we can learn what it did
well and build upon that.
Meanwhile: if you’re looking for a hot new “like the web but lighter” protocol, you should probably check out Gemini. (Incidentally, you
can find me at gemini://danq.me, but that’s something I’ll write about another day…)
On Wednesday this week, three years and two months after Oxford Geek Nights #51, Oxford Geek Nights
#52 finally took place. Originally scheduled for 15 April 2020 and then… postponed slightly because of the pandemic, its reappearance was an epic moment that I’m glad to have been a part of.
A particular highlight of the night was witnessing “Gasman” Matt Westcott show off his
epic demoscene contribution Pharmageddon, which is presented via a “pharmacy sign”. Here’s a video, if you’re interested.
Ben Foxall also put in a sterling performance; hearing him talk – as usual – made me say “wow, I didn’t know you could do that with a
web browser”. And there was more to learn, too: Jake Howard showed us how robots see, Steve Buckley inspired us to think about how technology can make our homes more energy-smart (this is really cool and sent me
down a rabbithole of reading!), and Joe Wass showed adorable pictures of his kid exploring the user interface of his lockdown electronics
project.
Oh, and there was a quiz competition too, and guess who came out on top after an incredibly tight race.
But mostly I just loved the chance to hang out with geeks again; chat to folks, make connections, and enjoy that special Oxford Geek Nights atmosphere. Also great to meet somebody from
Perspectum, who look like they’d be great to work for and – after hearing about them – I had in mind somebody to suggest for a job with them… but it
looks like the company isn’t looking for anybody with their particular skills on this side of the pond. Still, one to watch.
My prize for winning the competition was an extremely-limited-edition cap which I love so much I’ve barely taken it off since.
Huge thanks are due to Torchbox, Perspectum and everybody in attendance for making this magical night possible!
Oh, and for anybody who’s interested, I’ve proposed to be a speaker at the next Oxford Geek Nights, which sounds like it’ll be towards Spring 2023. My title is
“Yesterday’s Internet, Today!” which – spoilers! – might have something to do with the kind of technology I’ve been playing with recently, among other things. Hope to see you there!
The finger protocol, first standardised way back in 1977, is a lightweight directory system
for querying resources on a local or remote shared system. Despite barely being used today, it’s so well-established that virtually every modern desktop operating system – Windows,
MacOS, Linux etc. – comes with a copy of finger, giving it a similar ubiquity to web browsers! (If you haven’t yet, give it a go.)
If you were using a shared UNIX-like system in the 1970s through 1990s, you might run finger to see who else was logged on at the same time as you, finger
chris to get more information about Chris, or finger alice@example.net to look up the details of Alice on the server example.net. Its ability to transcend the
boundaries of different systems meant that it was, after a fashion, an example of an early decentralised social network!
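Part of finger’s charm is how little there is to it: the client opens a TCP connection to port 79, sends the name it’s asking about followed by a carriage return and line feed, and reads whatever comes back until the server hangs up. Here’s a minimal Python sketch of a client along those lines. The function names are my own invention, and it ignores niceties like request forwarding and the /W “verbose” switch.

```python
import socket


def split_target(target):
    """Split 'user@host' into (user, host); a bare name means the local machine."""
    if "@" in target:
        user, _, host = target.rpartition("@")
        return user, host
    return target, "localhost"


def finger(target, port=79, timeout=10):
    """Send a finger query and return the server's reply as text."""
    user, host = split_target(target)
    with socket.create_connection((host, port), timeout=timeout) as conn:
        # The whole request is one line: the (possibly empty) username.
        conn.sendall(user.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when it's done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(split_target("alice@example.net"))
```

Calling finger("alice@example.net") would perform the lookup described above, assuming the host actually runs a finger server.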
I first actively used finger when I was a student at Aberystwyth University. The shared central computers osfa and
osfb supported it in what was a pretty typical way: users could add a .plan and/or .project file to their home directory and the contents of these
would be output to anybody using finger to look up that user, along with other information like what department they belonged to. I’m simulating from memory so this won’t be remotely
accurate, but broadly speaking it looked a little like this –
$ finger dlq9@aber.ac.uk
Login: dlq9 Name: Dan Q
Directory: /users/9/d/dlq9 Department: Computer Science
Project:
Working on my BEng Software Engineering.
Plan:
_______
---' ____)____
______) Finger me!
_____)
(____)
---.__(___)
It’s not just about a directory of people, though: you could finger printers to see what their queues were like, finger a time server to ask what time it was,
finger a vending machine to see what drinks it
had available… even finger for a weather forecast where you are (this one still works as shown below; try it for your own location!) –
$ finger oxford@graph.no
-= Meteogram for Oxford, Oxfordshire, England, United Kingdom =-
'C Rain (mm)
12
11
10 ^^^=--=--
9^^^ ===
8 ^^^=== ====== ^^^
7 ====== ===============^^^ =--
6 =--=-----
5
4
3 | | | | | | | 1 mm
17 18 19 20 21 22 23 18/11 02 03 04 05 06 07_08_09_10_11_12_13_14 Hour
W W W W W W W W W W W W W W W W W W W W W W Wind dir.
6 6 7 7 7 7 7 7 6 6 6 5 5 4 4 4 4 5 6 6 5 5 Wind(m/s)
Legend left axis: - Sunny ^ Scattered = Clouded =V= Thunder # Fog
Legend right axis: | Rain ! Sleet * Snow
If you’d just like to play with finger, then finger.farm is a great starting point. They provide free finger hosting and they’re easy to use (try
finger dan@finger.farm to find me!). But I had something bigger in mind…
Fingering WordPress
What if you could finger my blog? I.e. if you ran finger blog@danq.me you’d see a summary of some of my recent posts, along with additional
addresses you could finger to read the full content of each. This could be the world’s first finger-to-WordPress gateway; y’know, for
if you thought the world needed such a thing. Here’s how I did it:
Opened a hole in the firewall on port 79 so the outside world could access it (ufw allow 79; ufw reload).
The default configuration for efingerd acts like a “typical” finger server, but it’s highly programmable to make it “smarter”. I:
Blanked /etc/efingerd/list to prevent any output from “listing” the server (finger @danq.me).
Replaced the contents of /etc/efingerd/luser and /etc/efingerd/nouser (which are run when a request matches, or doesn’t match, a user account name) with
a call to my script: /usr/local/bin/finger-to-wordpress "$3". $3 holds the username that was requested, so we can act on it.
Created /usr/local/bin/finger-to-wordpress – a Ruby program that either (a) lists a selection of posts or (b) returns a specific post (stripping the
HTML tags)
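The real finger-to-wordpress script is a Ruby program that talks to my WordPress database, but the shape of the logic in the last step can be sketched in Python. Everything here is an assumption made for illustration: the posts dictionary stands in for the database, and the slug@host address format is hypothetical rather than a description of the addresses my server actually hands out.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects the text between tags and discards the tags themselves."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html):
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


def respond(requested, posts, host="example.com"):
    """Return the text to send back for the requested finger username."""
    if requested == "blog":
        # (a) list a selection of posts, with an address to finger for each
        lines = ["Recent posts:"]
        for slug, post in posts.items():
            lines.append(f"  {post['title']}: finger {slug}@{host}")
        return "\n".join(lines)
    if requested in posts:
        # (b) return one specific post as plain text
        post = posts[requested]
        return f"{post['title']}\n\n{strip_tags(post['html'])}"
    return f"No such post. Try: finger blog@{host}"


# Hypothetical stand-in for the WordPress database.
posts = {"hello-world": {"title": "Hello", "html": "<p>Hi <b>there</b> &amp; welcome</p>"}}
print(respond("blog", posts))
print(respond("hello-world", posts))
```

efingerd would call something like this with the requested username as its argument and send whatever it prints back to the client.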
In future, I might use some extra tags or metadata to enhance finger-friendly WordPress posts. The infrastructure’s in place already (I already have tags that I use to make
certain kinds of content available only via certain media – shh!). You might rightly ask what the point is of this entire enterprise, of course, and you’d be well within your
rights to ask such a question. But I think the best answer available is “because Dan”.
If you want to see my blog in a whole new way, give it a go: run finger blog@danq.me on your computer and follow the instructions.
Over the last three or four years I’ve undertaken a couple of different rounds of psychotherapy. I liken the experience to that of spotting constellations in the night sky.
That’s probably the result of the goal I stated when going in to the first round: I’d like you to help while I take myself apart, try to understand how I work, and then put myself
back together again.1 I’m trying to connect the dots between who-I-once-was and who-I-am-now and find causal influences.
As I’m sure you can imagine: with an opening statement like that I needed to contact a few different therapists before I found one who was compatible with my aims2.
But then, I was always taught to get three quotes before hiring a professional.
Constellations are necessarily subjective. It’s always pleased me to think about how Orion the Hunter, one of the Northern hemisphere’s most-recognisable Winter visitors, was
interpreted by the Lakota people to represent a bison, and some Indian traditions see it as a deer.
It’s that “connecting the dots” that feels like constellation-spotting. A lot of the counselling work (and the “homework” that came afterwards) has stemmed from ideas like:
This star represents a moment in my past.
This star represents a facet of my identity today.
If we draw a line from one to the other, what does the resulting constellation look like?
I suppose that what I’ve been doing is using the lens of retrospection to ask: “Hey, why am I like this? Is this part of it? And what impact did that have on
me? Why can’t I see it?”
When you’re stargazing, sometimes you have to ask somebody to point out the shape in front of you before you can see it for yourself.
A better writer would make an allusion to looking into one’s past through the symbolism of looking into the Universe’s past, but I’m not that writer.
I haven’t yet finished this self-analytical journey, but I’m in an extended “homework” phase where I’m finding my own way: joining the dots for myself. Once somebody’s helped you find
those constellations that mean something to you, it’s easier to pick them out when you stargaze alone.
Footnotes
1 To nobody’s surprise whatsoever, I can reveal that ever since I was a child I’ve enjoyed
taking things apart to understand how they work. I wasn’t always so good at putting them back together again, though. My first alarm clock died that way, as did countless small
clockwork and electronic toys.
2 I also used my introductory contact to lay out my counselling qualifications,
in case they were a barrier for a potential therapist, but it turns out this wasn’t as much of a barrier as the fact that I arrived with a concrete mandate.
In the light of the so-called “Twitter migration”, I’ve spent a lot of the last week helping people new to Mastodon/the Fediverse in general to understand it. Or at least, to understand
how it’s different from Twitter.1
If you’re among those jumping ship, by the way, can I recommend that you do two things:
Don’t stop after reading an article about what Mastodon is and how it works (start here!); please also read about the established
etiquette, and
Don’t come in with the expectation that it’s “like Twitter but…”, because the ways it’s not like Twitter are more-important (and nobody wants it to be more like Twitter).
The tools, protocols and culture of the fediverse were built by trans and queer feminists. Those people had already started to feel sidelined from their own project when people like
me started turning up a few years ago. This isn’t the first time fediverse users have had to deal with a significant state change and feeling of loss. Nevertheless, the basic
principles have mostly held up to now: the culture and technical systems were deliberately designed on principles of consent, agency, and community safety.
…
If the people who built the fediverse generally sought to protect users, corporate platforms like Twitter seek to control their users… [Academics and advertisers] can claim that
legally Twitter has the right to do whatever it wants with this data, and ethically users gave permission for this data to be used in any way when they ticked “I
agree” to the Terms of Service.
…
This attitude has moved with the new influx. Loudly proclaiming that content warnings are censorship, that functionality that has been deliberately unimplemented due to community
safety concerns are “missing” or “broken”, and that volunteer-run servers maintaining control over who they allow and under what conditions are “exclusionary”. No consideration is
given to why the norms and affordances of Mastodon and the broader fediverse exist, and whether the actor they are designed to protect against might be you.
I genuinely believe that the fediverse is among our best bets for making a break from the silos of the corporate Web, and to do that it has to scale – it’s only the speed at which it’s
being asked to do so that’s problematic.
Aside from what I’m already doing – trying to tutor (tootor?) new fediversians about how to integrate in an appropriate and respectful manner and doing a little to support the
expansion of the software that makes it tick… I wonder what more I could/should be doing.
Would my effort be best-spent running a server (one not-just-for-me, I mean: abnib.social, anyone?), or should I use that time and money to support existing instances
directly? Should I brush up on my ActivityPub spec so I can be a more-useful developer, or am I better-placed to focus on tending my own digital garden first? Or maybe I’m looking at it
all wrong and I should be trying to dissuade people from piling-on to a system that might well not be right for them (nor they for it!)?
I don’t know the answers to these questions, but I’m hoping to work them out soon.
Addendum
It only occurred to me after the fact that I should mention that you can find me at @dan@danq.me.
Footnotes
1 Important: I’m no expert. I’ve been doing fediverse things for about 3 years but I’m
relatively quiet on Mastodon. Also, I’ve never really understood or gotten along with Twitter, so I’m even less an expert on that. Don’t assume that I’m
an authority on anything at all, and especially not social media.
Your product, service, or organisation almost certainly has a priority of constituencies, even if it’s not written down or otherwise formally-encoded. A famous example would be that expressed in the Web Platform Design Principles. It dictates how you decide between two competing
needs, all other things being equal.
At Three Rings, for example, our priority of constituencies might1 look
like this:
The needs of volunteers are more important than
The needs of voluntary organisations, which are more important than
Continuation of the Three Rings service, which is more important than
Adherence to technical standards and best practice, which is more important than
Development of new features
These are all things we care about, but we’re talking about where we might choose to rank them, relative to one another.
The priorities and constituencies portrayed in this illustration are fictitious. Any resemblance to real priorities and constituencies, whether living or dead, is entirely
coincidental.
The priorities of an organisation you’re involved with won’t be the same: perhaps it includes shareholders, regulatory compliance, different kinds of end-users, employees, profits,
different measures of social good, or various measurable outputs. That’s fine: every system is different.
But what I’d challenge you to do is find ways to bisect your priorities. Invent scenarios that pit each constituency against itself and discuss how they should
be prioritised, all other things being equal.
Using the example above, I might ask “which is more important?” in each category:
The needs of the volunteers developing Three Rings, or the needs of the volunteers who use it?
The needs of organisations that currently use the system, or the needs of organisations that are considering using it?
Achieving a high level of uptime, or promptly installing system updates?
Compliance with standards as-written, or maximum compatibility with devices as-used?
Implementation of new features that are the most popular user requests, or those which provide the biggest impact-to-effort payoff?
These might not be your answers to the same questions. They’re not even necessarily mine, and they’re even less-likely to be representative of Three Rings CIC. It’s just illustrative.
The aim of the exercise isn’t to come up with a set of commandments for your company. If you come up with something you can codify, that’s great, but if you and your stakeholders just
use it as an exercise in understanding the relative importance of different goals, that’s great too. Finding where people disagree is more-important than having a unifying
creed2.
And of course this exercise is applicable to more than just organisational priorities. Use it for projects or standards. Use it for systems where you’re the only participant, as a thought
exercise. A priority of constituencies can be a beautiful thing, but you can understand it better if you’re willing to take it apart once in a while. Bisect your priorities, and see
what you find.
Footnotes
1 Three Rings doesn’t have an explicit priority of constituencies: the example I give is
based on my own interpretation, but I’m only a small part of the organisation.
I didn’t/don’t own much vinyl – perhaps mostly because I had a tape deck in my bedroom years before a record player – but I’ve felt this pain. And don’t get me started on the videogames I’ve paid for multiple times.
In the Summer of 1995 I bought the CD single of the (still excellent!) Set You
Free by N-Trance.2
I’d heard about this new-fangled “MP3” audio format, so soon afterwards I decided to rip a copy of the song to my PC.
I was using a 66MHz 486SX CPU, and without an embedded FPU I didn’t
quite have the spare processing power to rip-and-encode in a single pass.3
So instead I first ripped to an uncompressed PCM .wav file and then performed the encoding: the former step
was done almost in real-time (I listened to the track as it ripped!), about 7 minutes. The latter step took about 20 minutes.
So… about half an hour in total, to rip a single song.
Progress bar, you say? I’ll just sit here and wait then, I guess. Actual contemporary-ish photo.
Creating a (what would now be considered an appalling) 32kHz mono-channel file, this meant that I briefly stored both a 27MB wave file and the final ~4MB MP3 file. 31MB
might not sound huge, but I only had a total of 145MB of hard drive space at the time, so 31MB consumed over a fifth of my entire fixed storage! Even after deleting the intermediary wave file I was left with a single song consuming around 3% of my space,
which is mind-boggling to think about in hindsight.
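If you want to check my sums, the arithmetic is simple enough to run. This sketch assumes 16-bit samples (I haven’t said so above, but it’s what makes the numbers line up) and treats the track as exactly seven minutes long, which is only approximately true.

```python
SAMPLE_RATE_HZ = 32_000
CHANNELS = 1            # mono
BYTES_PER_SAMPLE = 2    # assumption: 16-bit PCM
DURATION_SECONDS = 7 * 60
MP3_MB = 4              # approximate size of the finished MP3
DISK_MB = 145           # total hard drive space at the time


def wav_size_bytes():
    # Uncompressed PCM is just samples x channels x bytes per sample.
    return SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE * DURATION_SECONDS


def share_of_disk(megabytes):
    return megabytes / DISK_MB


wav_mb = wav_size_bytes() / 1_000_000
print(f"WAV: {wav_mb:.1f}MB")
print(f"WAV + MP3: {share_of_disk(round(wav_mb) + MP3_MB):.1%} of the disk")
print(f"MP3 alone: {share_of_disk(MP3_MB):.1%} of the disk")
print(f"Average MP3 bitrate: {MP3_MB * 8_000 / DURATION_SECONDS:.0f}kbps")
```

The WAV works out at a shade under 27MB, the pair at a little over a fifth of the disk, and the MP3 alone at just under 3%.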
But it felt like magic. I called my friend Gary to tell him about it. “This is going to be massive!” I said. At the time, I meant for techy
people: I could imagine a future in which, with more hard drive space, I’d keep all my music this way… or else bundle entire artists onto writable CDs in this new format, making albums obsolete. I never considered that over the coming decade or so the format would enter the public consciousness, let
alone that it’d take off like it did.
If you’re thinking of Gary and I as the kind of reprobates who helped bring on the golden age of music piracy… I’d like to distract you with a bigger show of yobbish behaviour in the
form of this photo from the day we played at dropping half-bricks onto starter pistol ammunition.
The MP3 file I produced had a fault. Most of the way through the encoding process, I got bored and ran another program, and this
must’ve interfered with the stream because there was an audible “blip” noise about 30 seconds from the end of the track. You’d have to be listening carefully to hear it, or else know
what you were looking for, but it was there. I didn’t want to go through the whole process again, so I left it.
But that artefact uniquely identified that copy of what was, in the end, a popular song to have in your digital music collection. As the years went by and I traded MP3 files in bulk at LAN parties or on CD-Rs or, on at least one occasion, on an Iomega Zip disk (remember those?), I’d occasionally see
N-Trance - (Only Love Can) Set You Free.mp34 being passed around and play it, to see if it was “my”
copy.
Sometimes the ID3 tags had been changed because for example the previous owner had decided it deserved to be considered Genre: Dance instead of Genre: Trance5. But I could still identify that file because
of the audio fingerprint, distinct to the first MP3 I ever created.
I still had that file when I went to university (where it occupied a smaller proportion of my hard drive space) and hearing that
distinctive “blip” would remind me about the ordeal that was involved in its creation. I don’t have it any more, but perhaps somebody else still does.
Footnotes
1 I might never have told this story on my blog, but eagle-eyed readers may remember that
I’ve certainly hinted at it before now.
2 Rewatching that music video, I’m struck by a recollection of how crazy popular
crossfades were on 1990s dance music videos. More than just a transition, I’m pretty sure that most of the frames of that video are mid-crossfade: it feels like I’m watching
Kelly Llorenna hanging out of a sunroof but I accidentally left one of my eyeballs in a smoky nightclub and can still see out of it as well.
3 I initially tried to convert directly from red book format to an MP3 file, but the encoding process was
too slow and the CD drive’s buffer filled up and didn’t get drained by the processor, which was still presumably bogged down with
framing or Fourier-transforming earlier parts of the track. The CD drive reasonably assumed that it wasn’t actually being used and
spun-down the drive motor, and this caused it to lose its place in the track, killing the whole process and leaving me with about a 40 second recording.
4 Yes, that filename isn’t quite the correct title. I was wrong.
My day usually starts in my feed reader, accessed via the FeedMe app from my mobile (although FreshRSS provides a reasonably good
responsive interface out-of-the-box!)
But with FreshRSS 1.20.0, I no longer have to maintain my own tool to get this brilliant functionality, and I’m overjoyed. Let’s look at how it works by re-subscribing to Beverley’s
blog but without a middleware tool.
This post is about to get pretty technical. If you don’t want to learn some XPath but just want to make a feed out of a web page, use a
graphical tool like FetchRSS.
In the latest version of FreshRSS, when you add a new feed to your reader, a new section “Type of feed source” is available. Unfold it, and you can change from the default
(“RSS / Atom”) to the new option “HTML + XPath (Web scraping)”.
Put a human-readable page address rather than a feed address into the “Feed URL” field and fill these fields to tell FreshRSS
how to parse the page to get the content you want. Note that it doesn’t matter if the web page isn’t valid XML (e.g. missing
closing tags) because it’s going to get run through PHP’s
DOMDocument anyway which will “correct” for some really sloppy code if needed.
You can use your browser’s debugger to help check your XPath rules: here I’ve run document.evaluate('//li[@class="blog__post-preview"]', document).iterateNext() and
got back the first blog post on the page, so I know I’m on the right track.
You’ll need to use XPath to express how to find a “feed item” on the page. Here’s the rules I used for https://webdevbev.co.uk/blog.html (many of these fields were optional – I didn’t have to do this much work):
Feed title: //h1
I override this anyway in FreshRSS, so I could just have used a string, but I wanted the XPath practice. There’s only one <h1> on the page, and it can be
considered the “title” of the feed.
Finding items: //li[@class="blog__post-preview"]
Each “post” on the page is an <li class="blog__post-preview">.
Item titles: descendant::h2
Each post has a <h2> which is the post title. The descendant:: selector scopes the search to each post as found above.
Item content: descendant::p[3]
Beverley’s static site generator template puts the post summary in the third paragraph of the <li>, which we can select like this.
Item link: descendant::h2/a/@href
This expects a URL, so we need the /@href to make sure we get the value of the <h2><a href="...">, rather than its contents.
Item thumbnail: descendant::img[@class="blog__image--preview"]/@src
Again, this expects a URL, which we get from the <img src="...">.
Item author: "Beverley Newing"
Beverley’s blog doesn’t host any guest posts, so I just use a string literal here.
Item date: substring-after(descendant::p[@class="blog__date-posted"], "Date posted: ")
This is the only complicated one: the published dates on Beverley’s blog aren’t explicitly marked-up, but part of a string that begins with the words “Date posted: ”, so I use XPath’s
substring-after function to strip this. The result gets passed to PHP’s
strtotime(), which is pretty tolerant of different date formats (although not of the words “Date posted:” it turns out!).
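To see what that date rule does to the text, here’s a minimal JavaScript sketch that mimics XPath’s substring-after. The function name substringAfter and the sample date string are made up for illustration; the real work happens inside FreshRSS, not in code like this.

```javascript
// Mimics XPath 1.0's substring-after(): returns whatever follows the first
// occurrence of `marker` in `text`, or an empty string if it isn't there.
function substringAfter(text, marker) {
  const index = text.indexOf(marker);
  return index === -1 ? '' : text.slice(index + marker.length);
}

// Hypothetical paragraph text; the real date format on the page may differ.
const paragraph = 'Date posted: 1 September 2022';
console.log(substringAfter(paragraph, 'Date posted: ')); // "1 September 2022"
```

If the marker is missing you get an empty string back, which is worth knowing when a rule silently produces no date.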
I’d love one day for FreshRSS to provide some kind of “preview” feature here so you can see what you’ll expect to get back, as you work. That, and support for different input types
(JSON, perhaps?), perhaps other selectors (I find CSS-style
selectors much simpler than XPath), and maybe even an option to execute Javascript on the page before scraping (I use this in my own toolchain, but that’s just because I want to have
my cake and eat it too). But this is still all pretty awesome.
I hope that this is just the beginning for this new killer feature in FreshRSS: there’s so much more it can be and do. But for now, I’m still mighty impressed that I can begin to
phase-out my use of my relatively resource-intensive feed-building middleware and use my feed reader to do more and more of the heavy lifting for which I love it so much.
I also love that this functionally adds h-feed support in by the back door. I’d still prefer there to be a “h-feed” option in the “Type of feed source” drop-down, but at least
I can add such support manually, now!
The finished result: Bev’s blog posts appear directly in my feed reader, even though they don’t have a feed, and now without going through the middleware I’d set up for that
purpose.
Footnotes
1 When I say RSS, I mean feed. Most of the feeds I subscribe to are RSS feeds, but some
are Atom feeds, h-feed, etc. But I can’t get over the old-fashioned name, and I don’t care to try.
I managed to dodge infection for 922 days of the Covid pandemic1,
but it caught up with me eventually.
Well, shit.
Frankly, it’s surprising that it took this long. We’ve always been careful, in accordance with guidance at any given time, and we all got our jabs and boosters as soon as we were able…
but conversely: we’ve got school-age children who naturally seem to be the biggest disease vectors imaginable. Our youngest, in fact, already had Covid, but the rest of us
managed to dodge it perhaps thanks to all these precautions.
The vaccine provides protection, but it’s not a magical force-field.
Luckily I’m not suffering too badly, probably thanks to the immunisation. It’s still not great, but I dread to think how it might have been without the benefit of the jab! A minor fever
came and went, and then it’s just been a few days of coughing, exhaustion, and… the most-incredible level of brain-fog.
Today, for example, I completely blanked the word “toilet” and struggled for some time to express to the dog why I’d brought her into the garden, while she stared at me expectantly.
I’ve taken the week off work to recover, which was a wise choice. As well as getting rest, it’s meant that I’ve managed to avoid writing production code with my addled brain! Instead,
I’ve spent a lot of time chilling in bed and watching all of the films that I’d been meaning to! This week, I’ve watched:
Peggy Sue Got Married (y’know, that other mid-1980s movie about time travel and being a teenager in the 1950s). It was okay; some bits of the direction were spectacular for its age,
like the “through the mirror” filming.
Fall. I enjoyed this more than I expected to. It’s not great, but while I spent most of the time complaining about the
lack of believability in the setting and the characters’ reactions, the acting was good and the tension “worked”: it was occasionally pretty vertigo-inducing, and that’s not just
because I’ve been having some Covid-related dizziness!
RRR. Oh my god this Tollywood action spectacle was an adventure. At one point it’s a bromantic buddy comedy, then later
there’s a dance-off, then for a while there’s a wonderful “even language can’t divide us” romance, but then later a man picks up a motorcycle with one hand and uses it to beat up an
entire army, and somehow it all feels like it belongs together. The symbolism’s so thick you can spread it (tl;dr: colonialism
bad), but it’s still a riot of a film.
Cyrano, which I feel was under-rated but that could just be that I have a soft spot for the story… and a love of musical
theatre.
Also, at times when I didn’t think my brain had the focus for something new, I re-watched Dude, Where’s My Car? because
I figured a stoner comedy that re-explains the plot every 20 minutes or so was about as good as I could expect my brain to handle at the time, and Everything Everywhere All At Once which I’ve now seen three times and loved every single one: it’s one of my favourite films.
See, I’m fine! (Feel like I’ve spent a lot of time lying here, this week.)
Anyway: hopefully next week I’ll be feeling more normal and my poor Covid-struck brain can be trusted with code again. Until then: time to try to rest some more.
Footnotes
1 Based on the World Health Organisation’s declaration of the outbreak being a pandemic on
11 March 2020 and my positive test on 19 September 2022, I stayed uninfected for two years, six months, one week, and one day. But who’s counting?
That’s a really useful thing to have in this new age of the web, where Referer: headers are no longer commonly passed cross-domain and Google Search no longer provides the link: operator. If you want to know if I’ve ever
linked to your site, it’s a bit of a drag to find out.
To nobody’s surprise whatsoever, I’ve made so many links to Wikipedia that I might be single-handedly responsible for their PageRank.
So, obviously, I’ve written an implementation for WordPress. It’s really basic right now, but the source code can be
found here if you want it. Install it as a plugin and run wp outbound-links to kick it off. It’s fast: it takes 3-5 seconds to parse the entirety of danq.me,
and I’ve got somewhere in the region of 5,000 posts to parse.
You can see the results at https://danq.me/.well-known/links – if you’ve ever wondered “has Dan ever linked to my site?”, now you can find the
answer.
If this could be useful to you, let’s collaborate on making this into an actually-useful plugin! Otherwise it’ll just languish “as-is”, which is good enough for my purposes.
I swear that I used to be good at Mastermind when I was a kid. But now, when it’s my turn to break
the code that one of our kids has chosen, I fail more often than I succeed. That’s no good!
If you didn’t have me pegged as a board gamer… where the hell have you been?
Mastermind and me
Maybe it’s because I’m distracted; multitasking doesn’t help problem-solving. Or it’s because we’re playing “Super” Mastermind, which differs from the one I had as a child in that
eight (not six) peg colours are available and secret codes are permitted to have duplicate peg colours. These changes increase the possible permutations from 360 to 4,096, but the
number of guesses allowed only goes up from 8 to 10. That’s hard.
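Those two figures fall out of some quick arithmetic, sketched here in JavaScript. The helper names are made up for the purpose.

```javascript
// Codes with no repeated colours: pick `pegs` different colours, in order.
function codesWithoutRepeats(colours, pegs) {
  let total = 1;
  for (let i = 0; i < pegs; i++) total *= colours - i;
  return total;
}

// Codes where repeats are allowed: every peg can be any colour.
function codesWithRepeats(colours, pegs) {
  return colours ** pegs;
}

console.log(codesWithoutRepeats(6, 4)); // 6 × 5 × 4 × 3 = 360
console.log(codesWithRepeats(8, 4));    // 8 × 8 × 8 × 8 = 4096
```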
The set I had as a kid was like this, I think. Photo courtesy ZeroOne; CC-BY-SA license.
Hey, that’s an idea. Let’s crack the code… by writing some code!
This online edition plays a lot like the version our kids play, although the peg colours are different. Next guess should be an
easy solve!
Representing a search space
The search space for Super Mastermind isn’t enormous, and it lends itself to some highly-efficient computerised storage.
There are 8 different colours of peg. We can express these colours as a number between 0 and 7, in three bits of binary, like this:
Decimal  Binary  Colour
0        000     Red
1        001     Orange
2        010     Yellow
3        011     Green
4        100     Blue
5        101     Pink
6        110     Purple
7        111     White
There are four pegs in a row, so we can express any given combination of coloured pegs as a 12-bit binary number. E.g. 100 110 111 010 would represent the
permutation blue (100), purple (110), white (111), yellow (010). The total search space, therefore, is the range of numbers from
000000000000 through 111111111111… that is: decimal 0 through 4,095:
Decimal  Binary        Colours
0        000000000000  Red, red, red, red
1        000000000001  Red, red, red, orange
2        000000000010  Red, red, red, yellow
…        …             …
4092     111111111100  White, white, white, blue
4093     111111111101  White, white, white, pink
4094     111111111110  White, white, white, purple
4095     111111111111  White, white, white, white
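As a quick sanity check of that representation, here’s a small sketch that turns any number from 0 to 4,095 back into its 12-bit string and its four colours. The COLOURS array and the describeCandidate name are illustrative only, not taken from the solver itself.

```javascript
// Colour names indexed by peg value, following the colour table above.
const COLOURS = ['Red', 'Orange', 'Yellow', 'Green', 'Blue', 'Pink', 'Purple', 'White'];

// Turn a number from 0 to 4,095 into its 12-bit string and its four colours.
function describeCandidate(candidate) {
  const binary = candidate.toString(2).padStart(12, '0');
  const colours = binary
    .match(/.{3}/g) // split into four 3-bit groups
    .map(bits => COLOURS[parseInt(bits, 2)]);
  return { binary, colours };
}

const example = describeCandidate(0b100110111010);
console.log(example.binary);             // 100110111010
console.log(example.colours.join(', ')); // Blue, Purple, White, Yellow
console.log(describeCandidate(4092).colours.join(', ')); // White, White, White, Blue
```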
Whenever we make a guess, we get feedback in the form of two variables: each peg that is in the right place is a bull; each that represents a peg in the secret code but
isn’t in the right place is a cow (the names come from Mastermind’s precursor, Bulls & Cows). Four bulls
would be an immediate win (lucky!); any other combination of bulls and cows is still valuable information. Even a zero-score guess is valuable – potentially very valuable! – because it
tells the player that none of the pegs they’ve guessed appear in the secret code.
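Here’s a minimal sketch of that scoring rule. The function name scoreGuess is hypothetical, and it assumes the usual Mastermind convention that each peg is counted at most once when colours repeat.

```javascript
// Score a guess against a solution; both are arrays of four peg values (0-7).
// Returns [bulls, cows].
function scoreGuess(solution, guess) {
  let bulls = 0;
  const solutionCounts = new Array(8).fill(0);
  const guessCounts = new Array(8).fill(0);

  for (let i = 0; i < solution.length; i++) {
    if (solution[i] === guess[i]) bulls++;
    solutionCounts[solution[i]]++;
    guessCounts[guess[i]]++;
  }

  // Colour matches regardless of position, counting each peg at most once...
  let matches = 0;
  for (let colour = 0; colour < 8; colour++) {
    matches += Math.min(solutionCounts[colour], guessCounts[colour]);
  }

  // ...and any match that isn't a bull must be a cow.
  return [bulls, matches - bulls];
}

console.log(scoreGuess([4, 6, 7, 2], [4, 7, 2, 2])); // [ 2, 1 ]
console.log(scoreGuess([0, 0, 0, 0], [1, 2, 3, 4])); // [ 0, 0 ]
```

In the first example the blue and the final yellow are bulls, the white is a cow, and the spare yellow scores nothing because the solution only has one yellow peg.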
If one of Wordle‘s parents was Scrabble, then this was the other. Just ask its Auntie Twitter.
Solving with Javascript
The latest versions of Javascript support binary literals and bitwise operations, so we can encode and decode between arrays of four coloured pegs (numbers 0-7) and the number 0-4,095
representing the guess as shown below. Decoding uses an AND bitmask to filter to the requisite digits then divides by the order of magnitude. Encoding is just a reduce
function that bitshift-concatenates the numbers together.
/**
 * Decode a candidate into four peg values by using binary bitwise operations.
 */
function decodeCandidate(candidate) {
  return [
    (candidate & 0b111000000000) / 0b001000000000,
    (candidate & 0b000111000000) / 0b000001000000,
    (candidate & 0b000000111000) / 0b000000001000,
    (candidate & 0b000000000111) / 0b000000000001
  ];
}

/**
 * Given an array of four integers (0-7) to represent the pegs, in order, returns a single-number
 * candidate representation.
 */
function encodeCandidate(pegs) {
  return pegs.reduce((a, b) => (a << 3) + b);
}
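An equivalent way to write the same idea, sketched here under different (made-up) function names, is to shift each three-bit group down and mask it instead of dividing. It also gives an easy round-trip check across the whole search space.

```javascript
// Decode by shifting each 3-bit group down to the bottom and masking it off.
function decodeWithShifts(candidate) {
  return [9, 6, 3, 0].map(shift => (candidate >> shift) & 0b111);
}

// Encode by shifting the running total left 3 bits and OR-ing in each peg.
function encodeWithShifts(pegs) {
  return pegs.reduce((total, peg) => (total << 3) | peg, 0);
}

console.log(decodeWithShifts(0b100110111010)); // [ 4, 6, 7, 2 ]
console.log(encodeWithShifts([4, 6, 7, 2]));   // 2490

// Every one of the 4,096 candidates should survive a round trip.
let allRoundTrip = true;
for (let candidate = 0; candidate < 4096; candidate++) {
  if (encodeWithShifts(decodeWithShifts(candidate)) !== candidate) allRoundTrip = false;
}
console.log(allRoundTrip); // true
```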
With this, we can simply:
Produce a list of candidate solutions (an array containing numbers 0 through 4,095).
Choose one candidate, use it as a guess, and ask the code-maker how it scores.
Eliminate from the candidate solutions list all solutions that would not score the same number of bulls and cows for the guess that was made.
Repeat from step #2 until you win.
Step 3’s the most important one there. Given a function getScore( solution, guess ) which returns an array of [ bulls, cows ] a given guess would
score if faced with a specific solution, that code would look like this (I’m convinced there must be a more-performant way to eliminate candidates from the list with XOR
bitmasks, but I haven’t worked out what it is yet):
/**
 * Given a guess (array of four integers from 0-7 to represent the pegs, in order) and the number
 * of bulls (number of pegs in the guess that are in the right place) and cows (number of pegs in the
 * guess that are correct but in the wrong place), eliminates from the candidates array all guesses
 * invalidated by this result. Return true if successful, false otherwise.
 */
function eliminateCandidates(guess, bulls, cows) {
  const newCandidatesList = data.candidates.filter(candidate => {
    const score = getScore(candidate, guess);
    return (score[0] == bulls) && (score[1] == cows);
  });
  if (newCandidatesList.length == 0) {
    alert('That response would reduce the candidate list to zero.');
    return false;
  }
  data.candidates = newCandidatesList;
  chooseNextGuess();
  return true;
}
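Putting the four steps together, a self-contained sketch of the whole loop might look like the following. It isn’t the code from the full solution: scoreEncoded and solve are hypothetical names, the scorer works directly on the 12-bit numbers, and always guessing the first remaining candidate is a naive stand-in for whatever chooseNextGuess() really does.

```javascript
// Score one encoded code against another; returns [bulls, cows].
// Works directly on the 12-bit representation, three bits per peg.
function scoreEncoded(solution, guess) {
  let bulls = 0;
  let matches = 0;
  const solutionCounts = new Array(8).fill(0);
  const guessCounts = new Array(8).fill(0);
  for (let shift = 9; shift >= 0; shift -= 3) {
    const s = (solution >> shift) & 0b111;
    const g = (guess >> shift) & 0b111;
    if (s === g) bulls++;
    solutionCounts[s]++;
    guessCounts[g]++;
  }
  for (let colour = 0; colour < 8; colour++) {
    matches += Math.min(solutionCounts[colour], guessCounts[colour]);
  }
  return [bulls, matches - bulls];
}

// Play a whole game against a known secret, returning the guesses made.
function solve(secret) {
  // Step 1: every number from 0 to 4,095 is a candidate.
  let candidates = Array.from({ length: 4096 }, (_, i) => i);
  const guesses = [];
  while (true) {
    // Step 2: guess a candidate (here, naively, the first one left).
    const guess = candidates[0];
    guesses.push(guess);
    const [bulls, cows] = scoreEncoded(secret, guess);
    if (bulls === 4) return guesses; // Step 4: stop once we've won.
    // Step 3: keep only candidates that would have scored the same.
    candidates = candidates.filter(candidate => {
      const [b, c] = scoreEncoded(candidate, guess);
      return b === bulls && c === cows;
    });
  }
}

const secret = 0b100110111010; // blue, purple, white, yellow
const guessesMade = solve(secret);
console.log(guessesMade.length, guessesMade[guessesMade.length - 1] === secret);
```

The loop always terminates: the secret is never filtered out, while every wrong guess eliminates at least itself, because it would have scored four bulls against itself.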
I continued in this fashion to write a full solution (source code). It uses ReefJS for
component rendering and state management, and you can try it for yourself right in your web browser. If you
play against the online version I mentioned you’ll need to transpose the colours in your head: the physical version I play with the
kids has pink and purple pegs, but the online one replaces these with brown and black.
Testing the solution
Let’s try it out against the online version:
As expected, my code works well-enough to win the game every time I’ve tried, both against computerised and in-person opponents. So – unless you’ve been actively thinking about the
specifics of the algorithm I’ve employed – it might surprise you to discover that… my solution is very-much a suboptimal one!
My code has only failed to win a single game… and that turned out to be because my opponent, playing overexcitedly, cheated in the third turn. To be fair, my code didn’t lose
either, though: it identified that a mistake must have been made and we declared the round void when we identified the problem.
My solution is suboptimal
A couple of games in, the suboptimality of my solution became pretty visible. Sure, it still won every game, but it was a blunt instrument, and anybody who’s seriously thought about
games like this can tell you why. You know how when you play e.g. Wordle (but not in “hard mode”) you sometimes want to type in a word that can’t possibly be the
solution because it’s the best way to rule in (or out) certain key letters? This kind of strategic search space bisection reduces the mean number of guesses you need to solve the
puzzle, and the same’s true in Mastermind. But because my solver will only propose guesses from the list of candidate solutions, it can’t make this kind of improvement.
My blog post about Break Into Us used a series of visual metaphors to show search space dissection, including this one. If you missed
it, it might be worth reading.
Search space bisection is also used in my adversarial hangman game, but in this case the aim is to split the search space in such a way that no
matter what guess a player makes, they always find themselves in the larger remaining portion of the search space, to maximise the number of guesses they have to make. Y’know, because
it’s evil.
A great first guess, assuming you’re playing against a random code and your rules permit the code to have repeated colours, is a “1122” pattern.
There are mathematically-derived heuristics to optimise Mastermind strategy. The first
of these came from none other than Donald Knuth (legend of computer science, mathematics, and pipe organs) back in 1977. His solution,
published at probably the height of the game’s popularity in the amazingly-named Journal of Recreational Mathematics, guarantees a solution to the six-colour version of the
game within five guesses. Ville [2013] found an
optimal solution for a seven-colour variant, but demonstrated how rapidly the tree of possible moves grows and the need for early pruning – even with powerful modern computers – to
conserve memory. It’s a very enjoyable and readable paper.
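To make the idea concrete, here’s a rough sketch of a minimax-style guess chooser: for each possible guess, work out the largest group of candidates that could survive it, then pick the guess whose worst case is smallest. It’s in the spirit of that approach rather than a faithful copy of any published algorithm, and the function names, the tie-break rule and the example position are all assumptions made up for illustration.

```javascript
// Score one encoded code against another, packed into a single small integer
// (bulls * 5 + cows) so that each distinct result gets its own bucket.
function scoreKey(solution, guess) {
  let bulls = 0;
  let matches = 0;
  const solutionCounts = new Array(8).fill(0);
  const guessCounts = new Array(8).fill(0);
  for (let shift = 9; shift >= 0; shift -= 3) {
    const s = (solution >> shift) & 0b111;
    const g = (guess >> shift) & 0b111;
    if (s === g) bulls++;
    solutionCounts[s]++;
    guessCounts[g]++;
  }
  for (let colour = 0; colour < 8; colour++) {
    matches += Math.min(solutionCounts[colour], guessCounts[colour]);
  }
  return bulls * 5 + (matches - bulls);
}

// If we played `guess`, how many candidates could remain in the worst case?
function worstCase(guess, candidates) {
  const bucketSizes = new Array(25).fill(0);
  let worst = 0;
  for (const candidate of candidates) {
    const size = ++bucketSizes[scoreKey(candidate, guess)];
    if (size > worst) worst = size;
  }
  return worst;
}

// Pick the guess, from ALL codes rather than just the candidates, whose worst
// case is smallest. Ties go to a guess that could still be the solution.
function chooseMinimaxGuess(candidates, allCodes) {
  const stillPossible = new Set(candidates);
  let best = null;
  for (const guess of allCodes) {
    const worst = worstCase(guess, candidates);
    const possible = stillPossible.has(guess);
    const better = best === null || worst < best.worst ||
      (worst === best.worst && possible && !best.possible);
    if (better) best = { guess, worst, possible };
  }
  return best;
}

const allCodes = Array.from({ length: 4096 }, (_, i) => i);
// A made-up position with four candidates left: RRRR, RRRO, RROR and RORR.
const remaining = [0b000000000000, 0b000000000001, 0b000000001000, 0b000001000000];
const choice = chooseMinimaxGuess(remaining, allCodes);
console.log(choice.worst, choice.possible); // 1 false
```

In that made-up position every one of the four remaining candidates, used as a guess, could leave two or three possibilities standing, whereas a well-chosen guess from outside the list always narrows it down to exactly one. That’s precisely the kind of move a candidates-only solver can’t make.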
But for my purposes, it’s unnecessary. My solver routinely wins within six, maybe seven guesses, and by nonchalantly glancing at my phone in-between my guesses I can now reliably guess
our children’s codes quickly and easily. In the end, that’s what this was all about.