I used to have a single minor niggle with the BBC News RSS feed: that it included sports news, which I didn’t care
about. So I wrote a script that downloaded it, stripped
sports news, and re-exported the feed for me to subscribe to. Magic.
Lately my BBC News feed has caused me some annoyance and frustration.
But lately – presumably as a result of technical changes at the Beeb’s side – this feed has found two fresh ways to annoy me:
The feed now re-publishes a story if it gets re-promoted to the front page… but with a different<guid> (it appears to get a #0 after it
when first published, a #1 the second time, and so on). In a typical day the feed reader might scoop up new stories about once an hour, any by the time I get to reading them the
same exact story might appear in my reader multiple times. Ugh.
They’ve started adding iPlayer and BBC Sounds content to the BBC News feed. I don’t follow BBC News in my feed reader because I want to watch or listen to things. If
you do, that’s fine, but I don’t, and I’d rather filter this content out.
Luckily, I already have a recipe for improving this feed, thanks to my prior work. Let’s look at my newly-revised script (also available on GitHub):
#!/usr/bin/env rubyrequire'bundler/inline'# # Sample crontab:# # At 41 minutes past each hour, run the script and log the results# */20 * * * * ~/bbc-news-rss-filter-sport-out.rb > ~/bbc-news-rss-filter-sport-out.log 2>>&1# Dependencies:# * open-uri - load remote URL content easily# * nokogiri - parse/filter XML
gemfile do
source 'https://rubygems.org'
gem 'nokogiri'endrequire'open-uri'# Regular expression describing the GUIDs to reject from the resulting RSS feed# We want to drop everything from the "sport" section of the website, also any iPlayer/Sounds linksREJECT_GUIDS_MATCHING=/^https:\/\/www\.bbc\.co\.uk\/(sport|iplayer|sounds)\//# Load and filter the original RSS
rss =Nokogiri::XML(open('https://feeds.bbci.co.uk/news/rss.xml?edition=uk'))
rss.css('item').select{|item| item.css('guid').text =~REJECT_GUIDS_MATCHING }.each(&:unlink)
# Strip the anchors off the <guid>s: BBC News "republishes" stories by using guids with #0, #1, #2 etc, which results in duplicates in feed readers
rss.css('guid').each{|g|g.content=g.content.gsub(/#.*$/,'')}
File.open( '/www/bbc-news-no-sport.xml', 'w' ){ |f| f.puts(rss.to_s) }
It’s amazing what you can do with Nokogiri and a half dozen lines of Ruby.
That revised script removes from the feed anything whose <guid> suggests it’s sports news or from BBC Sounds or iPlayer, and also strips any “anchor” part of the
<guid> before re-exporting the feed. Much better. (Strictly speaking, this can result in a technically-invalid feed by introducing duplicates, but your feed reader
oughta be smart enough to compensate for and ignore that: mine certainly is!)
You’re free to take and adapt the script to your own needs, or – if you don’t mind being tied to my opinions about what should be in BBC News’ RSS feed – just subscribe to my copy at: https://fox.q-t-a.uk/bbc-news-no-sport.xml
My employer Automattic‘s having a bit of a reorganisation. For unrelated reasons, this
coincides with my superteam having a bit of a reorganisation, too, and I’m going to be on a different team next week than I’ve been on for most of the 4+ years I’ve been
there1.
Together, these factors mean that I have even less idea than usual what I do for a living, right now.
What is it I do here again? Something something code WooCommerce something something marketplace awesome something, right?
On the whole, I approve of Matt‘s vision for this reorganisation. He writes:
Each [Automattic employee] gets a card: Be the Host, Help the Host, or Neutral.
You cannot change cards during the course of your day or week. If you do not feel aligned with your card, you need to change divisions within Automattic.
“Be the Host” folks are all about making Automattic’s web hosting offerings the best they possibly can be. These are the teams behind WordPress.com,
VIP, and Tumblr, for example. They’re making us competitive on the global stage. They bring Automattic money in a
very direct way, by making our (world class) hosting services available to our customers.
“Help the Host” folks (like me) are in roles that are committed to providing the best tools that can be used anywhere. You might run your copy of Woo, Jetpack, or (the client-side bit of) Akismet on Automattic infrastructure… or
alternatively you might be hosted by one of our competitors or even on your own hardware. What we bring to Automattic is more ethereal: we keep the best talent and expertise in these
technologies close to home, but we’re agnostic about who makes money out of what we create.
This stock photo confuses me so much that I had to use it. It’s WordPress, as seen in Chrome on Windows Vista… but running on a MacBook Air. The photographer has tried to blur their
site domain name (but it’s perfectly readable), but hasn’t concealed the fact they’re running µTorrent in the background (for Obviously Legal Reasons, I’m sure). Weird. But the
important thing is that, crazy as this person’s choices are, they can use Automattic’s software however they like. It’s cool.
Anyway: I love the clarification on the overall direction of the company… but I’m not sure how we market it effectively2. I look around at the people in my team and its sister
teams, all of us proudly holding our “Help the Hosts” cards and ready to work to continue to make Woo an amazing ecommerce platform wherever you choose to host it.
And obviously I can see the consumer value in that. It’s reassuring to know that the open source software we maintain or contribute to is the real deal and we’re not exporting a
cut-down version nor are we going to try to do some kind of rug pull to coerce people into hosting with us. I think Automattic’s long track record shows that.
But how do we sell that? How do we explain that “hey, you can trust us to keep these separate goals separate within our company, so there’s never a conflict
of interest and you getting the best from us is always what we want”? Personally, seeing the inside of Automattic, I’m convinced that we’re not – like so much of Big Tech –
going to axe the things you depend upon3
or change the terms and conditions to the most-exploitative we can get away with4 or support your business just long enough to be able to undermine and consume it 5.
In short: I know that we’re the “good guys”. And I can see how this reorganisation reinforces that. But I can’t for the life of me see how we persuade the rest of the world of the
fact6.
Any ideas?
Footnotes
1 I’ve been on Team Fire for a long while, which made my job title “Code Magician on Fire”, but now I’ll be on Team Desire which isn’t half as catchy a name but
I’m sure they’ll make up for it by being the kinds of awesome human beings I’ve become accustomed to working alongside at Automattic.
2 Fortunately they pay me to code, not to do marketing.
6 Seriously, it’s a good thing I’m not in marketing. I’d be so terrible at it. Also public
relations. Did I ever tell you the story about the time that, as a result of a mix-up, I accidentally almost gave an interview to the Press Office at the Vatican? A story for another
time, perhaps
This feels disappointingly like the prompt from day 2, but I’m gonna pivot it by letting my answer from three weeks ago only cover
one of the five points:
Code
Magic
Piano
Play
Learn
Let’s take a look at each of those, briefly.
Code
Code is poetry. Code is fun. Code is a many-splendoured thing.
When I’m not coding for work or coding as a volunteer, I’m often caught
coding for fun. Sometimes I write WordPress-ey things. Sometimes I write other random things. I tend to open-source almost everything I write, most
of it via my GitHub account.
Magic
Now I don’t work in the city centre nor have easy access to other magicians, I don’t perform as much magic as I used to. But I still try to keep my hand in and occasionally try new
things; I enjoy practicing sleights when I’m doing work-related things that don’t require my hands (meetings, code reviews, waiting for the damn unit tests to run…), a tip I learned
from fellow magician Andy.
My favourite go-to trick with an untampered deck of cards is my variant of the Ambitious Classic; here’s a bit from the middle of the trick from the last time I performed it in a
video meeting.
You’ll usually find a few decks of cards on my desk at any given time, mostly Bikes.1
Piano
I started teaching myself piano during the Covid lockdowns as a distraction from not being able to go anywhere (apparently I’m not the only one), and as an effort to do more of what I’m bad at.2
Since then, I’ve folded about ten minutes of piano-playing3,
give or take, into my routine virtually every day.
This is what piano playing looks like. But perhaps only barely.
I fully expect that I’ll never be as accomplished at it as, say, the average 8-year-old on YouTube, but that’s not what it’s about. If I take a break from programming, or meetings, or
childcare, or anything, I can feel that playing music exercises a totally different part of my mind. I’d heard musicians talk about such an experience before,
but I’d assumed that it was hyperbole… but from my perspective, they’re right: practicing an instrument genuinely does feel like using a part of your brain than you use for anything
else, which I love!
A lot of my RPG-gaming takes place online, via virtual tabletops, and is perhaps the most obvious “playtime” play activities I routinely engage in.
At the weekend I dusted off Vox Populi, my favourite mod for Civilization V, my favourite4
entry in the Civilization series, which in turn is one of my favourite video game series5. I don’t
get as much time for videogaming as I might like, but that’s probably for the best because a couple of hours disappeared on Sunday evening before I even blinked! It’s addictive
stuff.
Learn
As I mentioned back on day 3 of bloganuary, I’m a lifelong learner. But even when I’m not learning in an academic setting, I’m
doubtless learning something. I tend to alternate between fiction and non-fiction books on my bedside table. I often get lost on deep-dives through the depths of the Web after
a Wikipedia article makes me ask “wait, really?” And just sometimes, I set out to learn some kind of new skill.
In short: with such a variety of fun things lined-up, I rarely get the opportunity to be bored6!
Footnotes
1 I like the feel of Bicycle cards and the way they fan. Plus: the white border – which is
actually a security measure on playing cards designed to make bottom-dealing more-obvious and thus make it harder for people to cheat at e.g. poker – can actually be turned to work
for the magician when doing certain sleights, including one seen in the mini-video above…
2 I’m not strictly bad at it, it’s just that I had essential no music tuition or
instrument experience whatsoever – I didn’t even have a recorder at primary school! – and so I was starting at square zero.
3 Occasionally I’ll learn a bit of a piece of music, but mostly I’m trying to improve my
ability to improvise because that scratches an itch in a part of my brain in a way that I find most-interesting!
4 Games in the series I’ve extensively played include: Civilization,
CivNet, Civilization II (also Test of Time), Alpha Centauri (a game so good I paid for it three times, despite having previously pirated
it), Civilization III, Civilization IV, Civilization V, Beyond Earth (such a disappointment compared to SMAC) and Civilization VI, plus all their expansions except for the very latest one for VI. Also spinoffs/clones
FreeCiv, C-Evo, and both Call to Power games. Oh, and at least two of the board games. And that’s just the ones I’ve played enough to talk in detail about:
I’m not including things like Revolution which I played an hour of and hated so much I shan’t touch it again, nor either version of Colonization which I’m
treating separately…
Now I’ve added support for Spartan3 too and, seeing as the implementations shared functionality, I’ve
combined all three – Gemini, Spartan, and Gopher – into a single package: CapsulePress.
CapsulePress is a Gemini/Spartan/Gopher to WordPress bridge. It lets you use WordPress as a CMS for any or all of
those three non-Web protocols in addition to the Web.
For example, that means that this post is available on all of:
It’s also possible to write posts that selectively appear via different media: if I want to put something exclusively on my gemlog, I can, by assigning metadata that
tells WordPress to suppress a post but still expose it to CapsulePress. Neat!
Using Gemini and friends in the 2020s make me feel like the dream of the Internet of the nineties and early-naughties is still alive. But with fewer banner ads.
I’ve open-sourced the whole thing under a super-permissive license, so if you want your own WordPress blog to “feed” your Gemlog… now you can. With a few caveats:
It’s hard to use. While not as hacky as the disparate piles of code it replaced, it’s still not the cleanest. To modify it you’ll need a basic comprehension of all
three protocols, plus Ruby, SQL, and sysadmin skills.
It’s super opinionated. It’s very much geared towards my use case. It’s improved by the use of templates. but it’s still probably only suitable for this
site for the time being, until you make changes.
It’s very-much unfinished. I’ve got a growing to-do list, which should
be a good clue that it’s Not Finished. Maybe it never will but. But there’ll be changes yet to come.
Whether or not your WordPress blog makes the jump to Geminispace4, I hope you’ll came take a look at mine at one of the URLs linked above,
and then continue to explore.
If you’re nostalgic for the interpersonal Internet – or just the idea of it, if you’re too young to remember it… you’ll find it there. (That Internet never actually went away,
but it’s harder to find on today’s big Web than it is on lighter protocols.)
I’ve made a handful of tweaks to my RSS feed which I feel improves upon
WordPress’s default implementation, at least in my use-case.1 In case any of these improvements help
you, too, here’s a list of them:
Post Kinds in Titles
Since 2020, I’ve decorated post titles by prefixing them with the “kind” of post they are (courtesy of the Post Kinds
plugin). I’ve already written about how I do it, if you’re
interested.
Identifying post kinds is particularly useful for people who subscribe by
email (the emails are generated off the RSS feed either daily or weekly: subscriber’s choice), who might want to see
articles and videos but not care about for example checkins and reposts.
RSS Only posts
A minority of my posts are – initially, at least – publicised only via my RSS feed (and places that are directly fed
by it, like email subscribers). I use a tag to identify posts to be hidden in this way. I’ve
written about my implementation before, but I’ve since made a couple of additional improvements:
Suppressing the tag from tag clouds, to make it harder to accidentally discover these posts by tag-surfing,
Tweaking the title of such posts when they appear in feeds (using the same technique as above), so that readers know when they’re seeing “exclusive” content, and
Setting a X-Robots-Tag: noindex, nofollow HTTP header when viewing such tag or a post, to discourage
search engines (code for this not shown below because it’s so very specific to my theme that it’s probably no use to anybody else!).
// 1. Suppress the "rss club" tag from tag clouds/the full tag listfunctionrss_club_suppress_tags_from_display( string $tag_list, string $before, string $sep, string $after, int $post_id ): string {
foreach(['rss-club'] as$tag_to_suppress){
$regex=sprintf( '/<li>[^<]*?<a [^>]*?href="[^"]*?\/%s\/"[^>]*?>.*?<\/a>[^<]*?<\/li>/', $tag_to_suppress );
$tag_list=preg_replace( $regex, '', $tag_list );
}
return$tag_list;
}
add_filter( 'the_tags', 'rss_club_suppress_tags_from_display', 10, 5 );
// 2. In feeds, tweak title if it's an RSS exclusivefunctionrss_club_add_rss_only_to_rss_post_title( $title ){
$post_tag_slugs=array_map(function($tag){ return$tag->slug; }, wp_get_post_tags( get_the_ID() ));
if ( !in_array( 'rss-club', $post_tag_slugs ) ) return$title; // if we don't have an rss-club tag, drop out herereturn trim( "{$title} [RSS Exclusive!]" );
return$title;
}
add_filter( 'the_title_rss', 'rss_club_add_rss_only_to_rss_post_title', 6 );
Adding a stylesheet
Adding a stylesheet to your feeds can make them much friendlier to beginner users (which helps drive adoption) without making them much less-convenient for people who know how
to use feeds already. Darek Kay and Terence Eden both wrote great articles about this just
earlier this year, but I think my implementation goes a step further.
In addition to adding some “Q” branding, I made tweaks to make it work seamlessly with both my RSS and Atom feeds by using
two<xsl:for-each> blocks and exploiting the fact that the two standards don’t overlap in their root namespaces. Here’s my full XSLT; you need to
override your feed template as Terence describes to use it, but mine can be applied to both RSS and Atom.2
I’ve still got more I’d like to do with this, for example to take advantage of the thumbnail images I attach to posts. On which note…
Thumbnail images
When I first started offering email subscription options I used Mailchimp’s RSS-to-email service, which was… okay,
but not great, and I didn’t like the privacy implications that came along with it. Mailchimp support adding thumbnails to your email template from your feed, but WordPress themes don’t
by-default provide the appropriate metadata to allow them to do that. So I installed Jordy Meow‘s RSS Featured Image plugin which did it for me.
<item><title>[Checkin] Geohashing expedition 2023-07-27 51 -1</title><link>https://danq.me/2023/07/27/geohashing-expedition-2023-07-27-51-1/</link>
...
<media:contenturl="/_q23u/2023/07/20230727_141710-1024x576.jpg"medium="image"/><media:description>Dan, wearing a grey Three Rings hoodie, carrying French Bulldog Demmy, standing on a path with trees in the background.</media:description></item>
Media attachments for RSS feeds are perhaps most-popular for podcasts, but they’re also great for post thumbnail images.
During my little redesign earlier this year I decided to go two steps further: (1) ditching the
plugin and implementing the functionality directly into my theme (it’s really not very much code!), and (2) adding not only a <media:content medium="image" url="..."
/> element but also a <media:description> providing the default alt-text for that image. I don’t know if any feed readers (correctly) handle this
accessibility-improving feature, but my stylesheet above will, some day!
So there we have it: a little digital gardening, and four improvements to WordPress’s default feeds.
RSS may not be as hip as it once was, but little improvements can help new users find their way into this (enlightened?) way
to consume the Web.
If you’re using RSS to follow my blog, great! If it’s not for you, perhaps pick your favourite alternative way to get updates, from options including email, Telegram, the Fediverse (e.g. Mastodon), and more…
1 The changes apply to the Atom
feed too, for anybody of such an inclination. Just assume that if I say RSS I’m including Atom, okay?
2 The experience of writing this transformation/stylesheet also gave me yet another opportunity to remember how much I hate working
with XSLTs. This time around, in addition to the normal namespace issues and headscratching syntax, I
had to deal with the fact that I initially tried to use a feature from XSLT version 2.0 (a 22-year-old
version) only to discover that all major web browsers still only support version 1.0 (specified last millenium)!
Among the many perks of working for a company with a history so tightly-intertwined with that of the open-source WordPress project is that license to attend WordCamps – the biggest WordPress conferences – is basically a
given.
It’s frankly a wonder that this is, somehow, my first WordCamp. As well as using it1 and developing atop
it2,
of course, I’ve been contributing to WordPress since 2004 (albeit only in a tiny way, and not at all for most of the last decade!).
If you already know what WP-CLI is… let’s be friends.
Today is Contributor Day, a pre-conference day in which folks new and old get together in person to hack on WordPress and WordPress-adjacent projects. So I met up with Cem, my Level 4 Dragonslayer friend, and we took an ultra-brief induction into WP-CLI3
before diving in to try to help write some code.
Contributor Days are about many things, but perhaps their biggest value comes from lowering the barrier to becoming a new contributor to an open-source project by sitting you
right next to somebody who already knows it well.
So today, as well as meeting some awesome folks, I got to write an overly-verbose justification for a
bug report being invalid and implement my first PR for WP-CLI: a bugfix for a strange quirk in output formatting.
The bug I fixed is slightly hard to describe (and even harder to explain why it matters), but here’s a summary: when you run a WP-CLI command that first displays a table and
then the result, the result is likely to always appear in colour even if you specify --no-color.
I hope to be able to continue contributing to WP-CLI. I learned a lot about it today, and while I don’t use it as much as I used to in my multisite-management days, I still really
respect its power as a tool.
Did I mention lately how awesome my employers are? I promise my blog’s not always gonna be me shilling for them… but today it is.
Footnotes
1 Even with the monumental stack of custom code woven into DanQ.me, a keen eye will
probably spot that it’s WordPress-powered.
3 WP-CLI is… it’s like Drush but for WordPress, if that makes sense to you? If not: it’s a
multifaceted command-line tool for installing, configuring, maintaining, and managing WordPress installations, and I’ve been in love with it for years.
There’s been a bit of a resurgence lately of sites whose only subscription option is email, or – worse yet – who provide certain “exclusive” content only to email subscribers.
I don’t want to go giving an actual email address to every damn service, because:
It’s not great for privacy, even when (as usual) I use a unique alias for each sender.
It’s usually harder to unsubscribe than I’d like, and rarely consistent: you need to find a recent message, click a link, sometimes that’s enough or sometimes you need to uncheck a
box or click a button, or sometimes you’ll get another email with something to click in it…
I rarely want to be notified the very second a new issue is published; email is necessarily more “pushy” than I like a subscription to be.
I don’t want to use my email Inbox to keep track of which articles I’ve read/am still going to read: that’s what a feed reader is for! (It also provides tagging, bookmarking,
filtering, standardised and bulk unsubscribing tools, etc.)
So what do I do? Well…
I already operate an OpenTrashMail instance for one-shot throwaway email addresses (which I highly recommend). And
OpenTrashMail provides a rich RSS feed. Sooo…
How I subscribe to newsletters (in my feed reader)
If I want to subscribe to your newsletter, here’s what I do:
Put an email address (I usually just bash the keyboard to make a random one, then put @-a-domain-I-control on the end, where that domain is handled by OpenTrashMail) in to
subscribe.
Put https://my-opentrashmail-server/rss/the-email-address-I-gave-you/rss.xml into my feed reader.
That’s all. There is no step 3.
Now I get your newsletter alongside all my other subscriptions. If I want to unsubscribe I just tell my feed reader to stop polling the RSS feed (You don’t even get to find out that I’ve unsubscribed; you’re now just dropping emails into an unmonitored box, but of course I can
resubscribe and pick up from where I left off if I ever want to).
Obviously this approach isn’t suitable for personalised content or sites for which your email address is used for authentication, because anybody who can guess the random email address
can get the feed! But it’s ideal for those companies who’ll ocassionally provide vouchers in exchange for being able to send you other stuff to your Inbox, because you can
simply pipe their content to your feed reader, then add a filter to drop anything that doesn’t contain the magic keyword: regular vouchers, none of the spam. Or for blogs that provide
bonus content to email subscribers, you can get the bonus content in the same way as the regular content, right there in a folder of your reader. It’s pretty awesome.
If you don’t already have and wouldn’t benefit from running OpenTrashMail (or another trashmail system with feed support) it’s probably not worth setting one up just for this
purpose. But otherwise, I can certainly recommend it.
I’m off work sick today: it’s just a cold, but it’s had a damn good go at wrecking my lungs and I feel pretty lousy. You know how when you’ve got too much of a brain-fog to trust
yourself with production systems but you still want to write code (or is that just me?), so this morning I threw together a really,
really stupid project which you can play online here.
It’s a board game. Well, the digital edition of one. Also, it’s not very good.
It’s inspired by a toot by Mason”Tailsteak” Williams (whom I’ve mentioned before once or
twice). At first I thought I’d try to calculate the odds of winning at his proposed game, or how many times one might expect to play before winning,
but I haven’t the brainpower for that in my snot-addled brain. So instead I threw together a terrible, terrible digital implementation.
Go play it if, like me, you’ve got nothing smarter that your brain can be doing today.
Finally got around to implementing a super-lightweight (~20 lines of code, 1 dependency) #spring83 key generator. There are plenty of others; nobody needs this one, but it’s free if you
want it:
That’s a really useful thing to have in this new age of the web, where Refererer: headers are no-longer commonly passed cross-domain and Google Search no longer provides the link: operator. If you want to know if I’ve ever
linked to your site, it’s a bit of a drag to find out.
To nobody’s surprise whatsoever, I’ve made a so many links to Wikipedia that I might be single-handedly responsible for their PageRank.
So, obviously, I’ve written an implementation for WordPress. It’s really basic right now, but the source code can be
found here if you want it. Install it as a plugin and run wp outbound-links to kick it off. It’s fast: it takes 3-5 seconds to parse the entirety of danq.me,
and I’ve got somewhere in the region of 5,000 posts to parse.
You can see the results at https://danq.me/.well-known/links – if you’ve ever wondered “has Dan ever linked to my site?”, now you can find the
answer.
If this could be useful to you, let’s collaborate on making this into an actually-useful plugin! Otherwise it’ll just languish “as-is”, which is good enough for my purposes.
I swear that I used to be good at Mastermind when I was a kid. But now, when it’s my turn to break
the code that one of our kids has chosen, I fail more often than I succeed. That’s no good!
If you didn’t have me pegged as a board gamer… where the hell have you been?
Mastermind and me
Maybe it’s because I’m distracted; multitasking doesn’t help problem-solving. Or it’s because we’re “Super” Mastermind, which differs from the one I had as a child in that
eight (not six) peg colours are available and secret codes are permitted to have duplicate peg colours. These changes increase the possible permutations from 360 to 4,096, but the
number of guesses allowed only goes up from 8 to 10. That’s hard.
The set I had as a kid was like this, I think. Photo courtesy ZeroOne; CC-BY-SA license.
Hey, that’s an idea. Let’s crack the code… by writing some code!
This online edition plays a lot like the version our kids play, although the peg colours are different. Next guess should be an
easy solve!
Representing a search space
The search space for Super Mastermind isn’t enormous, and it lends itself to some highly-efficient computerised storage.
There are 8 different colours of peg. We can express these colours as a number between 0 and 7, in three bits of binary, like this:
Decimal
Binary
Colour
0
000
Red
1
001
Orange
2
010
Yellow
3
011
Green
4
100
Blue
5
101
Pink
6
110
Purple
7
111
White
There are four pegs in a row, so we can express any given combination of coloured pegs as a 12-bit binary number. E.g. 100 110 111 010 would represent the
permutation blue (100), purple (110), white (111), yellow (010). The total search space, therefore, is the range of numbers from
000000000000 through 111111111111… that is: decimal 0 through 4,095:
Decimal
Binary
Colours
0
000000000000
Red, red, red, red
1
000000000001
Red, red, red, orange
2
000000000010
Red, red, red, yellow
…………
4092
111111111100
White, white, white, blue
4093
111111111101
White, white, white, pink
4094
111111111110
White, white, white, purple
4095
111111111111
White, white, white, white
Whenever we make a guess, we get feedback in the form of two variables: each peg that is in the right place is a bull; each that represents a peg in the secret code but
isn’t in the right place is a cow (the names come from Mastermind’s precursor, Bulls & Cows). Four bulls
would be an immediate win (lucky!), any other combination of bulls and cows is still valuable information. Even a zero-score guess is valuable- potentially very valuable! – because it
tells the player that none of the pegs they’ve guessed appear in the secret code.
If one of Wordle‘s parents was Scrabble, then this was the other. Just ask its Auntie Twitter.
Solving with Javascript
The latest versions of Javascript support binary literals and bitwise operations, so we can encode and decode between arrays of four coloured pegs (numbers 0-7) and the number 0-4,095
representing the guess as shown below. Decoding uses an AND bitmask to filter to the requisite digits then divides by the order of magnitude. Encoding is just a reduce
function that bitshift-concatenates the numbers together.
/** * Decode a candidate into four peg values by using binary bitwise operations. */function decodeCandidate(candidate){
return [
(candidate &0b111000000000) /0b001000000000,
(candidate &0b000111000000) /0b000001000000,
(candidate &0b000000111000) /0b000000001000,
(candidate &0b000000000111) /0b000000000001
];
}
/** * Given an array of four integers (0-7) to represent the pegs, in order, returns a single-number * candidate representation. */function encodeCandidate(pegs) {
return pegs.reduce((a, b)=>(a <<3) + b);
}
With this, we can simply:
Produce a list of candidate solutions (an array containing numbers 0 through 4,095).
Choose one candidate, use it as a guess, and ask the code-maker how it scores.
Eliminate from the candidate solutions list all solutions that would not score the same number of bulls and cows for the guess that was made.
Repeat from step #2 until you win.
Step 3’s the most important one there. Given a function getScore( solution, guess ) which returns an array of [ bulls, cows ] a given guess would
score if faced with a specific solution, that code would look like this (I’m convined there must be a more-performant way to eliminate candidates from the list with XOR
bitmasks, but I haven’t worked out what it is yet):
/** * Given a guess (array of four integers from 0-7 to represent the pegs, in order) and the number * of bulls (number of pegs in the guess that are in the right place) and cows (number of pegs in the * guess that are correct but in the wrong place), eliminates from the candidates array all guesses * invalidated by this result. Return true if successful, false otherwise. */function eliminateCandidates(guess, bulls, cows){
const newCandidatesList = data.candidates.filter(candidate=>{
const score = getScore(candidate, guess);
return (score[0] == bulls) && (score[1] == cows);
});
if(newCandidatesList.length ==0) {
alert('That response would reduce the candidate list to zero.');
returnfalse;
}
data.candidates = newCandidatesList;
chooseNextGuess();
returntrue;
}
I continued in this fashion to write a full solution (source code). It uses ReefJS for
component rendering and state management, and you can try it for yourself right in your web browser. If you play against the online version I mentioned you’ll need to transpose the colours in your head: the physical version I play with the kids has pink and
purple pegs, but the online one replaces these with brown and black.
Testing the solution
Let’s try it out against the online version:
As expected, my code works well-enough to win the game every time I’ve tried, both against computerised and in-person opponents. So – unless you’ve been actively thinking about the
specifics of the algorithm I’ve employed – it might surprise you to discover that… my solution is very-much a suboptimal one!
My code has only failed to win a single game… and that turned out to because my opponent, playing overexcitedly, cheated in the third turn. To be fair, my code didn’t lose
either, though: it identified that a mistake must have been made and we declared the round void when we identified the problem.
My solution is suboptimal
A couple of games in, the suboptimality of my solution became pretty visible. Sure, it still won every game, but it was a blunt instrument, and anybody who’s seriously thought about
games like this can tell you why. You know how when you play e.g. Wordle (but not in “hard mode”) you sometimes want to type in a word that can’t possibly be the
solution because it’s the best way to rule in (or out) certain key letters? This kind of strategic search space bisection reduces the mean number of guesses you need to solve the
puzzle, and the same’s true in Mastermind. But because my solver will only propose guesses from the list of candidate solutions, it can’t make this kind of improvement.
My blog post about Break Into Us used a series of visual metaphors to show search space dissection, including this one. If you missed
it, it might be worth reading.
Search space bisection is also used in my adverserial hangman game, but in this case the aim is to split the search space in such a way that no
matter what guess a player makes, they always find themselves in the larger remaining portion of the search space, to maximise the number of guesses they have to make. Y’know, because
it’s evil.
A great first guess, assuming you’re playing against a random code and your rules permit the code to have repeated colours, is a “1122” pattern.
There are mathematically-derived heuristics to optimise Mastermind strategy. The first
of these came from none other than Donald Knuth (legend of computer science, mathematics, and pipe organs) back in 1977. His solution,
published at probably the height of the game’s popularity in the amazingly-named Journal of Recreational Mathematics, guarantees a solution to the six-colour version of the
game within five guesses. Ville [2013] solved an
optimal solution for a seven-colour variant, but demonstrated how rapidly the tree of possible moves grows and the need for early pruning – even with powerful modern computers – to
conserve memory. It’s a very enjoyable and readable paper.
But for my purposes, it’s unnecessary. My solver routinely wins within six, maybe seven guesses, and by nonchalantly glancing at my phone in-between my guesses I can now reliably guess
our children’s codes quickly and easily. In the end, that’s what this was all about.
Different games in the same style (absurdle plays adversarially like my cheating hangman
game, crosswordle involves reverse-engineering a wordle colour grid into a crossword, heardle
is like Wordle but sounding out words using the IPA…)
I’m sure that by now all your social feeds are full of people playing Wordle. But the cool nerds are playing something new…
Now, a Wordle clone for D&D players!
But you know what hasn’t been seen before today? A Wordle clone where you have to guess a creature from the Dungeons & Dragons (5e) Monster Manual by putting numeric values into a
character sheet (STR, DEX, CON, INT, WIS, CHA):
Just because nobody’s asking for a game doesn’t mean you shouldn’t make it anyway.
What are you waiting for: go give DNDle a try (I pronounce it “dindle”, but you can pronounce it however you like). A new monster
appears at 10:00 UTC each day.
And because it’s me, of course it’s open source and works offline.
The boring techy bit
Like Wordle, everything happens in your browser: this is a “backendless” web application.
I’ve used ReefJS for state management, because I wanted something I could throw together quickly but I didn’t want to drown myself (or my players)
in a heavyweight monster library. If you’ve not used Reef before, you should give it a go: it’s basically like React but a tenth of the footprint.
A cache-first/background-updating service worker means that it can run completely offline: you can install it to your homescreen in the
same way as Wordle, but once you’ve visited it once it can work indefinitely even if you never go online again.
I don’t like to use a buildchain that’s any more-complicated than is absolutely necessary, so the only development dependency is rollup. It
resolves my import statements and bundles a single JS file for the browser.
A not-entirely-theoretical question about open source software licensing came up at work the other day. I thought it was interesting
enough to warrant a quick dive into the philosophy of minification, and how it relates to copyleft open source licenses. Specifically: does distributing (only) minified
source code violate the GPL?
If you’ve come here looking for a legally-justifiable answer to that question, you’re out of luck. But what I can give you is a (fictional) story:
TheseusJS is slow
TheseusJS is a (fictional) Javascript library designed to be run in a browser. It’s released under the GPLv3 license. This license allows you to download and use TheseusJS for any purpose you like, including making money off it, modifying
it, or redistributing it to others… but it requires that if you redistribute it you have to do so under the same license and include the source code. As such, it forces you to
share with others the same freedoms you enjoy for yourself, which is highly representative of some schools of open-source thinking.
It’s a cool project, but it really needs some maintenance this side of 2010.
It’s a great library and it’s used on many websites, but its performance isn’t great. It’s become infamous for the impact it has on the speed of the websites it’s used on, and it’s
often the butt of jokes by developers: “Man, this website’s slow. Must be running Theseus!”
The original developer has moved onto his new project, Moralia, and seems uninterested in handling the growing number of requests for improvements. So I’ve decided to fork it
and make my own version, FastTheseusJS and work on improving its speed.
FastTheseusJS is fast
I do some analysis and discover the single biggest problem with TheseusJS is that the Javascript file itself is enormous. The original developer kept all of the
copious documentation in comments in the file itself, and for some reason it doesn’t even compress well. When you use TheseusJS on a website it takes a painfully long time for
a browser to download it, if it’s not precached.
Nobody even uses the documentation in the comments: there’s a website with a fully-documented API.
My first release of FastTheseusJS, then, removes virtually of the comments, replacing them with a single comment at the top pointing developers to a website where the
API is fully documented. While I’m in there anyway, I also fix a minor bug that’s been annoying me for a while.
v1.1.0 changes
Forked from TheseusJS v1.0.4
Fixed issue #1071 (running mazeSolver() without first connecting <String> component results in endless loop)
Removed all comments: improves performance considerably
I discover another interesting fact: the developer of TheseusJS used a really random mixture of tabs and spaces for indentation, sometimes in the same line! It looks…
okay if you set your editor up just right, but it’s pretty hideous otherwise. That whitespace is unnecessary anyway: the codebase is sprawling but it seldom goes more than two
levels deep, so indentation levels don’t add much readability. For my second release of FastTheseusJS, then, I remove this extraneous whitespace, as well as removing
the in-line whitespace inside parameter lists and the components of for loops. Every little helps, right?
v1.1.1 changes
Standardised whitespace usage
Removed unnecessary whitespace
Some of the simpler functions now fit onto just a single line, and it doesn’t even inconvenience me to see them this way: I know the codebase well enough by now that it’s no
disadvantage for me to edit it in this condensed format.
Personally, I’ve given up on the tabs-vs-spaces debate and now I indent my code using semicolons. (That’s clearly a joke. Don’t flame me.)
In the next version, I shorten the names of variables and functions in the code.
For some reason, the original developer used epic rambling strings for function names, like the well-known function
dedicateIslandTempleToTheImageOfAGodBeforeOrAfterMakingASacrificeWithOrWithoutDancing( boolBeforeMakingASacrifice, objectImageOfGodToDedicateIslandTempleTo,
stringNmeOfPersonMakingDedication, stringOrNullNameOfLocalIslanderDancedWith). That one gets called all the time internally and isn’t exposed via the external
API so it might as well be shortened to d=(i,j,k,l,m)=>. Now all the internal workings of the library
are each represented with just one or two letters.
v1.1.2 changes
Shortened/standarised non-API variable and function names – improves performance
I’ve shaved several kilobytes off the monstrous size of TheseusJS and I’m very proud. The original developer says nice things about my fork on social media, resulting in a
torrent of downloads and attention. Within a certain archipelago of developers, I’m slightly famous.
But did I violate the license?
But then a developer says to me: you’re violating the license of the original project because you’re not making the source code available!
This happens every day. Probably not to this same guy every time though, but you never know. Original photo by Andrea Piacquadio.
They claim that my bugfix in the first version of FastTheseusJS represents a material change to the software, and that the changes I’ve made since then are
obfuscation: efforts short of binary compilation that aim to reduce the accessibility of the source code. This fails to meet the GPL‘s definition of source code as “the preferred form of the work for making modifications to
it”. I counter that this condensed view of the source code is my “preferred” way of working with it, and moreover that my output is not the result of some build step that
makes the code harder to read, the code is just hard to read as a result of the optimisations I’ve made. In ambiguous cases, whose “preference” wins?
Did I violate the license? My gut feeling is that no, all of my changes were within the spirit and the letter of the GPL (they’re a
terrible way to write code, but that’s not what’s in question here). Because I manually condensed the code, did so with the intention that this condensing was a feature, and
continue to work directly with the code after condensing it because I prefer it that way… that feels like it’s “okay”.
But if I’d just run the code through a minification tool, my opinion changes. Suppose I’d run minify --output fasttheseus.js theseus.js and then deleted my copy of
theseus.js. Then, making changes to fasttheseus.js and redistributing it feels like a violation to me… even if the resulting code is the same as I’d have
gotten via the “manual” method!
I don’t know the answer (IANAL), but I’ll tell you this: I feel hypocritical for saying one piece of code would not violate
the license but another identical piece of code would, based only on the process the developer followed to produce it. If I replace one piece of code at a time with
less-readable versions the license remains intact, but if I replace them all at once it doesn’t? That doesn’t feel concrete nor satisfying.
Sure, I can write a blog post in just one line of code. It’ll just be a really, really, really long line… (Still perfectly readable, though!)
This isn’t an entirely contrived example
This example might seem highly contrived, and that’s because it is. But the grey area between the extremes is where the real questions are. If you agree that redistribution of (only)
minified source code violates the GPL, you’re left asking: at what point does the change occur? Code isn’t necessarily minified or
not-minified: there are many intermediate steps.
If I use a correcting linter to standardise indentation and whitespace – switching multiple spaces for the appropriate number of tabs, removing excess line breaks etc. (or do the same
tasks manually) I’m sure you’d agree that’s fine. If I have it replace whole-function if-blocks with hoisted return statements, that’s probably fine too. If I replace if blocks with
ternery operators or remove or shorten comments… that might be fine, but probably depends upon context. At some point though, some way along the process, minification goes “too
far” and feels like it’s no longer within the limitations of the license. And I can’t tell you where that point is!
This issue’s even more-complicated with some other licenses, e.g. the AGPL, which extends the requirement to share source code to hosted applications. Suppose I implement a web application that uses an AGPL-licensed library. The person who redistributed it to me only gave me the minified version, but they gave me a web address from which
to acquire the full source code, so they’re in the clear. I need to make a small patch to the library to support my service, so I edit it right into the minified version I’ve already
got. A user of my hosted application asks for a copy of the source code, so I provide it, including the edited minified library… am I violating the license for not providing the full,
unminified version, even though I’ve never even seen it? It seems absurd to say that I would be, but it could still be argued to be the case.
I love diagrams like this, which show license compatibility of different open source licenses. Adapted from a diagram by Carlo Daffara,
in turn adapted from a diagram by David E. Wheeler, used under a CC-BY-SA license.
99% of the time, though, the answer’s clear, and the ambiguities shown above shouldn’t stop anybody from choosing to open-source their work
under GPL, AGPL (or any other open source license depending on their
preference and their community). Perhaps the question of whether minification violates the letter of a copyleft license is one of those Potter Stewart “I know it when I see it” things. It certainly goes against the spirit of the thing to do so deliberately or
unnecessarily, though, and perhaps it’s that softer, more-altruistic goal we should be aiming for.