Minification vs the GPL

A not-entirely-theoretical question about open source software licensing came up at work the other day. I thought it was interesting enough to warrant a quick dive into the philosophy of minification, and how it relates to copyleft open source licenses. Specifically: does distributing (only) minified source code violate the GPL?

If you’ve come here looking for a legally-justifiable answer to that question, you’re out of luck. But what I can give you is a (fictional) story:

TheseusJS is slow

TheseusJS is a (fictional) Javascript library designed to be run in a browser. It’s released under the GPLv3 license. This license allows you to download and use TheseusJS for any purpose you like, including making money off it, modifying it, or redistributing it to others… but it requires that if you redistribute it you have to do so under the same license and include the source code. As such, it forces you to share with others the same freedoms you enjoy for yourself, which is highly representative of some schools of open-source thinking.

Screenshot showing TheseusJS's GitHub page. The project hasn't been updated in a year, and that was just to add a license: no code has been changed in 12 years.
It’s a cool project, but it really needs some maintenance this side of 2010.

It’s a great library and it’s used on many websites, but its performance isn’t great. It’s become infamous for the impact it has on the speed of the websites it’s used on, and it’s often the butt of jokes by developers: “Man, this website’s slow. Must be running Theseus!”

The original developer has moved onto his new project, Moralia, and seems uninterested in handling the growing number of requests for improvements. So I’ve decided to fork it and make my own version, FastTheseusJS and work on improving its speed.

FastTheseusJS is fast

I do some analysis and discover the single biggest problem with TheseusJS is that the Javascript file itself is enormous. The original developer kept all of the copious documentation in comments in the file itself, and for some reason it doesn’t even compress well. When you use TheseusJS on a website it takes a painfully long time for a browser to download it, if it’s not precached.

Screenshot showing a website for the TheseusJS API. It's pretty labyrinthine (groan).
Nobody even uses the documentation in the comments: there’s a website with a fully-documented API.

My first release of FastTheseusJS, then, removes virtually of the comments, replacing them with a single comment at the top pointing developers to a website where the API is fully documented. While I’m in there anyway, I also fix a minor bug that’s been annoying me for a while.

v1.1.0 changes

  • Forked from TheseusJS v1.0.4
  • Fixed issue #1071 (running mazeSolver() without first connecting <String> component results in endless loop)
  • Removed all comments: improves performance considerably

I discover another interesting fact: the developer of TheseusJS used a really random mixture of tabs and spaces for indentation, sometimes in the same line! It looks… okay if you set your editor up just right, but it’s pretty hideous otherwise. That whitespace is unnecessary anyway: the codebase is sprawling but it seldom goes more than two levels deep, so indentation levels don’t add much readability. For my second release of FastTheseusJS, then, I remove this extraneous whitespace, as well as removing the in-line whitespace inside parameter lists and the components of for loops. Every little helps, right?

v1.1.1 changes

  • Standardised whitespace usage
  • Removed unnecessary whitespace

Some of the simpler functions now fit onto just a single line, and it doesn’t even inconvenience me to see them this way: I know the codebase well enough by now that it’s no disadvantage for me to edit it in this condensed format.

Screenshot of a block of Javascript code intended using semicolons rather than tabs or spaces.
Personally, I’ve given up on the tabs-vs-spaces debate and now I indent my code using semicolons. (That’s clearly a joke. Don’t flame me.)

In the next version, I shorten the names of variables and functions in the code.

For some reason, the original developer used epic rambling strings for function names, like the well-known function dedicateIslandTempleToTheImageOfAGodBeforeOrAfterMakingASacrificeWithOrWithoutDancing( boolBeforeMakingASacrifice, objectImageOfGodToDedicateIslandTempleTo, stringNmeOfPersonMakingDedication, stringOrNullNameOfLocalIslanderDancedWith). That one gets called all the time internally and isn’t exposed via the external API so it might as well be shortened to d=(i,j,k,l,m)=>. Now all the internal workings of the library are each represented with just one or two letters.

v1.1.2 changes

  • Shortened/standarised non-API variable and function names – improves performance

I’ve shaved several kilobytes off the monstrous size of TheseusJS and I’m very proud. The original developer says nice things about my fork on social media, resulting in a torrent of downloads and attention. Within a certain archipelago of developers, I’m slightly famous.

But did I violate the license?

But then a developer says to me: you’re violating the license of the original project because you’re not making the source code available!

A man in a suit sits outdoors with a laptop and a cup of coffee. He is angry and frustrated, and a bubble shows that he is thinking:"why can't people respect the fucking license?!"
This happens every day. Probably not to this same guy every time though, but you never know. Original photo by Andrea Piacquadio.

They claim that my bugfix in the first version of FastTheseusJS represents a material change to the software, and that the changes I’ve made since then are obfuscation: efforts short of binary compilation that aim to reduce the accessibility of the source code. This fails to meet the GPL‘s definition of source code as “the preferred form of the work for making modifications to it”. I counter that this condensed view of the source code is my “preferred” way of working with it, and moreover that my output is not the result of some build step that makes the code harder to read, the code is just hard to read as a result of the optimisations I’ve made. In ambiguous cases, whose “preference” wins?

Did I violate the license? My gut feeling is that no, all of my changes were within the spirit and the letter of the GPL (they’re a terrible way to write code, but that’s not what’s in question here). Because I manually condensed the code, did so with the intention that this condensing was a feature, and continue to work directly with the code after condensing it because I prefer it that way… that feels like it’s “okay”.

But if I’d just run the code through a minification tool, my opinion changes. Suppose I’d run minify --output fasttheseus.js theseus.js and then deleted my copy of theseus.js. Then, making changes to fasttheseus.js and redistributing it feels like a violation to me… even if the resulting code is the same as I’d have gotten via the “manual” method!

I don’t know the answer (IANAL), but I’ll tell you this: I feel hypocritical for saying one piece of code would not violate the license but another identical piece of code would, based only on the process the developer followed to produce it. If I replace one piece of code at a time with less-readable versions the license remains intact, but if I replace them all at once it doesn’t? That doesn’t feel concrete nor satisfying.

Screenshot showing highly-minified HTML code (for this page) which is still reasonably readable.
Sure, I can write a blog post in just one line of code. It’ll just be a really, really, really long line… (Still perfectly readable, though!)

This isn’t an entirely contrived example

This example might seem highly contrived, and that’s because it is. But the grey area between the extremes is where the real questions are. If you agree that redistribution of (only) minified source code violates the GPL, you’re left asking: at what point does the change occur? Code isn’t necessarily minified or not-minified: there are many intermediate steps.

If I use a correcting linter to standardise indentation and whitespace – switching multiple spaces for the appropriate number of tabs, removing excess line breaks etc. (or do the same tasks manually) I’m sure you’d agree that’s fine. If I have it replace whole-function if-blocks with hoisted return statements, that’s probably fine too. If I replace if blocks with ternery operators or remove or shorten comments… that might be fine, but probably depends upon context. At some point though, some way along the process, minification goes “too far” and feels like it’s no longer within the limitations of the license. And I can’t tell you where that point is!

This issue’s even more-complicated with some other licenses, e.g. the AGPL, which extends the requirement to share source code to hosted applications. Suppose I implement a web application that uses an AGPL-licensed library. The person who redistributed it to me only gave me the minified version, but they gave me a web address from which to acquire the full source code, so they’re in the clear. I need to make a small patch to the library to support my service, so I edit it right into the minified version I’ve already got. A user of my hosted application asks for a copy of the source code, so I provide it, including the edited minified library… am I violating the license for not providing the full, unminified version, even though I’ve never even seen it? It seems absurd to say that I would be, but it could still be argued to be the case.

Diagram showing how permissive software licenses are generally compatible for use in LGPL or MPL licensed software, which are then compatible for use (except MPL) in GPL licensed software, which are in turn compatible for use (except GPL 2) with AGPL licensed software.
I love diagrams like this, which show license compatibility of different open source licenses. Adapted from a diagram by Carlo Daffara, in turn adapted from a diagram by David E. Wheeler, used under a CC-BY-SA license.

99% of the time, though, the answer’s clear, and the ambiguities shown above shouldn’t stop anybody from choosing to open-source their work under GPL, AGPL (or any other open source license depending on their preference and their community). Perhaps the question of whether minification violates the letter of a copyleft license is one of those Potter Stewart “I know it when I see it” things. It certainly goes against the spirit of the thing to do so deliberately or unnecessarily, though, and perhaps it’s that softer, more-altruistic goal we should be aiming for.

From Synergy to Barrier

I’ve been using Synergy for a long, long time. By the time I wrote about my admiration of its notification icon back in 2010 I’d already been using it for some years. But this long love affair ended this week when I made the switch to its competitor, Barrier.

Screenshot showing some pre-1.3 version of Synergy running on Windows Vista.
I’m not certain exactly when I took this screenshot (which I shared with Kit while praising Synergy), but it’s clearly a pre-1.4 version and those look distinctly like Windows Vista’s ugly rounded corners, so I’m thinking no later than 2009?

If you’ve not come across it before: Synergy was possibly the first multiplatform tool to provide seamless “edge-to-edge” sharing of a keyboard and mouse between multiple computers. Right now, for example, I’m sitting in front of Cornet, a Debian 11 desktop, Idiophone, a Macbook Pro docked to a desktop monitor, and Renegade, a Windows desktop. And I can move my mouse cursor from one, to the other, to the next, interacting with them all as if I were connected directly to it.

There have long been similar technologies. KVM switches can do this, as can some modern wireless mice (I own at least two such mice!). But none of them are as seamless as what Synergy does: moving from computer to computer as fast as you can move your mouse and sharing a clipboard between multiple devices. I also love that I can configure my set-up around how I work, e.g. when I undock my Macbook it switches from ethernet to wifi, this gets detected and it’s automatically removed from the cluster. So when I pick up my laptop, it magically stops being controlled by my Windows PC’s mouse and keyboard until I dock it again.

Illustration showing a Debian desktop called Cornet, a Mac laptop with attached monitor called Idiophone, and a Windows desktop called Renegade. All three share a single keyboard and mouse using Barrier.

Synergy’s published under a hybrid model: open-source components, with paid-for extra features. It used to provide more in the open-source offering: you could download a fully-working copy of the software and use it without limitation, losing out only on a handful of features that for many users were unnecessary. Nontheless, early on I wanted to support the development of this tool that I used so much, and so I donated money towards funding its development. In exchange, I gained access to Synergy Premium, and then when their business model changed I got grandfathered-in to a lifetime subscription to Synergy Pro.

I continued using Synergy all the while. When their problem-stricken 2.x branch went into beta, I was among the testers: despite the stability issues and limitations, I loved the fact that I could have what was functionally multiple co-equal “host” computers, and – when it worked – I liked the slick new configuration interface it sported. I’ve been following with bated breath announcements about the next generation – Synergy 3 – and I’ve registered as an alpha tester for when the time comes.

If it sounds like I’m a fanboy… that’d probably be an accurate assessment of the situation. So why, after all these years, have I jumped ship?

Email from Symless to Dan, reading: "Thank you for contacting Synergy Support. My name is Kim and I am happy to assist you. We do not have a download option for the 32 bit version of Debian 10. We currently only have the options available in the members area. Feel free to reach out if you have any further questions or concerns."
Dear Future Dan. If you ever need a practical example of where open-source thinking provides a better user experience than arbritrarily closed-source products, please see above. Yours, Past Dan.

I’ve been aware of Barrier since the project started, as a fork of the last open-source version of the core Synergy program. Initially, I didn’t consider Barrier to be a suitable alternative for me, because it lacked features I cared about that were only available in the premium version of Synergy. As time went on and these features were implemented, I continued to stick with Synergy and didn’t bother to try out Barrier… mostly out of inertia: Synergy worked fine, and the only thing Barrier seemed to offer would be a simpler set-up (because I wouldn’t need to insert my registration details!).

This week, though, as part of a side project, I needed to add an extra computer to my cluster. For reasons that are boring and irrelevant and so I’ll spare you the details, the new computer’s running the 32-bit version of Debian 11.

I went to the Symless download pages and discovered… there isn’t a Debian 11 package. Ah well, I think: the Debian 10 one can probably be made to work. But then I discover… there’s only a 64-bit version of the Debian 10 binary. I’ll note that this isn’t a fundamental limitation – there are 32-bit versions of Synergy available for Windows and for ARMhf Raspberry Pi devices – but a decision by the developers not to support that platform. In order to protect their business model, Synergy is only available as closed-source binaries, and that means that it’s only available for the platforms for which the developers choose to make it available.

So I thought: well, I’ll try Barrier then. Now’s as good a time as any.

Screenshot showing Mac computer "Idiophone" being configured in Barrier to connect to server "Renegade".
Setting up Barrier in place of Synergy was pretty familiar and painless.

Barrier and Synergy aren’t cross-compatible, so first I had to disable Synergy on each machine in my cluster. Then I installed Barrier. Like most popular open-source software, this was trivially easy compared to Synergy: I just used an appropriate package manager by running choco install barrier, brew install barrier, and apt install barrier to install on each of the Windows, Mac, and Debian computers, respectively.

Configuring Barrier was basically identical to configuring Synergy: set up the machine names, nominate one the server, and tell the server what the relative positions are of each of the others’ screens. I usually bind the “scroll lock” key to the “lock my cursor to the current screen” function but I wasn’t permitted to do this in Barrier for some reason, so I remapped my scroll lock key to some random high unicode character and bound that instead.

Getting Barrier to auto-run on MacOS was a little bit of a drag – in the end I had to use Automator to set up a shortcut that ran it and loaded the configuration, and set that to run on login. These little touches are mostly solved in Synergy, but given its technical audience I don’t imagine that anybody is hugely inconvenienced by them. Nonetheless, Synergy clearly retains a slightly more-polished experience.

Altogether, switching from Synergy to Barrier took me under 15 minutes and has so far offered me a functionally-identical experience, except that it works on more devices, can be installed via my favourite package managers, and doesn’t ask me for registration details before it functions. Synergy 3’s going to have to be a big leap forward to beat that!

Heatmapping my Movements

As I mentioned last year, for several years I’ve collected pretty complete historic location data from GPSr devices I carry with me everywhere, which I collate in a personal μlogger server.

Going back further, I’ve got somewhat-spotty data going back a decade, thanks mostly to the fact that I didn’t get around to opting-out of Google’s location tracking until only a few years ago (this data is now also housed in μlogger). More-recently, I now also get tracklogs from my smartwatch, so I’m managing to collate more personal location data than ever before.

Inspired perhaps at least a little by Aaron Parecki, I thought I’d try to do something cool with it.

Heatmapping my movements

The last year

Heatmap showing Dan's movements around Oxford since moving house in 2020. There's a strong cluster around Stanton Harcourt with heavy tendrils around Witney and Eynsham and along the A40 to Summertown, and lighter tendrils around North and Central Oxford.
My movements over the last year have been relatively local, but there are some interesting hotspots and common routes.

What you’re looking at is a heatmap showing my location over the last year or so since I moved to The Green. Between the pandemic and switching a few months prior to a job that I do almost-entirely at home there’s not a lot of travel showing, but there’s some. Points of interest include:

  • The blob around my house, plus some of the most common routes I take to e.g. walk or cycle the children to school.
  • A handful of my favourite local walking and cycling routes, some of which stand out very well: e.g. the “loop” just below the big blob represents a walk around the lake at Dix Pit; the blob on its right is the Devils Quoits, a stone circle and henge that I thought were sufficiently interesting that I made a virtual geocache out of them.
  • The most common highways I spend time on: two roads into Witney, the road into and around Eynsham, and routes to places in Woodstock and North Oxford where the kids have often had classes/activities.
  • I’ve unsurprisingly spent very little time in Oxford City Centre, but when I have it’s most often been at the Westgate Shopping Centre, on the roof of which is one of the kids’ favourite restaurants (and which we’ve been able to go to again as Covid restrictions have lifted, not least thanks to their outdoor seating!).

One to eight years ago

Let’s go back to the 7 years prior, when I lived in Kidlington. This paints a different picture:

Heatmap showing Dan's movements around Kidlington, including a lot of time in the village and in Oxford City Centre, as well as hotspots at the hospital, parks, swimming pools, and places that Dan used to volunteer. Individual expeditions can also be identified.
For the seven years I lived in Kidlington I moved around a lot more than I have since: each hotspot tells a story, and some tell a few.

This heatmap highlights some of the ways in which my life was quite different. For example:

  • Most of my time was spent in my village, but it was a lot larger than the hamlet I live in now and this shows in the size of my local “blob”. It’s also possible to pick out common destinations like the kids’ nursery and (later) school, the parks, and the routes to e.g. ballet classes, music classes, and other kid-focussed hotspots.
  • I worked at the Bodleian from early 2011 until late in 2019, and so I spent a lot of time in Oxford City Centre and cycling up and down the roads connecting my home to my workplace: Banbury Road glows the brightest, but I spent some time on Woodstock Road too.
  • For some of this period I still volunteered with Samaritans in Oxford, and their branch – among other volunteering hotspots – show up among my movements. Even without zooming in it’s also possible to make out individual venues I visited: pubs, a cinema, woodland and riverside walks, swimming pools etc.
  • Less-happily, it’s also obvious from the map that I spent a significant amount of time at the John Radcliffe Hospital, an unpleasant reminder of some challenging times from that chapter of our lives.
  • The data’s visibly “spottier” here, mostly because I built the heatmap only out of the spatial data over the time period, and not over the full tracklogs (i.e. the map it doesn’t concern itself with the movement between two sampled points, even where that movement is very-guessable), and some of the data comes from less-frequently-sampled sources like Google.

Eight to ten years ago

Let’s go back further:

Heatmap showing Dan's movements around Oxford during the period he lived in Kennington. Again, it's dominated by time at home, in the city centre, and commuting between the two.
Back when I lived in Kennington I moved around a lot less than I would come to later on (although again, the spottiness of the data makes that look more-significant than it is).

Before 2011, and before we bought our first house, I spent a couple of years living in Kennington, to the South of Oxford. Looking at this heatmap, you’ll see:

  • I travelled a lot less. At the time, I didn’t have easy access to a car and – not having started my counselling qualification yet – I didn’t even rent one to drive around very often. You can see my commute up the cyclepath through Hinksey into the City Centre, and you can even make out the outline of Oxford’s Covered Market (where I’d often take my lunch) and a building in Osney Mead where I’d often deliver training courses.
  • Sometimes I’d commute along Abingdon Road, for a change; it’s a thinner line.
  • My volunteering at Samaritans stands out more-clearly, as do specific venues inside Oxford: bars, theatres, and cinemas – it’s the kind of heatmap that screams “this person doesn’t have kids; they can do whatever they like!”

Every map tells a story

I really love maps, and I love the fact that these heatmaps are capable of painting a picture of me and what my life was like in each of these three distinct chapters of my life over the last decade. I also really love that I’m able to collect and use all of the personal data that makes this possible, because it’s also proven useful in answering questions like “How many times did I visit Preston in 2012?”, “Where was this photo taken?”, or “What was the name of that place we had lunch when we got lost during our holiday in Devon?”.

There’s so much value in personal geodata (that’s why unscrupulous companies will try so hard to steal it from you!), but sometimes all you want to do is use it to draw pretty heatmaps. And that’s cool, too.

Heatmap showing Dan's movements around Great Britain for the last 10 years: with a focus on Oxford, tendrils stretch to hotspots in South Wales, London, Cambridge, York, Birmingham, Preston, Glasgow, Edinburgh, and beyond.

How these maps were generated

I have a μlogger instance with the relevant positional data in. I’ve automated my process, but the essence of it if you’d like to try it yourself is as follows:

First, write some SQL to extract all of the position data you need. I round off the latitude and longitude to 5 decimal places to help “cluster” dots for frequency-summing, and I raise the frequency to the power of 3 to help make a clear gradient in my heatmap by making hotspots exponentially-brighter the more popular they are:

SELECT ROUND(latitude, 5) lat, ROUND(longitude, 5) lng, POWER(COUNT(*), 3) `count`
FROM positions
WHERE `time` BETWEEN '2020-06-22' AND '2021-08-22'
GROUP BY ROUND(latitude, 5), ROUND(longitude, 5)

This data needs converting to JSON. I was using Ruby’s mysql2 gem to fetch the data, so I only needed a .to_json call to do the conversion – like this:

db = Mysql2::Client.new(host: ENV['DB_HOST'], username: ENV['DB_USERNAME'], password: ENV['DB_PASSWORD'], database: ENV['DB_DATABASE'])
db.query(sql).to_a.to_json

Approximately following this guide and leveraging my Mapbox subscription for the base map, I then just needed to include leaflet.js, heatmap.js, and leaflet-heatmap.js before writing some JavaScript code like this:

body.innerHTML = '<div id="map"></div>';
let map = L.map('map').setView([51.76, -1.40], 10);
// add the base layer to the map
L.tileLayer('https://api.mapbox.com/styles/v1/{id}/tiles/{z}/{x}/{y}?access_token={accessToken}', {
  maxZoom: 18,
  id: 'itsdanq/ckslkmiid8q7j17ocziio7t46', // this is the style I defined for my map, using Mapbox
  tileSize: 512,
  zoomOffset: -1,
  accessToken: '...' // put your access token here if you need one!
}).addTo(map);
// fetch the heatmap JSON and render the heatmap
fetch('heat.json').then(r=>r.json()).then(json=>{
  let heatmapLayer = new HeatmapOverlay({
    "radius": parseFloat(document.querySelector('#radius').value),
    "scaleRadius": true,
    "useLocalExtrema": true,
  });
  heatmapLayer.setData({ data: json });
  heatmapLayer.addTo(map);
});

That’s basically all there is to it!

Spy’s Guidebook Reborn

When I was a kid of about 10, one of my favourite books was Usborne’s Spy’s Guidebook. (I also liked its sister the Detective’s Handbook, but the Spy’s Guidebook always seemed a smidge cooler to me).

Detective's Handbook andSpy's Guidebook on a child's bookshelf.
I imagine that a younger version of me would approve of our 7-year-old’s bookshelf, too.

So I was pleased when our eldest, now 7, took an interest in the book too. This morning, for example, she came to breakfast with an encrypted message for me (along with the relevant page in the book that contained the cipher I’d need to decode it).

Usborne Spy's Guidebook showing the "Pocket code card" page and a coded message
Decryption efforts were hampered by sender’s inability to get her letter “Z”s the right damn way around.

Later, as we used the experience to talk about some of the easier practical attacks against this simple substitution cipher (letter frequency analysis, and known-plaintext attacks… I haven’t gotten on to the issue of its miniscule keyspace yet!), she asked me to make a pocket version of the code card as described in the book.

Three printed pocket code cards
A three-bit key doesn’t make a simple substitution cipher significantly safer, but it does serve as a vehicle to teach elementary cryptanalysis!

While I was eating leftover curry for lunch with one hand and producing a nice printable, foldable pocket card for her (which you can download here if you like) with the other, I realised something. There are likely to be a lot more messages in my future that are protected by this substitution cipher, so I might as well preempt them by implementing a computerised encoder/decoder right away.

So naturally, I did. It’s at danq.dev/spy-pocket-code and all the source code is available to do with as you please.

Key 4-1 being used to decode the message: UOMF0 7PU9V MMFKG EH8GE 59MLL GFG00 8A90P 5EMFL
Uh-oh: my cover is blown!

If you’ve got kids of the right kind of age, I highly recommend picking up a copy of the Spy’s Guidebook (and possibly the Detective’s Handbook). Either use it as a vehicle to talk about codes and maths, like I have… or let them believe it’s secure while you know you can break it, like we did with Enigma machines after WWII. Either way, they eventually learn a valuable lesson about cryptography.

Wix and Their Dirty Tricks

This article is a repost promoting content originally published elsewhere. See more things Dan's reposted.

Wix, the website builder company you may remember from stealing WordPress code and lying about it, has now decided the best way to gain relevance is attacking the open source WordPress community in a bizarre set of ads. They can’t even come up with original concepts for attack ads, and have tried to rip-off of Apple’s Mac vs PC ads, but tastelessly personify the WordPress community as an absent, drunken father in a therapy session. 🤔

I have a lot of empathy for whoever was forced to work on these ads, including the actors, it must have felt bad working on something that’s like Encyclopedia Britannica attacking Wikipedia. WordPress is a global movement of hundreds of thousands of volunteers and community members, coming together to make the web a better place. The code, and everything you put into it, belongs to you, and its open source license ensures that you’re in complete control, now and forever. WordPress is free, and also gives you freedom.

For those that haven’t been following the relevant bits of tech social media this last week, here’s the insanity you’ve missed:

  1. Wix start their new marketing campaign by posting headphones and a secret video link to people they clearly think are WordPress “influencers”. But the video is so confusing that people thought it was a WordPress marketing campaign against Wix, not the other way around.
  2. Next, Wix launch their “You Deserve Better” website, attempting to riff off the old “Mac vs. PC” ads. It’s been perhaps most-charitably described as a “bewildering” attack ad, more-critically described as being insensitive and distasteful.
  3. Wix’s Twitter and YouTube responses suddenly swing from their usual “why is your customer service so slow to respond to me?” level of negative to outright hostile. LOL.

Sure, I’m not the target audience. I’ve been a WordPress user for 15 years, and every time I visit a Wix site it annoys me when I have to permit a stack of third-party JavaScript just to load images like they’ve never heard of the <img>tag or something. Hell, I like WordPress enough that I used it as a vehicle to get a job with Automattic, a company most-famous for its WordPress hosting provision. But even putting all of that aside: this advertising campaign stinks.

Review for ProtonMail Encryption Status by Morgan Larosa

This article is a repost promoting content originally published elsewhere. See more things Dan's reposted.

⭐⭐⭐⭐⭐
Does what it says on the tin! Short and sweet codebase that’s easy enough to verify personally, and doesn’t ask for any crazy permissions.

I probably needn’t care about this validation: when I wrote a Thunderbird plugin to enhance integration with ProtonMail, I wrote it principally for myself: scratching my own itch. It was nice to see that (at time of writing) a few hundred other people have made use of the extension too, but it wasn’t essential. I’d be maintaining it regardless because I use it every day.

But it still warmed my heart to see a five-star review come in alongside a clearly-expressed justification.

Standing up for developers: youtube-dl is back

This article is a repost promoting content originally published elsewhere. See more things Dan's reposted.

Today we reinstated youtube-dl, a popular project on GitHub, after we received additional information about the project that enabled us to reverse a Digital Millennium Copyright Act (DMCA) takedown.

This is a Big Deal. For two reasons:

Firstly, youtube-dl is a spectacularly useful project. I’ve used it for many years to help me archive my own content, to improve my access to content that’s freely available on the platform, and to help centralise (freely available) metadata to keep my subscriptions on video-sharing sites. Others have even more-important uses for the tool. I love youtube-dl, and I’d never considered the possibility that it could be used to circumvent digital restrictions (apparently it’s got some kind of geofence-evading features you can optionally enable, for people who don’t have a multi-endpoint VPN I guess?… I note that it definitely doesn’t break DRM…) until its GitHub repo got taken down the other week.

Which was a bleeding stupid thing to use a DMCA request on, because, y’know: Barbara Streisand Effect. Lampshading that a free, open-source tool could be used for people’s convenience is likely to increase awareness and adoption, not decrease it! Huge thanks to the EFF for stepping up and telling GitHub that they’d got it wrong (this letter is great reading, by the way).

But secondly, GitHub’s response is admirable and – assuming their honour their new stance – effective. They acknowledge their mistake, then go on to set out a new process by which they’ll review takedown requests. That new process includes technical and legal review, erring on the side of the developer rather than the claimant (i.e. “innocent until proven guilty”), multiparty negotiation, and limiting the scope of takedowns by allowing violators to export their non-infringing content after the fact.

I was concerned that the youtube-dl takedown might create a FOSS “chilling effect” on GitHub. It still might: in the light of it, I for one have started backing up my repositories and those of projects I care about to an different Git server! But with this response, I’d still be confident hosting the main copy of an open-source project on GitHub, even if that project was one which was at risk of being mistaken for copyright violation.

Note that the original claim came not from Google/YouTube as you might have expected (if you’ve just tuned in) but from the RIAA, based on the fact that youtube-dl could be used to download copyrighted music videos for enjoyment offline. If you’re reminded of Sony v. Universal City Studios (1984) – the case behind the “Betamax standard” – you’re not alone.

Displaying ProtonMail Encryption Status in Thunderbird

In a hurry? Get the Thunderbird plugin here.

I scratched an itch of mine this week and wanted to share the results with you, in case you happen to be one of the few dozen other people on Earth who will cry “finally!” to discover that this is now a thing.

Encrypted email identified in Thunderbird having gone through ProtonMail Bridge
In the top right corner of this email, you can see that it was sent with end-to-end encryption from another ProtonMail user.

I’ve used ProtonMail as my primary personal email provider for about four years, and I love it. Seamless PGP/GPG for proper end-to-end encryption, privacy as standard, etc. At first, I used their web and mobile app interfaces but over time I’ve come to rediscover my love affair with “proper” email clients, and I’ve been mostly using Thunderbird for my desktop mail. It’s been great: lightning-fast search, offline capabilities, and thanks to IMAP (provided by ProtonMail Bridge) my mail’s still just as accessible when I fall-back on the web or mobile clients because I’m out and about.

But the one thing this set-up lacked was the ability to easily see which emails had been delivered encrypted versus those which had merely been delivered “in the clear” (like most emails) and then encrypted for storage on ProtonMail’s servers. So I fixed it.

Four types of email: E2E encrypted internal mail from other ProtonMail users, PGP-encrypted email from non ProtonMail users, encrypted mail stored encrypted by ProtonMail, and completely unencrypted mail such as stored locally in your Sent or Drafts folder
There are fundamentally four states a Thunderbird+ProtonMail Bridge email can be in, and here’s how I represent them.

I’ve just released my first ever Thunderbird plugin. If you’re using ProtonMail Bridge, it adds a notification to the corner of every email to say whether it was encrypted in transit or not. That’s all.

And of course it’s open source with a permissive license (and a doddle to compile using your standard operating system tools, if you want to build it yourself). If you’re using Thunderbird and ProtonMail Bridge you should give it a whirl. And if you’re not then… maybe you should consider it?

Syncthing

This last month or so, my digital life has been dramatically improved by Syncthing. So much so that I want to tell you about it.

Syncthing interface via Synctrayzor on Windows, showing Dan's syncs.
1.25TiB of data is automatically kept in sync between (depending on the data in question) a desktop PC, NAS, media centre, and phone. This computer’s using the Synctrayzor system tray app.

I started using it last month. Basically, what it does is keeps a pair of directories on remote systems “in sync” with one another. So far, it’s like your favourite cloud storage service, albeit self-hosted and much-more customisable. But it’s got a handful of killer features that make it nothing short of a dream to work with:

  • The unique identifier for a computer can be derived from its public key. Encryption comes free as part of the verification of a computer’s identity.
  • You can share any number of folders with any number of other computers, point-to-point or via an intermediate proxy, and it “just works”.
  • It’s super transparent: you can always see what it’s up to, you can tweak the configuration to match your priorities, and it’s open source so you can look at the engine if you like.

Here are some of the ways I’m using it:

Keeping my phone camera synced to my PC

Phone syncing with PC

I’ve tried a lot of different solutions for this over the years. Back in the way-back-when, like everybody else in those dark times, I used to plug my phone in using a cable to copy pictures off and sort them. Since then, I’ve tried cloud solutions from Google, Amazon, and Flickr and never found any that really “worked” for me. Their web interfaces and apps tend to be equally terrible for organising or downloading files, and I’m rarely able to simply drag-and-drop images from them into a blog post like I can from Explorer/Finder/etc.

At first, I set this up as a one-way sync, “pushing” photos and videos from my phone to my desktop PC whenever I was on an unmetered WiFi network. But then I switched it to a two-way sync, enabling me to more-easily tidy up my phone of old photos too, by just dragging them from the folder that’s synced with my phone to my regular picture storage.

Centralising my backups

Phone and desktop backups centralised through the NAS

Now I’ve got a fancy NAS device with tonnes of storage, it makes sense to use it as a central point for backups to run fom. Instead of having many separate backup processes running on different computers, I can just have each of them sync to the NAS, and the NAS can back everything up. Computers don’t need to be “on” at a particular time because the NAS runs all the time, so backups can use the Internet connection when it’s quietest. And in the event of a hardware failure, there’s an up-to-date on-site backup in the first instance: the cloud backup’s only needed in the event of accidental data deletion (which could be sync’ed already, of course!). Plus, integrating the sync with ownCloud running on the NAS gives easy access to my files wherever in the world I am without having to fire up a VPN or otherwise remote-in to my house.

Plus: because Syncthing can share a folder between any number of devices, the same sharing mechanism that puts my phone’s photos onto my main desktop can simultaneously be pushing them to the NAS, providing redundant connections. And it was a doddle to set up.

Maintaining my media centre’s screensaver

PC photos syncing to the media centre.

Since the NAS, running Jellyfin, took on most of the media management jobs previously shared between desktop computers and the media centre computer, the household media centre’s had less to do. But one thing that it does, and that gets neglected, is showing a screensaver of family photos (when it’s not being used for anything else). Historically, we’ve maintained the photos in that collection via a shared network folder, but then you’ve got credential management and firewall issues to deal with, not to mention different file naming conventions by different people (and their devices).

But simply sharing the screensaver’s photo folder with the computer of anybody who wants to contribute photos means that it’s as easy as copying the picture to a particular place. It works on whatever device they care to (computer, tablet, mobile) on any operating system, and it’s quick and seamless. I’m just using it myself, for now, but I’ll be offering it to the rest of the family soon. It’s a trivial use-case, but once you’ve got it installed it just makes sense.

In short: this month, I’m in love with Syncthing. And maybe you should be, too.

Identifying Post Kinds in WordPress RSS Feeds

I use the Post Kinds plugin to streamline the management of the different types of posts I make on my blog, based on the IndieWeb post types list: articles, like this one, are “conventional” blog posts, but I also publish notes (which are analogous to “tweets”), reposts (“shares” of things I’ve found online, sometimes with commentary), checkins (mostly chronicling my geocaching/geohashing), and others: I’ve extended Post Kinds to facilitate comics and reviews, for example.

But for people who subscribe (either directly or indirectly) to everything I post, I imagine it must be a little frustrating to sometimes be unable to identify the type of a post before clicking-through. So I’ve added the following code, which I’m sharing here and on GitHub in case it’s of any use to anybody else, to my theme’s functions.php:

// Make titles in RSS feed be prefixed by the Kind of the post.
function add_kind_to_rss_post_title(){
	$kinds = wp_get_post_terms( get_the_ID(), 'kind' );
	if( ! isset( $kinds ) || empty( $kinds ) ) return get_the_title(); // sanity-check.
	$kind = $kinds[0]->name;
	$title = get_the_title();
	return trim( "[{$kind}] {$title}" );
}
add_filter( 'the_title_rss', 'add_kind_to_rss_post_title', 4 ); // priority 4 to ensure it happens BEFORE default escaping filters.

This decorates the titles of my posts, but only in my feeds, so it’s easier for people to tell at-a-glance what’s going on:

Rendered RSS feed showing Post Kinds prefixes

Down the line I might expand this so that it doesn’t show if the subscriber is, for example, asking only for articles (e.g. via this feed); I’m coming up with a huge list of things I’d like to do at IndieWebCamp London! But for now, this feels like a nice simple improvement to a plugin I love that helps it to fit my specific needs.

City Roads

This article is a repost promoting content originally published elsewhere. See more things Dan's reposted.

Map of Kidlington's roads

Cute open source project that produces on-demand SVG and PNG maps, like the one above, based on the roads in OpenStreetMap data. It takes a somewhat liberal view of what a “road” is: I found it momentarily challenging to get my bearings in the map above, which includes where I live, because the towpath and cycle paths are included which I hadn’t expected. Still a beautiful bit of output and the source code could be adapted for any number of interesting cartographic projects.

Cheating Hangman

A long while ago, inspired by Nick Berry‘s analysis of optimal Hangman strategy, I worked it backwards to find the hardest words to guess when playing Hangman. This week, I showed these to my colleague Grace – who turns out to be a fan of word puzzles – and our conversation inspired me to go a little deeper. Is it possible, I thought, for me to make a Hangman game that cheats by changing the word it’s thinking of based on the guesses you make in order to make it as difficult as possible for you to win?

Play “Cheating Hangman”

The principle is this: every time the player picks a letter, but before declaring whether or not it’s found in the word –

  1. Make a list of all possible words that would fit into the boxes from the current game state.
  2. If there are lots of them, still, that’s fine: let the player’s guess go ahead.
  3. But if the player’s managing to narrow down the possibilities, attempt to change the word that they’re trying to guess! The new word must be:
    • Legitimate: it must still be the same length, have correctly-guessed letters in the same places, and contain no letters that have been declared to be incorrect guesses.
    • Harder: after resolving the player’s current guess, the number of possible words must be larger than the number of possible words that would have resulted otherwise.
Gallows on a hill.
Yeah, you’re screwed now.

You might think that this strategy would just involve changing the target word so that you can say “nope” to the player’s current guess. That happens a lot, but it’s not always the case: sometimes, it’ll mean changing to a different word in which the guessed letter also appears. Occasionally, it can even involve changing from a word in which the guessed letter didn’t appear to one in which it does: that is, giving the player a “freebie”. This may seem counterintuitive as a strategy, but it sometimes makes sense: if saying “yeah, there’s an E at the end” increases the number of possible words that it might be compared to saying “no, there are no Es” then this is the right move for a cheating hangman.

Playing against a cheating hangman also lends itself to devising new strategies as a player, too, although I haven’t yet looked deeply into this. But logically, it seems that the optimal strategy against a cheating hangman might involve making guesses that force the hangman to bisect the search space: knowing that they’re always going to adapt towards the largest set of candidate words, a perfect player might be able to make guesses to narrow down the possibilities as fast as possible, early on, only making guesses that they actually expect to be in the word later (before their guess limit runs out!).

Cheating Hangman
The game is brutally-difficult, but surprisingly fun, and you can have it tell you when and how it cheats so you can begin to understand its strategy.

I also find myself wondering how easily I could adapt this into a “helpful hangman”: a game which would always change the word that you’re trying to guess in order to try to make you win. This raises the possibility of a whole new game, “suicide hangman”, in which the player is trying to get themselves killed and so is trying to pick letters that can’t possibly be in the word and the hangman is trying to pick words in which those letters can be found, except where doing so makes it obvious which letters the player must avoid next. Maybe another day.

In the meantime, you’re welcome to go play the game (and let me know what you think, below!) and, if you’re of such an inclination, read the source code. I’ve used some seriously ugly techniques to make this work, including regular expression metaprogramming (using regular expressions to write regular expressions), but the code should broadly make sense if you want to adapt it. Have fun!

Play “Cheating Hangman”

Update 26 September 2019, 16:23: I’ve now added “helpful mode”, where the computer tries to cheat on your behalf rather than against you, but it’s not as helpful as you’d think because it assumes you’re playing optimally and have already memorised the dictionary!

Update 1 October 2019, 06:40: Now featured on MetaFilter; hi, MeFites!

Counting Down

I wasn’t sure that my whiteboard at the Bodleian, which reminds my co-workers exactly how many days I’ve got left in the office, was attracting as much attention as it needed to. If I don’t know what my colleagues don’t know about how I do my job, I can’t write it into my handover notes.

You have [20] work days left to ask Dan that awkward question.
Tick, tick, tick, tick, boom.
So I repurposed a bit of digital signage in the office with a bit of Javascript to produce a live countdown. There’s a lot of code out there to produce countdown timers, but mine had some very specific requirements that nothing else seems to “just do”. Mine needed to:

  • Only count down during days that I’m expected to be in the office.
  • Only count down during working hours.
  • Carry on seamlessly after a reboot.

Screen showing: "Dan will be gone in 153 hours, 54 minutes, 38 seconds."
[insert Countdown theme song here]
Naturally, I’ve open-sourced it in case anybody else needs one, ever. It’s pretty basic, of course, because I’ve only got a hundred and fifty-something hours to finish a lot of things so I only wanted to throw a half hour at this while I ate my lunch! But if you want one, just put in an array of your working dates, the time you start each day, and the number of hours in your workday, and it’ll tick away.

LABS Comic RSS Archive

Yesterday I recommended that you go read Aaron Uglum‘s webcomic LABS which had just completed its final strip. I’m a big fan of “completed” webcomics – they feel binge-able in the same way as a complete Netflix series does! – but Spencer quickly pointed out that it’s annoying for we enlightened modern RSS users who hook RSS up to everything to have to binge completed comics in a different way to reading ongoing ones: what he wanted was an RSS feed covering the entire history of LABS.

LABS comic adapted to show The Robot literally "feeding" RSS
With apologies to Aaron Uglum who I hope won’t mind me adapting his comic in this way.

So naturally (after the intense heatwave woke me early this morning anyway) I made one: complete RSS feed of LABS. And, of course, I open-sourced the code I used to generate it so that others can jumpstart their projects to make static RSS feeds from completed webcomics, too.

Even if you’re not going to read it via this medium, you should go read LABS.