Note #21293

Max props to the developer of puppeteer-extra-plugin-stealth, who I just bought a coffee for.

The screen-scraper I wrote to bulk-export data from my Garmin sports tracker (because Garmin’s API is “only for corporate partners”, which is a magic spell you can say to make me write and open source a screen-scraper that targets your systems) stopped working today. Turns out Cloudflare could detect my automation.

Installed puppeteer-extra-plugin-stealth. Fixed instantly. Awesome.

How a 2002 standard made 2022 bearable

This is an alternate history of the Web. The premise is true, but the story diverges from our timeline and looks at an alternative “Web that might have been”.

Prehistory

This is the story of P3P, one of the greatest Web standards whose history has been forgotten1, and how the abject failure of its first versions paved the way for its bright future decades later. But I’m getting ahead of myself…

Drafted in 2002 in the wake of growing concern about the death of privacy on the Internet, P3P 1.0 aimed to make the collection of personally-identifiable data online transparent. Hurrah, right?

Not so much. Its immediate impact was lukewarm to negative: developers couldn’t understand why their cookies were no longer being accepted by Internet Explorer 6, the first browser to implement the standard, and the whole exercise was slated for providing a false sense of security, failing to stop actual bad guys, and attempting to apply a technical solution to a political problem.2

Flowchart showing the negotiation process between a user, browser, and server as the user browses an ecommerce site. The homepage's P3P policy states that it collects IP addresses, which is compatible with the user's preferences. Later, at checkout, the P3P policy states that the user's address will be collected and shared with a courier. The collection is fine according to the user's preferences, but she's asked to be notified if it'll be shared, so the browser notifies the user. The user approves of the policy and asks that this approval is remembered for this site, and the checkout process continues.
Initially, the principle was sound. The specification was weak. The implementation was appalling. But P3P 1.1 could have worked well.

Developers are lazy3 and soon converged on the simplest possible solution: add a garbage HTTP header like P3P: CP="See our website for our privacy policy." and your cookies work just fine! Ignore the problem, ignore the proposed solution, just do what gets the project shipped.

Without any meaningful enforcement it was also perfectly feasible to, y’know, just lie about how well you treat user data. Seeing the way the wind was blowing, Mozilla dropped support for P3P, and Microsoft’s support – which had always been half-baked and lacked even the most basic user-facing controls or customisation options – languished in obscurity.

For a while, it seemed like P3P was dying. Maybe, in some alternate timeline, it did die: vanishing into nothing like VRML, WAP, and XBAP.

But fortunately for us, we don’t live in that timeline.

Revival

In 2009, the European Union revisited the Privacy and Electronic Communications Directive. The initial regulations, published in 2002, required that Web users be able to opt-out of tracking cookies, but the amendment required that sites ensure that users opted-in.

As-written, this confusing new regulation posed an immediate problem: if a user clicked the button to say “no, I don’t want cookies”, and you didn’t want to ask for their consent again on every page load… you had to give them a cookie (or use some other technique legally-indistinguishable from cookies). Now you’re stuck in an endless cookie-circle.4

This, and other factors of informed consent, quickly introduced a new pattern among those websites that were fastest to react to the legislative change:

Screenshot from how-i-experience-web-today.com showing an article mostly-covered by a cookie privacy statement and configuration options, utilising dark patterns to try to discourage users from opting-out of cookies.
The cookie consent banner, with all its confusing language and dark patterns, looked like it was going to become the new normal for web users in the early 2010s. But thankfully, our saviour had been waiting in the wings all along.

Web users rebelled. These ugly overlays felt like a regression to a time when popup ads and splash pages were commonplace. “If only,” people cried out, “there were a better way to do this!”

It was Professor Lorrie Cranor, one of the original authors of the underloved P3P specification and a respected champion of usable privacy and security, whose rallying cry gave us hope. Her CNET article, “Why the EU Cookie Directive is a solved problem”5, inspired a new generation of development on what would become known as P3P 2.0.

While maintaining backwards compatibility, this new standard:

  • deprecated those horrible XML documents in favour of HTTP headers and <link> tags alone,
  • removed support for Set-Cookie2: headers, which nobody used anyway, and
  • added features by which the provenance and purpose of cookies could be stated in a way that dramatically simplified adoption in browsers.

Internet Explorer at this point was still used by a majority of Web users. It still supported the older version of the standard, and – as perhaps the greatest gift that the much-maligned browser ever gave us – provided a reference implementation as well as a stepping-stone to wider adoption.

Opera, then Firefox, then “new kid” Chrome each adopted P3P 2.0; Microsoft finally got on board with IE 8 SP 1. Now the latest versions of all the mainstream browsers had a solid implementation6 well before the European data protection regulators began fining companies that misused tracking cookies.

Fabricated screenshot from Microsoft Edge, browsing 3r.org.uk: a "privacy" icon in the address bar has been clicked, and the resulting menu says: About 3r.org.uk. Connection is secure (with link for more info). Privacy and Cookies (with link for more info). Cookies (3 cookies in use) - Strictly necessary (2 in use), dropdown menu set to "Default (accept, delete later)"; Optional (1 in use), dropdown menu set to "Accept for this site". Checkbox for "Treat third-party cookies differently?", unchecked. Privacy (link to full policy): Legitimate interest - this site collects username, IP address, technical logs...; Consent - this site collects email address, phone number... Button to manage content. Button to "Exercise data rights".
Nowadays, we’ve pretty-well standardised on the address bar being the place where all cookie and privacy information and settings are stored. Can you imagine if things had gone any other way?

But the chapter in which P3P’s successes shine brightest came in 2016, with the passing of the GDPR. The W3C realised that P3P could simplify both the expression and understanding of privacy policies for users, and formed a group to work on version 2.1. And that’s the version you use today.

When you launch a new service, you probably use one of the many free wizard-driven tools to express your privacy policy and the bases for your data processing, and it spits out a template privacy policy. You need the human-readable version, of course, since the 2020 German court ruling that you cannot rely on a machine-readable privacy policy alone, but the real gem is the P3P: 2.1 header version.

Assuming you don’t have any unusual quirks in your data processing (ask your lawyer!), you can just paste the relevant code into your server configuration and you’re good to go. Site users get a warning if their personal data preferences conflict with your data policies, and can choose how to act: not using your service, choosing which of your features to opt in to or out of, or – hopefully! – granting an exception to your site (possibly with caveats, such as sandboxing your cookies or clearing them immediately after closing the browser tab).

Sure, what we’ve got isn’t perfect. Sometimes companies outright lie about their use of information or use illicit methods to track user behaviour. There’ll always be bad guys out there. That’s what laws are there to deal with.

But what we’ve got today is so seamless, it’s hard to imagine a world in which we somehow all… collectively decided that the correct solution to the privacy problem might have been to throw endless popovers into users’ faces, bury consent-based choices under dark patterns, and make humans do the work that should from the outset have been done by machines. What a strange and terrible timeline that would have been.

Footnotes

1 If you know P3P’s history, regardless of what timeline you’re in: congratulations! You win One Internet Point.

2 Techbros have been trying to solve political problems using technology since long before the word “techbro” was used in its current context. See also: (a) there aren’t enough mental health professionals, let’s make an AI app? (b) we don’t have enough ventilators for this pandemic, let’s 3D print air pumps? (c) banks keep failing, let’s make a cryptocurrency? (d) we need less carbon in the atmosphere or we’re going to go extinct, better hope direct carbon capture tech pans out eh? (e) we have any problem at all, let’s somehow shoehorn blockchain into some far-fetched idea about how to solve it without me having to get out of my chair why not?

3 Note to self: find a citation for this when you can be bothered.

4 I can’t decide whether “endless cookie circle” is the name of the New Wave band I want to form, or a description of the way I want to eventually die. Perhaps both.

5 Link missing. Did I jump timelines?

6 Implementation details varied, but that’s part of the joy of the Web. Firefox favoured “conservative” defaults; Chrome and IE had “permissive” ones; and Opera provided an ultra-configurable matrix of options by which a user could specify exactly which kinds of cookies to accept, linked to which kinds of personal data, from which sites, all somehow backed by an extended regular expression parser that was only truly understood by three people, two of whom were Opera developers.


AI is lying to you

I’ve resisted writing about the current trends in AI because, well, others are already doing it better.1 But I was inspired by Garrett’s observation that – according to the Washington Post – the C4 dataset has tokenised his personal website.

Screenshot from the Washington Post, showing a search for 'danq.me' returning 300k tokens (0.0002% of the dataset). The site is ranked 61,724.
My website is in the dataset too, but with a massive 300,000 tokens. Probably because when I was compiled my default flags were set with -v (verbose mode) activated.

Much has been said about how ChatGPT and her friends will hallucinate and mislead. Let’s take an example.

Remember that ChatGPT has almost-certainly read basically everything I’ve ever written online – it might well be better-informed about me than you are – as you read this:

Dan asks: "Tell me about the blogger who runs DanQ.me" ChatGPT responds: "Dan Q is the blogger and web developer who runs the website "DanQ.me". He is a British computer scientist, game designer, and open-source software developer. DanQ.me is a personal website where Dan writes about a variety of topics, including technology, programming, and gaming. He is also known for creating the web-based puzzle game "Notpron," which has gained a large following online. In addition to his work on "DanQ.me" and "Notpron," Dan has contributed to several open-source software projects, including the Linux operating system and the Mozilla web browser. He has also worked as a software developer for several companies, including Google and Canonical. Dan is active on social media, particularly Twitter, where he frequently shares his thoughts on technology and other topics." The screenshot is marked up with the bits about gaming and being active on social media marked "stretching/exaggerating", the bits about Notpron, Linux, Mozilla, and Canonical being "believable lies", and the bits about Twitter and Google identified as "complete bollocks"
Given that ChatGPT has all the information it needs to talk about me accurately, it comes up with a surprising amount of crap.

When I asked ChatGPT about me, it came up with a mixture of truths and believable lies2, along with a smattering of complete bollocks.

In another example, below, ChatGPT hallucinates extra detail specifically because its earlier mistake set the stage for it. At this point, it digs its heels in and commits to its claim, like the stubborn guy in the corner of the pub who doubles-down on his bullshit.

Had you asked at the outset who wrote Notpron, ChatGPT would have gotten it right, but because it already mis-spoke, it’s now trapped itself in a lie, incapable of reconsidering what it said previously as having been anything but the truth:

Asked to elaborate on "Notpron", ChatGPT says - Dan Q's custom "Notpron" challenge can be found at the following URL: https://www.danq.me/notpron/. This challenge features unique levels and puzzles that were created by Dan Q himself, and it is known for being particularly difficult and challenging. It has gained a large following among "Notpron" enthusiasts and puzzle game fans. If you decide to take on the challenge, be prepared to think outside the box and use your problem-solving skills to their fullest!"
Notpron is great and all, but it was written by David Münnich, not me. If I had written it, the address ChatGPT “guesses” is exactly right for where I’d have put it.

Simon Willison says that we should call this behaviour “lying”. In response to this, several people told him that the word “lying” excessively anthropomorphises these chatbots, implying that they’re deliberately attempting to mislead their users. Simon retorts:

I completely agree that anthropomorphism is bad: these models are fancy matrix arithmetic, not entities with intent and opinions.

But in this case, I think the visceral clarity of being able to say “ChatGPT will lie to you” is a worthwhile trade.

I agree with Simon. ChatGPT and systems like it are putting accessible AI into the hands of the masses, and that means that the people who are using it don’t necessarily understand – nor desire to learn – the statistical mechanisms that actually underpin the AI’s “decisions” about how to respond.

Trying to explain how and why their new toy will get things horribly wrong is hard, and it takes a critical eye, time, and practice to begin to discover how to use these tools effectively and safely.3 It’s simpler just to say “Here’s a tool; by the way, it’s a really convincing liar and you can’t trust it even a little.”

Giving people tools that will lie to them. What an interesting time to be alive!

Footnotes

1 I’m tempted to blog about my experience of using Stable Diffusion and GPT-3 as assistants while DMing my regular Dungeons & Dragons game, but haven’t worked out exactly what I’m saying yet.

2 That ChatGPT lies won’t be a surprise to anybody who’s used the system nor anybody who understands the fundamentals of how it works, but as AIs get integrated into more and more things, we’re going to need to teach a level of technical literacy about what that means, just like we should about, say, Wikipedia.

3 For many of the tasks people talk about outsourcing to LLMs, it’s the case that it would take less effort for a human to learn how to do the task than it would for them to learn how to supervise an AI performing the task! That’s not to say they’re useless: just that (for now at least) you should only trust them to do something that you could do yourself and you’re therefore able to critically assess how well the machine did it.


Dan Q found GC5F425 Lovers Walk

This checkin to GC5F425 Lovers Walk reflects a geocaching.com log entry. See more of Dan's cache logs.

My GPSr dropped me next to a far older bit of architecture than the one that hosts the cache, but I found it after a short search. I’m staying nearby as part of a charity hackathon for a nonprofit I’m involved with, but came out for a walk and an explore while between other tasks. SL, TFTC.

New Far Side in FreshRSS

I got some great feedback on yesterday’s post about using FreshRSS + XPath to subscribe to Forward, including helpful comments from FreshRSS developer Alexandre Alapetite and from somebody who appreciated both it and my Far Side “Daily Dose” recipe and wondered if it was possible to get the new Far Side content in FreshRSS too.

Wait, there’s new Far Side content? Yup: it turns out Gary Larson’s dusted off his pen and started drawing again. That’s awesome! But the last thing I want is to have to go to the website once every few… what: days? weeks? months? He’s not syndicated any more so he’s not got a deadline to work to! If only there were some way to have my feed reader, y’know, do it for me and let me know whenever he draws something new.

Screenshot showing new content from The Far Side in my FreshRSS reader.
It turns out, there is.

Here’s my setup for getting Larson’s new funnies right where I want them:

  • Feed URL: https://www.thefarside.com/new-stuff/1
    This isn’t a valid address for any of the new stuff, but always seems to redirect to somewhere that is, so that’s nice.
  • XPath for finding news items: //div[@class="swiper-slide"]
    Turns out all the “recent” new stuff gets loaded in the HTML and then JavaScript turns it into a slider etc.; some of the CSS classes change when the JavaScript runs so I needed to View Source rather than use my browser’s inspector to find everything.
  • Item title: concat("Far Side #", descendant::button[@aria-label="Share"]/@data-shareable-item)
    Ugh. The easiest place I could find a “clean” comic ID number was in a data- attribute of the “share” button, where it’s presumably used for engagement tracking. Still, whatever works right?
  • Item content: descendant::figcaption
    When Larson captions a comic, the caption is important.
  • Item link (URL) and item unique ID: concat("https://www.thefarside.com", ./@data-path)
    The URLs work as direct links to the content, and because they’re unique, they make a reasonable unique ID too (so long as their numbering scheme is internally-consistent, this should stop a re-run of new content popping up in your feed reader if the same comic comes around again).
  • Item thumbnail: concat("https://fox.q-t-a.uk/referer-faker.php?pw=YOUR-SECRET-PASSWORD-GOES-HERE&referer=https://www.thefarside.com/&url=", descendant::img[@data-src]/@data-src)
    The Far Side uses Referer: headers as an anti-hotlinking measure, which prevents us easily loading the images directly in an RSS reader. I use this tiny PHP script as a proxy to mitigate that. If you don’t have such a proxy set up, you could simply omit the “Item thumbnail” and “Item content” fields and click the link to go to the original page.
  • Item date: normalize-space(descendant::div[@class="tfs-comic-new__meta"]/*[1])
    The date is spread through two separate text nodes, so we get the content of their wrapper and use normalize-space to tidy the whitespace up. The date format then looks like “Wednesday, March 29, 2023”, which we can parse using a custom date/time format string:
  • Custom date/time format: l, F j, Y
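
Incidentally, if you want to sanity-check XPath expressions like these before committing them to a FreshRSS recipe, you can evaluate them in your browser’s console. Here’s a rough sketch – the testXPath() helper is made up for illustration and isn’t part of FreshRSS – and remember the caveat above: the site’s JavaScript changes some class names after load, so evaluate against the raw HTML (e.g. with JavaScript disabled) to see what FreshRSS will actually fetch:

// Illustrative helper for trying an XPath expression against the current document.
function testXPath(expression, context = document) {
  const result = document.evaluate(expression, context, null, XPathResult.ANY_TYPE, null);
  switch (result.resultType) {
    case XPathResult.STRING_TYPE:  return result.stringValue;  // string expressions, e.g. concat(...)
    case XPathResult.NUMBER_TYPE:  return result.numberValue;
    case XPathResult.BOOLEAN_TYPE: return result.booleanValue;
    default: {                                                  // node-set expressions, e.g. //div[...]
      const nodes = [];
      let node;
      while ((node = result.iterateNext())) nodes.push(node);
      return nodes;
    }
  }
}

// For example: find the candidate items, then test the title expression against the first one.
const items = testXPath('//div[@class="swiper-slide"]');
console.log(`${items.length} items found`);
console.log(testXPath('concat("Far Side #", descendant::button[@aria-label="Share"]/@data-shareable-item)', items[0]));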

I promise I’ll stop writing about how awesome FreshRSS + XPath is someday. Today isn’t that day.

Meanwhile: if you used to use a feed reader but gave up when the Web started to become hostile to them and big social media systems started to wall you in, you should really consider picking one up again. The stuff I write about is complex edge-cases that most folks don’t need to think about in order to benefit from RSS… but it’s super convenient to have the things you care about online (news, blogs, social media, videos, newsletters, comics, search trends…) collated and sorted for you… without interference from algorithms that want to push “sticky” content, without invasive tracking or advertisements (or cookie banners or privacy popups), without something “disappearing” simply because you put off reading it for a few days.


Subscribing to Forward using FreshRSS’s XPath Scraping

As I’ve mentioned before, I’m a fan of Tailsteak’s Forward comic. I’m not a fan of the author’s weird aversion to RSS, so I hacked a way around it first using an exploit in webcomic reader app Comic Chameleon (accidentally getting access to comics weeks in advance of their publication as a side-effect) and later by using my own tool RSSey.

But now that I’m able to use my favourite feed reader FreshRSS to scrape websites directly – like I’ve done for The Far Side – I should switch to using this approach to subscribe to Forward, too:

Screenshot showing RSS feed items: recent Forward episodes including their numbers, titles, and publication dates.
The goal: date-ordered, numbered, titled episodes of Forward in my feed reader.

Here are the settings I came up with –

  • Feed URL: http://forwardcomic.com/list.php
  • Type of feed source: HTML + XPath (Web scraping)
  • XPath for finding news items: //a[starts-with(@href,'archive.php')]
  • Item title: .
  • Item link (URL): ./@href
  • Item date: ./following-sibling::text()[1]
  • Custom date/time format: - Y.m.d
Annotated screenshot showing how each XPath directive maps to each part of the page. The item selector finds each hyperlink that begins with "archive.php" (notably missing the most-recent comic at any given time, which is found at index.php), and the date is found in the text node that immediately follows it, in a slightly-unusual variation on ISO8601.
The comic pages themselves do a great thing for accessibility by including a complete transcript of each. But the listing page – which is basically a series of <a>s separated by <br>s rather than, say, a <ul> of <li>s – leaves something to be desired (and makes it harder to scrape, too!).

I continue to love this “killer feature” of FreshRSS, but I’m beginning to see how it could go further – I wish I had the free time to contribute to its development!

I’d love to see a mechanism for exporting/importing feed configurations like this so that I could share them more-easily, for example. I’d also be delighted if I could expand on my XPath rules to load pages referenced by the results and get data from them, too, e.g. so I could use an image found by XPath on the “item link” page as the thumbnail image! These are things RSSey could do for me, but FreshRSS can’t… yet!


Note #21196

I bought Zach Weinersmith’s Bea Wolf for my kids (9 and 6, the elder of them already a fan of Beowulf). It arrived today, but neither of them has had a chance to read it because I wouldn’t put it down.

My favourite bit is when Bea and her entourage arrive near Treeheart and the shield-bearer who greets them says “Your leader sparkles with power and also with sparkles.” The line’s brilliant, clever, and accompanies the most badass illustration.

Inked pencil sketch showing a resolute-faced young girl - Bea Wolf - in a teddy-bear hat, flowerprint dress, and star-spangled boots, standing cross-armed amongst four of her friends. Above, a line of text ends a quote, reading "Your leader sparkles with power and also with sparkles."
I’ll give it to my kids… eventually. But if you’re looking for a book recommendation in the meantime, this is it.


Thoughts on Editing Posts

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

I edit all my posts before I publish them – I look for poor grammar, me rambling on (something which I’m terrible for) and things like typos, although some do still get through. I think that kind of editing is fine.

But when it comes to opinion pieces, I don’t think they should be edited. Yes, you should (in my opinion) check the spelling/grammar before posting, but I don’t think you should go back and edit your opinions retrospectively if they change.

Kev speaks my mind.

At almost 25 years old, my blog’s ancient, and, covering more than half my life, it inevitably includes posts that I feel don’t accurately reflect me any more (or, perhaps, didn’t reflect me well even when I wrote them!). My approach has long been that it’s okay to go back and modify a post to:

  • Correct spelling, punctuation, and grammar, or improve readability without changing the meaning.
  • Add content (in a clearly-marked way) to improve context, update information, or prepend/append hyperlinks to updated information.
  • Make changes that protect an individual (e.g. removing the name or photo of somebody who doesn’t want to be identified).

But like Kev, to me it just doesn’t seem right to change opinion pieces after your opinion changes. I’m happy to write a retraction and link to it from the original, but making-out like I never said those things in the first place seems disingenuous.

Kev links a disclaimer from his older posts; that’s an interesting idea that I might adopt.

Solving Jigidi… Again

(Just want the instructions? Scroll down.)

A year and a half ago I came up with a technique for intercepting the “shuffle” operation on jigsaw website Jigidi, allowing players to force the pieces to appear in a consecutive “stack” for ludicrously easy solving. I did this partially because I was annoyed that a collection of geocaches near me used Jigidi puzzles as a barrier to their coordinates1… but also because I enjoy hacking my way around artificially-imposed constraints on the Web (see, for example, my efforts last week to circumvent region-blocking on radio.garden).

My solver didn’t work for long: code changes at Jigidi’s end first made it harder, then made it impossible, to use the approach I suggested. That’s fine by me – I’d already got what I wanted – but the comments thread on that post suggests that there’s a lot of people who wish it still worked!2 And so I ignored the pleas of people who wanted me to re-develop a “Jigidi solver”. Until recently, when I once again needed to solve a jigsaw puzzle in order to find a geocache’s coordinates.

Making A Jigidi Helper

Rather than interfere with the code provided by Jigidi, I decided to take a more-abstract approach: swapping out the jigsaw’s image for one that would be easier.

This approach benefits from (a) having multiple mechanisms of application: query interception, DNS hijacking, etc., meaning that if one stops working then another one can be easily rolled-out, and (b) not relying so-heavily on the structure of Jigidi’s code (and therefore not being likely to “break” as a result of future upgrades to Jigidi’s platform).

Watch a video demonstrating the approach:

It’s not as powerful as my previous technique – more a “helper” than a “solver” – but it’s good enough to shave at least half the time off that I’d otherwise spend solving a Jigidi jigsaw, which means I get to spend more time out in the rain looking for lost tupperware. (If only geocaching were even the weirdest of my hobbies…)

How To Use The Jigidi Helper

To do this yourself and simplify your efforts to solve those annoying “all one colour” or otherwise super-frustrating jigsaw puzzles, here’s what you do:

  1. Visit a Jigidi jigsaw. Do not be logged-in to a Jigidi account.
  2. Copy my JavaScript code into your clipboard.
  3. Open your browser’s debug tools (usually F12). In the Console tab, paste it and press enter. You can close your debug tools again (F12) if you like.
  4. Press Jigidi’s “restart” button, next to the timer. The jigsaw will restart, but the picture will be replaced with one that’s easier-to-solve than most, as described below.
  5. Once you solve the jigsaw, the image will revert to normal (turn your screen around and show off your success to a friend!).

What makes it easier to solve?

The replacement image has the following characteristics that make it easier to solve than it might otherwise be:

  • Every piece has written on it the row and column it belongs in.
  • Every “column” is striped in a different colour.
  • Striped “bands” run along entire rows and columns.

To solve the jigsaw, start by grouping colours together, then combine those that belong in the same column (based on the second digit on the piece). Join whole or partial columns together as you go.
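
As a rough illustration of the sort of replacement picture I’m describing – this generateEasyImage() helper is hypothetical, not the script linked above, and it skips the row “bands” in favour of a single colour per column – you could paint something similar onto a <canvas> like this:

// Illustrative sketch only: draws a grid where each column has its own colour
// and every cell is labelled "row,column", which is what makes the resulting
// jigsaw trivial to sort and assemble.
function generateEasyImage(width, height, rows, cols) {
  const canvas = document.createElement('canvas');
  canvas.width = width;
  canvas.height = height;
  const ctx = canvas.getContext('2d');
  const cellW = width / cols;
  const cellH = height / rows;
  for (let col = 0; col < cols; col++) {
    // One hue per column: a piece's background colour tells you which column it came from.
    ctx.fillStyle = `hsl(${Math.round((col / cols) * 360)}, 80%, 70%)`;
    ctx.fillRect(col * cellW, 0, cellW, height);
  }
  ctx.fillStyle = '#000';
  ctx.font = `${Math.floor(cellH / 3)}px sans-serif`;
  ctx.textAlign = 'center';
  ctx.textBaseline = 'middle';
  for (let row = 0; row < rows; row++) {
    for (let col = 0; col < cols; col++) {
      // Stamp each cell with its row and column number.
      ctx.fillText(`${row + 1},${col + 1}`, (col + 0.5) * cellW, (row + 0.5) * cellH);
    }
  }
  return canvas.toDataURL('image/png'); // a data: URL that can be swapped in for the original image
}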

I’ve been using this technique or related ones for over six months now and no code changes on Jigidi’s side have impacted upon it at all, so it’s probably got better longevity than the previous approach. I’m not entirely happy with it, and you might not be either, so feel free to fork my code and improve it: the legibility of the numbers is sometimes suboptimal, and the colour banding repeats on larger jigsaws, which I’d rather avoid. There’s probably also potential to improve colour-recognition by making the colour bands span the gaps between rows or columns of pieces, too, but more experiments are needed and, frankly, I’m not the right person for the job. For the second time, I’m going to abandon a tool that streamlines Jigidi solving because I’ve already gotten what I needed out of it, and I’ll leave it up to you if you want to come up with an improvement and share it with the community.

Footnotes

1 As I’ve mentioned before, and still nobody believes me: I’m not a fan of jigsaws! If you enjoy them, that’s great: grab a bucket of popcorn and a jigsaw and go wild… but don’t feel compelled to share either with me.

2 The comments also include a super-helpful person called Rich who’s been manually solving people’s puzzles for them, and somebody called Perdita who “could be my grandmother” (except: no) with whom I enjoyed a conversation on- and off-line about the ethics of my technique. It’s one of the most-popular comment threads my blog has ever seen.

Installing Listmonk on Unraid

I wanted to play about with Listmonk and it’s available as a Docker image, so I figured I’d just install it on my Unraid box. It doesn’t have a recipe in Community Apps but it’s not usually hard to reverse-engineer an official installation guide into something that “just works” on Unraid. After a first attempt failed, I looked around for a quick how-to guide online and mostly found… a mixture of people similarly failing to get it working or else having a kindly stranger offer to help… but not on the open Web where the rest of us can benefit from their knowledge. Sigh.

So I resolved that when I figured it out, I’d document the steps so that the next person after me can have an easier job of it.

Installing Listmonk on Unraid

  1. Install Postgres if you don’t have it already. I used the postgresql15 image from Community Apps.
  2. Set up a role and database. To do this, log in to your Postgres database using your favourite Postgres client and run, for example:
    CREATE USER listmonk WITH LOGIN PASSWORD 'my-listmonk-db-password';
    CREATE DATABASE listmonk OWNER listmonk;
  3. Create a Listmonk configuration file. I created a listmonk share and put it in there, calling it /listmonk/config.toml, but anywhere on your Unraid server will do. There’s a sample configuration in the repository. You’ll probably want to change the following (there’s a minimal example after this list):
    • [app] address: change to 0.0.0.0:9000 to listen on all interfaces so you can access it from elsewhere on your network (might not be needed if you intend to proxy with a host-networked reverse proxy server)
    • [app] admin_username / admin_password: obviously change these – this is how you’ll log in to your Listmonk system
    • [db] host: if your Postgres container and/or Listmonk container is running in bridged networking mode rather than host networking mode, you’ll need to change this to the name or IP address of your Postgres server
    • [db] password: set to the password you chose for the listmonk user on your Postgres server
  4. Add a Listmonk container. In Unraid, on the Docker tab, click the Add Container button. A minimal configuration might look like this:
    • Name: Listmonk
    • Repository: listmonk/listmonk:latest
    • Network Type: consider using Host to simplify your [db] setup, above.
    • Add a Port with Name: HTTP and Host Port: 9000. Then fill in 9000 as the value (or whatever port you want to run Listmonk on)
    • Add a Path with Name: Config and Container Path: /listmonk/config.toml. Set the Host Path to wherever you put the Listmonk configuration file, e.g. /mnt/user/listmonk/config.toml.
  5. Start the Listmonk container and watch it stop. When you click “Apply” the container will start, run for a few seconds, and then stop. If you want, look at the logs and you’ll see what the problem is: it needs to be started in a different way in order to set up the database. Instead, what we’ll do is spin up a new Listmonk container just for that purpose (and then throw it away).
  6. Start Listmonk in “install” mode. SSH into your Unraid server itself and run, e.g.
    docker run --rm -ti --net='host' -e TZ="UTC" -v '/mnt/user/listmonk/config.toml':'/listmonk/config.toml':'rw' listmonk/listmonk:latest ./listmonk -- --install
    Substitute /mnt/user/listmonk/config.toml for whatever path your configuration file is at, if applicable. You’ll be prompted with the messages “** first time installation **”, “** IMPORTANT: This will wipe existing listmonk tables and types in the DB ‘listmonk’ **”, and then asked “continue (y/N)?”. Press “y” and the installation will complete.
  7. Start the Listmonk container again. This time it’ll stay running and you’ll be able to access the Web interface via e.g. https://your-unraid-server:9000/
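
For reference, the handful of settings from step 3 might end up looking something like this. These are placeholder values only (key names as per the sample configuration at the time of writing – check yours); start from the sample config.toml and leave the rest of the file alone:

# Minimal sketch of the step-3 changes only – not a complete config.toml
[app]
address = "0.0.0.0:9000"               # listen on all interfaces
admin_username = "listmonk"            # change me
admin_password = "a-strong-password"   # change me too

[db]
host = "192.168.1.123"                 # your Postgres server ("localhost" if using host networking)
port = 5432
user = "listmonk"
password = "my-listmonk-db-password"   # as created in step 2
database = "listmonk"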

Hope that helps somebody!

Satoru Iwata’s first commercial game has a secret

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

Codex (YouTube)

This was a delightful vlog. It really adds personality to what might otherwise have been a story only about technology and history.

I subscribed to Codex’s vlog like… four years ago? He went dark soon afterwards, but thanks to the magic of RSS, I got notified as soon as he came back from his hiatus.

Bypassing Region Restrictions on radio.garden

I must be the last person on Earth to have heard about radio.garden (thanks Pepsilora!), a website that uses a “globe” interface to let you tune in to radio stations around the globe. But I’d only used it for a couple of minutes before I discovered that there are region restrictions in place. Here in the UK, and perhaps elsewhere, you can’t listen to stations in other countries without using a VPN or similar tool… which might introduce a different region’s restrictions!

How to bypass radio.garden region restrictions

So I threw together a quick workaround:

  1. Ensure you’ve got a userscript manager installed (I like Violentmonkey, but there are other choices).
  2. Install this userscript; it’s hacky – I threw it together in under half an hour – but it seems to work!
Screenshot showing radio.garden tuned into YouFM in Mons, Belgium. An additional player control interface appears below the original one.
My approach is super lazy and simply injects a second audio player – which ignores region restrictions – below the original.

How does this work and how did I develop it?

For those looking to get into userscripting, here’s a quick tutorial on what I did to develop this bypass.

First, I played around with radio.garden for a bit to get a feel for what it was doing. I guessed that it must be tuning into a streaming URL when you select a radio station, so I opened my browser’s debugger on the Network tab and looked at what happened when I clicked on a “working” radio station, and how that differed when I clicked on a “blocked” one:

Screenshot from Firefox's Network debugger, showing four requests to a "working" radio station (of which two are media feeds) and two to a "blocked" radio station.

When connecting to a station, a request is made for some JSON that contains station metadata. Then, for a working station, a request is made for an address like /api/ara/content/listen/[ID]/channel.mp3. For a blocked station, this request isn’t made.

I figured that the first thing I’d try would be to get the [ID] of a station that I’m not permitted to listen to and manually try the URL to see if it was actually blocked, or merely not-being-loaded. Looking at a working station, I first found the ID in the JSON response and I was about to extract it when I noticed that it also appeared in the request for the JSON: that’s pretty convenient!

 

Composite screenshot from Firefox's Network debugger showing a request for station metadata being serviced, followed by a request for the MP3 stream with the same ID.

My hypothesis was that the “blocking” is entirely implemented in the front-end: that the JavaScript code that makes the pretty bits work is looking at the “country” data that’s returned and using that to decide whether or not to load the audio stream. That provides many different ways to bypass it, from manipulating the JavaScript to remove that functionality, to altering the JSON response so that every station appears to be in the user’s country, to writing some extra code that intercepts the request for the metadata and injects an extra audio player that doesn’t comply with the regional restrictions.

But first I needed to be sure that there wasn’t some actual e.g. IP-based blocking on the streams. To do this, first I took the /api/ara/content/listen/[ID]/channel.mp3 address of a known-working station and opened it in VLC using Media > Open Network Stream…. That worked. Then I did the same thing again, but substituted the [ID] part of the address with the ID of a “blocked” station. VLC happily started spouting French to me: the bypass would, in theory, work!

Next, I needed to get that to work from within the site itself. It’s implemented in React, which is a pig to inject code into because it uses horrible identifiers for DOM elements. But of course I knew that there’d be this tell-tale fetch request for the station metadata that I could tap into, so I used this technique to override the native fetch method and replace it with my own “wrapper” that logged the stream address for any radio station I clicked on. I tested the addresses this produced using my browser.

// Wrap the native fetch so we can peek at every request the site makes:
window.fetch = new Proxy(window.fetch, {
  apply: (target, that, args)=>{
    const tmp = target.apply(that, args);
    tmp.then(res=>{
      // Requests for station metadata look like /api/ara/content/channel/[ID]
      const matches = res.url.match(/\/api\/ara\/content\/channel\/(.*)/);
      if(matches){
        const stationId = matches[1];
        // Log the direct stream address for the station that was just selected
        console.log(`http://radio.garden/api/ara/content/listen/${stationId}/channel.mp3`);
      }
    });
    return tmp;
  },
});

That all worked nicely, so all I needed to do now was to use those addresses rather than simply logging them. Rather than get into the weeds reverse-engineering the built-in player, I simply injected a new <audio> element after it and pointed it at the correct address, and applied a couple of CSS tweaks to make it fit in nicely.

The only problem was that on UK-based radio stations I’d now hear a slight echo, because the original player was still working. I could’ve come up with an elegant solution to this, I’m sure, but I went for a quick-and-dirty hack: I used res.json() to obtain the body of the metadata response… which meant that the actual code that requested it would no longer be able to get it (you can only decode the body of a fetch response once!). radio.garden’s own player treats this as an error and doesn’t play that radio station, but my new <audio> element still plays it perfectly well.
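
To illustrate, here’s roughly how the wrapper above might be extended to do both of those things. This is a simplified sketch rather than the finished userscript – the region-bypass class name is made up, the new player is just appended to the page body rather than positioned and styled under the original, and an https:// stream address is assumed:

window.fetch = new Proxy(window.fetch, {
  apply: (target, that, args)=>{
    const tmp = target.apply(that, args);
    tmp.then(res=>{
      const matches = res.url.match(/\/api\/ara\/content\/channel\/(.*)/);
      if(!matches) return;
      const streamUrl = `https://radio.garden/api/ara/content/listen/${matches[1]}/channel.mp3`;
      // Consume the metadata body so radio.garden's own player errors out instead of double-playing.
      res.json().catch(()=>{});
      // Remove any player we injected for a previously-selected station...
      document.querySelectorAll('audio.region-bypass').forEach(el=>el.remove());
      // ...and inject our own, which ignores the front-end region check entirely.
      const player = document.createElement('audio');
      player.className = 'region-bypass';
      player.src = streamUrl;
      player.controls = true;
      player.autoplay = true;
      document.body.appendChild(player);
    });
    return tmp;
  },
});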

It’s not pretty, but it’s functional. You can read the finished source code on Github. I don’t anticipate that I’ll be maintaining this script so if it stops working you’ll have to fix it yourself, and I have no intention of “finishing” it by making it nicer or prettier. I just wanted to share in case you can learn anything from my approach.


Automattic Acquires ActivityPub Plugin for WordPress

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

Automattic has acquired the ActivityPub plugin for WordPress from German developer Matthias Pfefferle, who will be joining the company to continue improving support for federated platforms. Pfefferle, who is also the author of the Webmention plugin, said his new role is to see how Automattic’s products can benefit from open protocols like ActivityPub.

This is so exciting I might burst. Want to know why?

  1. Matt Mullenweg’s commitment to ActivityPub makes me happy. WordPress made Pingback and Trackback take off, back in the day, and I believe that – in the same way – Automattic can help make ActivityPub more accessible and mainstream too.
  2. Matthias Pfefferle is both an IndieWeb and an ActivityPub star; I use (and I’ve extended upon) a lot of code he’s written every day and I sponsor him on Github! The chance that we get to work directly together is pretty slim, but it’s a chance, right?

Susan A. Kitchens expressed concern that this could increase the level of ActivityPub spam out there (which right now is very low). I worry about that too. But I’m still optimistic that we can make something awesome off the back of this acquisition and keep the interpersonal Web federated, the way it ought to be.

Geohashing expedition 2023-03-10 51 -1

This checkin to geohash 2023-03-10 51 -1 reflects a geohashing expedition. See more of Dan's hash logs.

Location

North Leigh Common, West Oxfordshire.

Participants

Plans

My evening just freed up, so – weather-permitting – I might brave the sleet and cold and cycle out to this hashpoint this evening.

Expedition

Our dog had surgery at the start of the week and has now recovered enough to want a short walk, so I changed my plan to cycle for one to drive (with the dog) out to somewhere near the hashpoint and take her for a walk to and around it. Amazingly, I might have been faster to cycle: a crash on the A40 had led to lots of traffic being re-routed along the exact same back roads that were to be my most-direct route, and on the local rat run through South Leigh I got trapped behind a line of folks who weren’t familiar with this particular unlit and twisty road and took the entire derestricted section at an average of 25mph. Ah well.

Out of laziness, I didn’t bring my GPSr or make a tracklog; I just used the Geohashdroid app and took a screenshot when I got there. North Leigh Common is pleasant, but it was dark, and my photos are all a little bit hard to make out! But the stars were beautiful tonight, and the dog loved one of her first outings since her surgery, enjoying running around in the long wet grass and sticking her head into rabbit holes. At 19:00 precisely I got within about a metre and a half of the hashpoint – well within the circle of uncertainty – and turned to head home.

I also took the time while there to update OpenStreetMap by drawing in the boundaries of the common, replacing the nondescript “point” that had marked it before.

Photos

The tape library robot that served drinks

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

This is an IBM tape library robot. It’s designed to fetch, load, unload, and return tape media cartridges to the correct bay in large enterprise environments.

One fateful ‘workend’, I made one serve drinks.

It went back into prod on the Monday…

In a story reminiscent of those anecdotes about early computer science students competing to “race” hard drives across the lab by writing programs that moved the heads in a way that vibrated/walked the devices, @SecurityWriter shares a wonderful story about repurposing a backup tape management robot to act as a server (pun intended) of drinks.
