New Far Side in FreshRSS

I got some great feedback to yesterday’s post about using FreshRSS + XPath to subscribe to Forward, including helpful comments from FreshRSS developer Alexandre Alapetite and from somebody who appreciated it and my Far Side “Daily Dose” recipe and wondered if it was possible to get the new Far Side content in FreshRSS too.

Wait, there’s new Far Side content? Yup: it turns out Gary Larson’s dusted off his pen and started drawing again. That’s awesome! But the last thing I want is to have to go to the website once every few… what: days? weeks? months? He’s not syndicated any more so he’s not got a deadline to work to! If only there were some way to have my feed reader, y’know, do it for me and let me know whenever he draws something new.

Screenshot showing new content from The Far Side in my FreshRSS reader.
It turns out, there is.

Here’s my setup for getting Larson’s new funnies right where I want them:

  • Feed URL: https://www.thefarside.com/new-stuff/1
    This isn’t a valid address for any of the new stuff, but always seems to redirect to somewhere that is, so that’s nice.
  • XPath for finding news items: //div[@class="swiper-slide"]
    Turns out all the “recent” new stuff gets loaded in the HTML and then JavaScript turns it into a slider etc.; some of the CSS classes change when the JavaScript runs so I needed to View Source rather than use my browser’s inspector to find everything.
  • Item title: concat("Far Side #", descendant::button[@aria-label="Share"]/@data-shareable-item)
    Ugh. The easiest place I could find a “clean” comic ID number was in a data- attribute of the “share” button, where it’s presumably used for engagement tracking. Still, whatever works right?
  • Item content: descendant::figcaption
    When Larson captions a comic, the caption is important.
  • Item link (URL) and item unique ID: concat("https://www.thefarside.com", ./@data-path)
    The URLs work as direct links to the content, and because they’re unique, they make a reasonable unique ID too (so long as their numbering scheme is internally-consistent, this should stop a re-run of new content popping up in your feed reader if the same comic comes around again).
  • Item thumbnail: concat("https://fox.q-t-a.uk/referer-faker.php?pw=YOUR-SECRET-PASSWORD-GOES-HERE&referer=https://www.thefarside.com/&url=", descendant::img[@data-src]/@data-src)
    The Far Side uses Referer: headers as an anti-hotlinking measure, which prevents us easily loading the images directly in an RSS reader. I use this tiny PHP script as a proxy to mitigate that. If you don’t have such a proxy set up, you could simply omit the “Item thumbnail” and “Item content” fields and click the link to go to the original page.
  • Item date: normalize-space(descendant::div[@class="tfs-comic-new__meta"]/*[1])
    The date is spread through two separate text nodes, so we get the content of their wrapper and use normalize-space to tidy the whitespace up. The date format then looks like “Wednesday, March 29, 2023”, which we can parse using a custom date/time format string:
  • Custom date/time format: l, F j, Y

I promise I’ll stop writing about how awesome FreshRSS + XPath is someday. Today isn’t that day.

Meanwhile: if you used to use a feed reader but gave up when the Web started to become hostile to them and big social media systems started to wall you in, you should really consider picking one up again. The stuff I write about is complex edge-cases that most folks don’t need to think about in order to benefit from RSS… but it’s super convenient to have the things you care about online (news, blogs, social media, videos, newsletters, comics, search trends…) collated and sorted for you… without interference from algorithms that want to push “sticky” content, without invasive tracking or advertisements (or cookie banners or privacy popups), without something “disappearing” simply because you put off reading it for a few days.

×

Subscribing to Forward using FreshRSS’s XPath Scraping

As I’ve mentioned before, I’m a fan of Tailsteak‘s Forward comic. I’m not a fan of the author’s weird aversion to RSS, so I hacked a way around it first using an exploit in webcomic reader app Comic Chameleon (accidentally getting access to comics weeks in advance of their publication as a side-effect) and later by using my own tool RSSey.

But now I’m able to use my favourite feed reader FreshRSS to scrape websites directly – like I’ve done for The Far Side – I should switch to using this approach to subscribe to Forward, too:

Screenshot showing RSS feed items: recent Forward episodes including their numbers, titles, and publication dates.
The goal: date-ordered, numbered, titled episodes of Forward in my feed reader.

Here’s the settings I came up with –

  • Feed URL: http://forwardcomic.com/list.php
  • Type of feed source: HTML + XPath (Web scraping)
  • XPath for finding news items: //a[starts-with(@href,'archive.php')]
  • Item title: .
  • Item link (URL): ./@href
  • Item date: ./following-sibling::text()[1]
  • Custom date/time format: - Y.m.d
Annotated screenshot showing how each XPath directive maps to each part of the page. The item selector finds each hyperlink that begins with "archive.php" (notably missing the most-recent comic at any given time, which is found at index.php), and the date is found in the text node that immediately follows it, in a slightly-unusual variation on ISO8601.
The comic pages themselves do a great thing for accessibility by including a complete transcript of each. But the listing page, which is basically a series of <a>s separated by <br>s rather than a <ul> and <li>s, for example, leaves something to be desired (and makes it harder to scrape, too!).

I continue to love this “killer feature” of FreshRSS, but I’m beginning to see how it could go further – I wish I had the free time to contribute to its development!

I’d love to see a mechanism for exporting/importing feed configurations like this so that I could share them more-easily, for example. I’d also be delighted if I could expand on my XPath rules to load pages referenced by the results and get data from them, too, e.g. so I could use an image found by XPath on the “item link” page as the thumbnail image! These are things RSSey could do for me, but FreshRSS can’t… yet!

× ×

Note #21196

I bought Zach Weinersmith‘s Bea Wolf for my kids (9 and 6, the elder of them already a fan of Beowulf). It arrived today, but neither of them have had a chance to because I wouldn’t put it down.

My favourite bit is when Bea and her entourage arrive near Treeheart and the shield-bearer who greets them says “Your leader sparkles with power and also with sparkles.” The line’s brilliant, clever, and accompanies the most badass illustration.

Inked pencil sketch showing a resolute-faced young girl - Bea Wolf - in a teddy-bear hat, flowerprint dress, and star-spangled boots, standing cross-armed amongst four of her friends. Above, a line of text ends a quote, reading "Your leader sparkles with power and also with sparkles."
I’ll give it to my kids… eventually. But if you’re looking for a book recommendation in the meantime, this is it.

×

Thoughts on Editing Posts

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

I edit all my posts before I publish them – I look for poor grammar, me rambling on (something which I’m terrible for) and things like typos, although some do still get through. I think that kind of editing is fine.

But when it comes to opinion pieces, I don’t think they should be edited. Yes, you should (in my opinion) check the spelling/grammar before posting, but I don’t think you should go back and edit your opinions retrospectively if they change.

Kev speaks my mind.

At almost 25 years ago, my blog’s ancient, and covering more than half my life it inevitably includes posts that I feel don’t accurately reflect me any more (or, perhaps, didn’t reflect me well even when I wrote them!). My approach has long been that it’s okay to go back and modify a post to:

  • Correct spelling, punctuation, and grammar, or improve readability without changing the meaning.
  • Add content (in a clearly-marked way) to improve context, update information, or prepend/append hyperlinks to updated information.
  • Make changes that protect an individual (e.g. removing the name or photo of somebody who doesn’t want to be identified).

But like Kev, to me it just doesn’t seem right to change opinion pieces after your opinion changes. I’m happy to write a retraction and link to it from the original, but making-out like I never said those things in the first place seems disingenuous.

Kev links a disclaimer from his older posts; that’s an interesting idea that I might adopt.

Solving Jigidi… Again

(Just want the instructions? Scroll down.)

A year and a half ago I came up with a technique for intercepting the “shuffle” operation on jigsaw website Jigidi, allowing players to force the pieces to appear in a consecutive “stack” for ludicrously easy solving. I did this partially because I was annoyed that a collection of geocaches near me used Jigidi puzzles as a barrier to their coordinates1… but also because I enjoy hacking my way around artificially-imposed constraints on the Web (see, for example, my efforts last week to circumvent region-blocking on radio.garden).

My solver didn’t work for long: code changes at Jigidi’s end first made it harder, then made it impossible, to use the approach I suggested. That’s fine by me – I’d already got what I wanted – but the comments thread on that post suggests that there’s a lot of people who wish it still worked!2 And so I ignored the pleas of people who wanted me to re-develop a “Jigidi solver”. Until recently, when I once again needed to solve a jigsaw puzzle in order to find a geocache’s coordinates.

Making A Jigidi Helper

Rather than interfere with the code provided by Jigidi, I decided to take a more-abstract approach: swapping out the jigsaw’s image for one that would be easier.

This approach benefits from (a) having multiple mechanisms of application: query interception, DNS hijacking, etc., meaning that if one stops working then another one can be easily rolled-out, and (b) not relying so-heavily on the structure of Jigidi’s code (and therefore not being likely to “break” as a result of future upgrades to Jigidi’s platform).

Watch a video demonstrating the approach:

It’s not as powerful as my previous technique – more a “helper” than a “solver” – but it’s good enough to shave at least half the time off that I’d otherwise spend solving a Jigidi jigsaw, which means I get to spend more time out in the rain looking for lost tupperware. (If only geocaching were even the weirdest of my hobbies…)

How To Use The Jigidi Helper

To do this yourself and simplify your efforts to solve those annoying “all one colour” or otherwise super-frustrating jigsaw puzzles, here’s what you do:

  1. Visit a Jigidi jigsaw. Do not be logged-in to a Jigidi account.
  2. Copy my JavaScript code into your clipboard.
  3. Open your browser’s debug tools (usually F12). In the Console tab, paste it and press enter. You can close your debug tools again (F12) if you like.
  4. Press Jigidi’s “restart” button, next to the timer. The jigsaw will restart, but the picture will be replaced with one that’s easier-to-solve than most, as described below.
  5. Once you solve the jigsaw, the image will revert to normal (turn your screen around and show off your success to a friend!).

What makes it easier to solve?

The replacement image has the following characteristics that make it easier to solve than it might otherwise be:

  • Every piece has written on it the row and column it belongs in.
  • Every “column” is striped in a different colour.
  • Striped “bands” run along entire rows and columns.

To solve the jigsaw, start by grouping colours together, then start combining those that belong in the same column (based on the second digit on the piece). Join whole or partial columns together as you go.

I’ve been using this technique or related ones for over six months now and no code changes on Jigidi’s side have impacted upon it at all, so it’s probably got better longevity than the previous approach. I’m not entirely happy with it, and you might not be either, so feel free to fork my code and improve it: the legiblity of the numbers is sometimes suboptimal, and the colour banding repeats on larger jigsaws which I’d rather avoid. There’s probably also potential to improve colour-recognition by making the colour bands span the gaps between rows or columns of pieces, too, but more experiments are needed and, frankly, I’m not the right person for the job. For the second time, I’m going to abandon a tool that streamlines Jigidi solving because I’ve already gotten what I needed out of it, and I’ll leave it up to you if you want to come up with an improvement and share it with the community.

Footnotes

1 As I’ve mentioned before, and still nobody believes me: I’m not a fan of jigsaws! If you enjoy them, that’s great: grab a bucket of popcorn and a jigsaw and go wild… but don’t feel compelled to share either with me.

2 The comments also include asuper-helpful person called Rich who’s been manually solving people’s puzzles for them, and somebody called Perdita who “could be my grandmother” (except: no) with whom I enjoyed a conversation on- and off-line about the ethics of my technique. It’s one of the most-popular comment threads my blog has ever seen.

Installing Listmonk on Unraid

I wanted to play about with Listmonk and it’s available as a Docker image, so I figured I’d just install it on my Unraid box. It doesn’t have a recipe in Community Apps but it’s not usually hard to reverse-engineer an official installation guide into something that “just works” on Unraid. After a first attempt failed, I looked around for a quick how-to guide online and mostly found… a mixture of people similarly failing to get it working or else having a kindly stranger offer to help… but not on the open Web where the rest of us can benefit from their knowledge. Sigh.

So I resolved that when I figured it out, I’d document the steps so that the next person after me can have an easier job of it.

Installing Listmonk on Unraid

  1. Install Postgres if you don’t have it already. I used the postgresql15 image from Community Apps.
  2. Set up a role and database. To do this, log in to your Postgres database using your favourite Postgres client and run, for example:
    CREATE USER listmonk WITH LOGIN PASSWORD 'my-listmonk-db-password';
    CREATE DATABASE listmonk OWNER listmonk;
  3. Create a Listmonk configuration file. I created a listmonk share and put it in there, calling it /listmonk/config.toml, but anywhere on your Unraid server will do. There’s a sample configuration in the repository. You’ll probably want to change:
    • [app] address: change to 0.0.0.0:9000 to listen on all interfaces so you can access it from elsewhere on your network (might not be needed if you intend to proxy with a host-networked reverse proxy server)
    • [app] admin_username / admin_password: obviously change these – this is how you’ll log in to your Listmonk system
    • [db] host: if your Postgres container and/or Listmonk container is running in bridged networking mode rather than host networking mode, you’ll need to change this to the name or IP address of your Postgres server
    • [db] password: set to the password you chose for the listmonk user on your Postgres server
  4. Add a Listmonk container. In Unraid, on the Docker tab, click the Add Container button. A minimal configuration might look like this:
    • Name: Listmonk
    • Repository: listmonk/listmonk:latest
    • Network Type: consider using Host to simplify your [db] setup, above.
    • Add a Port with Name: HTTP and Host Port: 9000. Then fill in 9000 as the value (or whatever port you want to run Listmonk on)
    • Add a Path with Name: Config and Container Path: /listmonk/config.toml. Set the Host Path to wherever you put the Listmonk configuration file, e.g. /mnt/user/listmonk/config.toml.
  5. Start the Listmonk container and watch it stop. When you click “Apply” the container will start, run for a few seconds, and then stop. If you want, look at the logs and you’ll see what the problem is: it needs to be started in a different way in order to set up the database. Instead, what we’ll do is spin up a new Listmonk container just for that purpose (and then throw it away).
  6. Start Listmonk in “install” mode. SSH into your Unraid server itself and run, e.g.
    docker run --rm -ti --net='host' -e TZ="UTC" -v '/mnt/user/listmonk/config.toml':'/listmonk/config.toml':'rw' listmonk/listmonk:latest ./listmonk -- --install
    Substitute /mnt/user/listmonk/config.toml for whatever path your configuration file is at, if applicable. You’ll be prompted with the messages “** first time installation **”, “** IMPORTANT: This will wipe existing listmonk tables and types in the DB ‘listmonk’ **”, and then asked “continue (y/N)?”. Press “y” and the installation will complete.
  7. Start the Listmonk container again. This time it’ll stay running and you’ll be able to access the Web interface via e.g. https://your-unraid-server:9000/

Hope that helps somebody!

Satoru Iwata’s first commercial game has a secret

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

Codex (YouTube)

This was a delightful vlog. It really adds personality to what might otherwise have been a story only about technology and history.

I subscribed to Codex’s vlog like… four years ago? He went dark soon afterwards, but thanks to the magic of RSS, I got notified as soon as he came back from his hiatus.

Bypassing Region Restrictions on radio.garden

I must be the last person on Earth to have heard about radio.garden (thanks Pepsilora!), a website that uses a “globe” interface to let you tune in to radio stations around the globe. But I’d only used it for a couple of minutes before I discovered that there are region restrictions in place. Here in the UK, and perhaps elsewhere, you can’t listen to stations in other countries without using a VPN or similar tool… which might introduce a different region’s restrictions!

How to bypass radio.garden region restrictions

So I threw together a quick workaround:

  1. Ensure you’ve got a userscript manager installed (I like Violentmonkey, but there are other choices).
  2. Install this userscript; it’s hacky – I threw it together in under half an hour – but it seems to work!
Screenshot showing radio.garden tuned into YouFM in Mons, Belgium. An additional player control interface appears below the original one.
My approach is super lazy and simply injects a second audio player – which ignores region restrictions – below the original.

How does this work and how did I develop it?

For those looking to get into userscripting, here’s a quick tutorial on what I did to develop this bypass.

First, I played around with radio.garden for a bit to get a feel for what it was doing. I guessed that it must be tuning into a streaming URL when you select a radio station, so I opened by browser’s debugger on the Network tab and looked at what happened when I clicked on a “working” radio station, and how that differed when I clicked on a “blocked” one:

Screenshot from Firefox's Network debugger, showing four requests to a "working" radio station (of which two are media feeds) and two to a "blocked" radio station.

When connecting to a station, a request is made for some JSON that contains station metadata. Then, for a working station, a request is made for an address like /api/ara/content/listen/[ID]/channel.mp3. For a blocked station, this request isn’t made.

I figured that the first thing I’d try would be to get the [ID] of a station that I’m not permitted to listen to and manually try the URL to see if it was actually blocked, or merely not-being-loaded. Looking at a working station, I first found the ID in the JSON response and I was about to extract it when I noticed that it also appeared in the request for the JSON: that’s pretty convenient!

 

Composite screenshot from Firefox's Network debugger showing a request for station metadata being serviced, followed by a request for the MP3 stream with the same ID.My hypothesis was that the “blocking” is entirely implemented in the front-end: that the JavaScript code that makes the pretty bits work is looking at the “country” data that’s returned and using that to decide whether or not to load the audio stream. That provides many different ways to bypass it, from manipulating the JavaScript to remove that functionality, to altering the JSON response so that every station appears to be in the user’s country, to writing some extra code that intercepts the request for the metadata and injects an extra audio player that doesn’t comply with the regional restrictions.

But first I needed to be sure that there wasn’t some actual e.g. IP-based blocking on the streams. To do this, first I took the /api/ara/content/listen/[ID]/channel.mp3 address of a known-working station and opened it in VLC using Media > Open Network Stream…. That worked. Then I did the same thing again, but substituted the [ID] part of the address with the ID of a “blocked” station. VLC happily started spouting French to me: the bypass would, in theory, work!

Next, I needed to get that to work from within the site itself. It’s implemented in React, which is a pig to inject code into because it uses horrible identifiers for DOM elements. But of course I knew that there’d be this tell-tale fetch request for the station metadata that I could tap into, so I used this technique to override the native fetch method and replace it with my own “wrapper” that logged the stream address for any radio station I clicked on. I tested the addresses this produced using my browser.

window.fetch = new Proxy(window.fetch, {
  apply: (target, that, args)=>{
    const tmp = target.apply(that, args);
    tmp.then(res=>{
      const matches = res.url.match(/\/api\/ara\/content\/channel\/(.*)/);
      if(matches){
        const stationId = matches[1];
        console.log(`http://radio.garden/api/ara/content/listen/${stationId}/channel.mp3`);
      }
    });
    return tmp;
  },
});

That all worked nicely, so all I needed to do now was to use those addresses rather than simply logging them. Rather that get into the weeds reverse-engineering the built-in player, I simply injected a new <audio> element after it and pointed it at the correct address, and applied a couple of CSS tweaks to make it fit in nicely.

The only problem was that on UK-based radio stations I’d now hear a slight echo, because the original player was still working. I could’ve come up with an elegant solution to this, I’m sure, but I went for a quick-and-dirty hack: I used res.json() to obtain the body of the metadata response… which meant that the actual code that requested it would no longer be able to get it (you can only decode the body of a fetch response once!). radio.garden’s own player treats this as an error and doesn’t play that radio station, but my new <audio> element still plays it perfectly well.

It’s not pretty, but it’s functional. You can read the finished source code on Github. I don’t anticipate that I’ll be maintaining this script so if it stops working you’ll have to fix it yourself, and I have no intention of “finishing” it by making it nicer or prettier. I just wanted to share in case you can learn anything from my approach.

× × ×

Automattic Acquires ActivityPub Plugin for WordPress

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

Automattic has acquired the ActivityPub plugin for WordPress from German developer Matthias Pfefferle, who will be joining the company to continue improving support for federated platforms. Pfefferle, who is also the author of the Webmention plugin, said his new role is to see how Automattic’s products can benefit from open protocols like ActivityPub.

This is so exciting I might burst. Want to know why?

  1. Matt Mullenweg‘s commitment to ActivityPub makes me happy. WordPress made Pingback and Trackback take off, back in the day, and I believe that – in the same way – Automattic can help make ActivityPub more accessible and mainstream too.
  2. Matthias Pfefferle is both an IndieWeb and an ActivityPub star; I use (and I’ve extented upon) a lot of code he’s written every day and I sponsor him on Github! The chance that we get to work directly together is pretty slim, but it’s a chance right?

Susan A. Kitchens expressed concern that this could increase the level of ActivityPub spam out there (which right now is very low). I worry about that too. But I’m still optimistic that we can make something awesome off the back of this acquisition and keep the interpersonal Web federated, the way it ought to be.

Geohashing expedition 2023-03-10 51 -1

This checkin to geohash 2023-03-10 51 -1 reflects a geohashing expedition. See more of Dan's hash logs.

Location

North Leigh Common, West Oxfordshire.

Participants

Plans

My evening just freed up, so – weather-permitting – I might brave the sleet and cold and cycle out to this hashpoint this evening.

Expedition

Our dog had surgery at the start of the week and has now recovered enough to want a short walk, so I changed my plan to cycle for one to drive (with the dog) out to somewhere near the hashpoint and take her for a walk to and around it. Amazingly, I might have been faster to cycle: a crash on the A40 had lead to lots of traffic being re-routed along the exact same back roads that was to be my most-direct route, and on the local rat run through South Leigh I got trapped behind a line of folks who weren’t familiar with this particular unlit and twisty road and took the entire derestricted section at an average of 25mph. Ah well.

Out of laziness, I didn’t bring my GPSr or make a tracklog; I just used the Geohashdroid app and took a screenshot when I got there. South Leigh Common is pleasant, but it was dark, and my photos are all a little bit hard to make out! But the stars were beautiful tonight, and the dog loved one of her first outings since her surgery and enjoying running around in the long wet grass and sticking her head into rabbit holes. At 19:00 precisely I got within about a metre and a half of the hashpoint – well within the circle of uncertainty – and turned to head home.

I also took the time while there to update OpenStreetMap by drawing in the boundaries of the common, replacing the nondescript “point” that had marked it before.

Photos

The tape library robot that served drinks

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

This is an IBM tape library robot. It’s designed to fetch, load, unload, and return tape media cartridges to the correct bay in large enterprise environments.

One fateful ‘workend’, I made one serve drinks.

It went back into prod on the Monday…

In a story reminiscient of those anecdotes about early computer science students competing to “race” hard drives across the lab by writing programs that moved the heads in a way that vibrated/walked the devices, @SecurityWriter shares a wonderful story about repurposing a backup tape management robot to act as a server (pun intended) of drinks.

×