BBC News… without the crap

Did I mention recently that I love RSS? That it brings me great joy? That I start and finish almost every day in my feed reader? Probably.

I used to have a single minor niggle with the BBC News RSS feed: that it included sports news, which I didn’t care about. So I wrote a script that downloaded it, stripped sports news, and re-exported the feed for me to subscribe to. Magic.

RSS reader showing duplicate copies of the news story "Barbie 2? 'We'd love to,' says Warner Bros boss", and an entry from BBC Sounds.
Lately my BBC News feed has caused me some annoyance and frustration.

But lately – presumably as a result of technical changes at the Beeb’s side – this feed has found two fresh ways to annoy me:

  1. The feed now re-publishes a story if it gets re-promoted to the front pagebut with a different <guid> (it appears to get a #0 after it when first published, a #1 the second time, and so on). In a typical day the feed reader might scoop up new stories about once an hour, any by the time I get to reading them the same exact story might appear in my reader multiple times. Ugh.
  2. They’ve started adding iPlayer and BBC Sounds content to the BBC News feed. I don’t follow BBC News in my feed reader because I want to watch or listen to things. If you do, that’s fine, but I don’t, and I’d rather filter this content out.

Luckily, I already have a recipe for improving this feed, thanks to my prior work. Let’s look at my newly-revised script (also available on GitHub):

#!/usr/bin/env ruby
require 'bundler/inline'

# # Sample crontab:
# # At 41 minutes past each hour, run the script and log the results
# */20 * * * * ~/bbc-news-rss-filter-sport-out.rb > ~/bbc-news-rss-filter-sport-out.log 2>>&1

# Dependencies:
# * open-uri - load remote URL content easily
# * nokogiri - parse/filter XML
gemfile do
  source 'https://rubygems.org'
  gem 'nokogiri'
end
require 'open-uri'

# Regular expression describing the GUIDs to reject from the resulting RSS feed
# We want to drop everything from the "sport" section of the website, also any iPlayer/Sounds links
REJECT_GUIDS_MATCHING = /^https:\/\/www\.bbc\.co\.uk\/(sport|iplayer|sounds)\//

# Load and filter the original RSS
rss = Nokogiri::XML(open('https://feeds.bbci.co.uk/news/rss.xml?edition=uk'))
rss.css('item').select{|item| item.css('guid').text =~ REJECT_GUIDS_MATCHING }.each(&:unlink)

# Strip the anchors off the <guid>s: BBC News "republishes" stories by using guids with #0, #1, #2 etc, which results in duplicates in feed readers
rss.css('guid').each{|g|g.content=g.content.gsub(/#.*$/,'')}

File.open( '/www/bbc-news-no-sport.xml', 'w' ){ |f| f.puts(rss.to_s) }
It’s amazing what you can do with Nokogiri and a half dozen lines of Ruby.

That revised script removes from the feed anything whose <guid> suggests it’s sports news or from BBC Sounds or iPlayer, and also strips any “anchor” part of the <guid> before re-exporting the feed. Much better. (Strictly speaking, this can result in a technically-invalid feed by introducing duplicates, but your feed reader oughta be smart enough to compensate for and ignore that: mine certainly is!)

You’re free to take and adapt the script to your own needs, or – if you don’t mind being tied to my opinions about what should be in BBC News’ RSS feed – just subscribe to my copy at: https://fox.q-t-a.uk/bbc-news-no-sport.xml

RSS reader showing duplicate copies of the news story "Barbie 2? 'We'd love to,' says Warner Bros boss", and an entry from BBC Sounds.×

RSS > ActivityPub

RSS is better than ActivityPub1.

Photograph of a boxing match, but with the heads of the competitors replaced with the ActivityPub and RSS logos (and "AP" or "RSS" written on their clothes, respectively). RSS is delivering a powerful uppercut to ActivityPub.
A devastating blow by RSS against a competitor 19 years his junior! For updates on this bout as it develops, don’t forget to subscribe… using either protocol.

When I subscribe to content, I want:

  • Resilient failsafes. ActivityPub has many points-of-failure. A notification might fail to complete transmission as a result of downtime, faults, or network conditions, and the receiving server might never know. A feed reader, conversely, can tell you that an address 404’d or the server was down.
  • Retroactive access. Once you fix the problem above… you still don’t get the message you missed: it’s probably gone forever – there’s no retroactive access. The same is true when your ActivityPub server connects with a peer for the first time: you only ever get new content after that point. RSS, on the other hand, provides some number of “recent” items the moment you first subscribe.
  • Simple subscriptions. RSS can be served from a statically-hosted single file, which makes it suitable to deploy anywhere as well as consume using anything. It can be read, after a fashion, in anything from Lynx upwards.

RSS ticks all these boxes. If I can choose between RSS and ActivityPub to subscribe to your content, and I don’t need a real-time update, I’m probably going to choose RSS.

Footnotes

1 I feel like this statement needs a few clarifications and caveats, but my hot take looks spicier if I bury them in a footnote!

  • By RSS, I mean whichever pull-based basic HTTP you like, be that Atom, JSON Feed, h-entry, or even just properly-marked-up HTML5: did you know that the <article> element is intended to be suitable for syndication use?
  • Obviously I appreciate that RSS and ActivityPub are different tools for different jobs, and there are doubtless use-cases for which ActivityPub is clearly the superior solution.
  • I certainly don’t object to services providing both RSS and ActivityPub as syndication options, like Mastodon does, where both might be good choices.
Photograph of a boxing match, but with the heads of the competitors replaced with the ActivityPub and RSS logos (and "AP" or "RSS" written on their clothes, respectively). RSS is delivering a powerful uppercut to ActivityPub.×

They See Me (Blog)Rolling

Tracy Durnell’s post about blogrolls really spoke to me. Like her, I used to think of a blogroll as a list of people you know personally (who happen to blog)1, but the number of bloggers among my immediate in-person circle of friends has shrunk from several dozen to just a handful, and I dropped my blogroll in around 2008.

A white man wearing a spacesuit sits on a pebble beach using a laptop.
On the Internet, a blogger is only as alone as they choose to be.

But my connection to a wider circle has grown, and like Tracy I enjoy the “hardly strangers” connection I feel with the people I follow online. She writes:

While social media emphasizes the show-off stuff — the vacation in Puerto Vallarta, the full kitchen remodel, the night out on the town — on blogs it still seems that people are sharing more than signalling. These small pleasures seem to be offered in a spirit of generosity — this is too beautiful not to share.

Although I may never interact with all the folks whose blogs I follow, reading the same blogger for a long time does build a (one-sided) connection. I may not know you, author, but I am rooting for you. It’s a different modality of relationship than we may be used to in person, but it’s real: a parasocial relationship simmering with the potential for deeper connection, but also satisfying as it exists.

My first bloggy pan pal, Colin Walker, who I started exchanging emails with earlier this month, followed-up on this with an observation that really gets to the heart of the issue (speaking as somebody who’s long said that my blog’s intended audience is, first and foremost, me):

At its core, blogging is a solitary activity with many (if not most) authors claiming that their blog is for them – myself included. Yet, the implication of audience cannot be ignored. Indeed, the more an author embeds themself in the loose community of blogs, by reading and linking to others, the more that implication becomes reality even if not actively pursued via comments or email.

To that end: I’ve started publishing my blogroll again! Follow that link and you’ll see an only-lightly-curated list of all the people (plus some non-personal blogs, vlogs, and webcomics) I follow (that have updated their feeds within the last year2). Naturally, there’s an OPML version too, and I’ve open-sourced the code I used to generate it (although I can’t imagine anybody’s situation is enough like mine for it to be useful).

The page is a little flaky and there’s things I’d like to do to improve it, but I’d rather publish a basic version now and then come back to it with my gardening gloves on another time to improve it.

Maybe my blogroll has some folks on that you might recognise? Or else: maybe you’re only a single random-click away from somebody new you never heard of before!

Footnotes

1 Possibly marked up with XFN to indicate how you’re connected to one another, but I’ve always had a soft spot for XFN.

2 I often retain subscriptions to dormant feeds and it sometimes pays-off, e.g. when I recently celebrated Octopuns’ return after a 9½-year hiatus!

A white man wearing a spacesuit sits on a pebble beach using a laptop.×

The Underground Blog

This article is a repost promoting content originally published elsewhere. See more things Dan's reposted.

theunderground.blog is an experimental blog that is only available to read through a feed reader.

If you would like to read the latest posts, you can subscribe to the feed at https://theunderground.blog/feed.xml, using the feed reader of your choice.

Chris first suggested this idea in the footnote of a post that talks about something I’ve been witnessing recently: that blogging seems to be having a renaissance1. I’ve for a few years been telling people that now is the second-best time to start a blog. The best time was, of course, ~20 years ago, but if you missed out first time around (or let your blog die as big social media silos took over): now’s the time to join the growing resurgence!

Anyway, he only went and actually did it! The newest member of RSS Club is likely to be… an entire blog that’s only accessible via a feed reader2.

There’s two posts published so far, and if you want to read them you’ll need to subscribe to theunderground.blog using your feed reader. There’s tips on that page on getting an easy-to-use one if you haven’t already.

Footnotes

1 He also had interesting things to say about OPML, which is a topic close to my heart. I wonder if I ought to start sharing a partial OPML file of my subscriptions?

2 Or by reading the source code, I suppose: on the open Web, that’s always an option. The Web is, indeed, magical.

Stopping WordPress Emoji ‘Images’ in Feeds

After sharing that Octopuns has started posting again after a 9½-year hiatus earlier today, I noticed something odd: where I’d written “I ❤️ FreshRSS“, the heart emoji was huge when viewed in my favourite feed reader.

Screenshot from a web-based RSS reader application, showing recent repost "Groundhog Day". The final line contains a link with the text "I ❤️ FreshRSS", but the red heart emoji seems to be enormous compared to the next adjacent to it.
Why yes, I do subscribe to my own RSS feed. What of it?

It turns out that by default, WordPress replaces emoji in its feeds (and when sending email) with images of those emoji, using the Tweemoji set, and with the alt-text set to the original emoji. These images are hosted at https://s.w.org/images/core/emoji/…-based URLs.

For example, this heart was served with the following HTML code (the number 2764 refers to the codepoint of the emoji):

<img src="https://s.w.org/images/core/emoji/14.0.0/72x72/2764.png"
     alt="❤"
   class="wp-smiley"
   style="height: 1em; max-height: 1em;"
/>

I can see why this functionality was added: what if the feed reader didn’t support Unicode or didn’t have a font capable of showing the appropriate emoji?

But I can also see reasons why it might not be desirable to everybody. For example:

  1. Downloading an image will always be slower than rendering an emoji.
  2. The code to include an image is always more-verbose than simply including an emoji.
  3. As seen above: a feed reader which imposes a minimum size on embedded images might well render one “wrong”.
  4. It’s marginally more-verbose for screen reader users to say “Image: heart emoji” than just “heart emoji”, I imagine.
  5. Serving an third-party image when a feed item is viewed has potential privacy implications that I try hard to avoid.
  6. Replacing emoji with images is probably unnecessary for modern feed readers anyway.

I opted to remove this functionality. I briefly considered overriding the emoji_url filter (which could be used to selfhost the emoji set) but I discovered that I could just un-hook the filters that were being added in the first place.

Here’s what I added to my theme’s functions.php:

remove_filter( 'the_content_feed', 'wp_staticize_emoji' );
remove_filter( 'comment_text_rss', 'wp_staticize_emoji' );

That’s all there is to it. Now, my feed reader shows my system’s emoji instead of a huge image:

Screenshot from a web-based RSS reader application, showing recent repost "Groundhog Day". The final line contains a link with the text "I ❤️ FreshRSS" shown correctly, with a red heart emoji at the appropriate font size.

I’m always grateful to discover that a piece of WordPress functionality, whether core or in an extension, makes proper use of hooks so that its functionality can be changed, extended, or disabled. One of the single best things about the WordPress open-source ecosystem is that you almost never have to edit somebody else’s code (and remember to re-edit it every time you install an update).

Want to hear about other ways I’ve improved WordPress’s feeds?

Screenshot from a web-based RSS reader application, showing recent repost "Groundhog Day". The final line contains a link with the text "I ❤️ FreshRSS", but the red heart emoji seems to be enormous compared to the next adjacent to it.× Screenshot from a web-based RSS reader application, showing recent repost "Groundhog Day". The final line contains a link with the text "I ❤️ FreshRSS" shown correctly, with a red heart emoji at the appropriate font size.×

Groundhog Day

This article is a repost promoting content originally published elsewhere. See more things Dan's reposted.

Partial comic frame showing a groundhog popping its head out of a hole and shouting "RUN!".

After a break of nine and a half years, webcomic Octopuns is back. I have two thoughts:

  1. That’s awesome. I love Octopuns and I’m glad it’s back. If you want a quick taster – a quick slice, if you will – of its kind of humour, I suggest starting with Pizza.
  2. How did I know that Octopuns was back? My RSS reader told me. RSS remains a magical way to keep an eye on what’s happening on the Internet: it’s like a subscription service that delivers you exactly what you want, as soon as it’s available.

I’ve been pleasantly surprised before when my feed reader has identified a creator that’s come back from the dead. I ❤️ FreshRSS.

Partial comic frame showing a groundhog popping its head out of a hole and shouting "RUN!".×

.well-known/feeds

<link rel="alternate" type="application/rss+xml"> is fine, but I feel like there should be a standard for a site, not a page, to share a “list of feeds associated with a site”.

Last night, I dreamed about a way to achieve that: ./well-known/feeds as an OPML document. Here’s mine, and here’s a draft spec.

Interested to hear what Dave Winer thinks…

RSS Zero isn’t the path to RSS Joy

Feed overload is real

The week before last, Katie shared with me that article from last month, Who killed Google Reader? I’d read it before so I didn’t bother clicking through again, but we did end up chatting about RSS a bit1.

Screenshot: Google Reader Notifier popup advises of "461 unread items".
I ditched Google Reader several years before its untimely demise, but I can confirm “461 unread items” was a believable message.

Katie “abandoned feeds a few years ago” because they were “regularly ending up with 200+ unread items that felt overwhelming”.

Conversely: I think that dropping your feed reader because there’s too much to read is… solving the wrong problem.

A white man with dark hair, wearing jeans and a t-shirt, moves to push over a stack of carboard boxes, each smaller than the one beneath it. From bottom to top, the boxes are labelled: stress, email client, mobile pings, doomscrolling, social media silos... and the very top, very smallest box, which glows with sunbeams emitted from it, reads "rss reader".
About half way through editing this image I completely forgot what message I was trying to convey, but I figured I’d keep it anyway and let you come up with your own interpretation.

Dave Rupert last week wrote about his feed reader’s “unread” count having grown to a mammoth 2,000+ items, and his plan to reduce that.

I think that he, like Katie, might be looking at his reader in a different way than I do mine.

FreshRSS sidebar, showing 567 unread items (of which 1 are comics, 2 are friends, 186 are communities, 1 are distractions, 278 are geeky, 1 is "me", 57 are youtube, 13 are strangers, 1 is software, 7 are rss club, 29 are podcasts, and 3 are polyamory. A further 107 are marked as favourites. The "friends" and "rss club" categories are showing warning triangles.
At time of writing, I’ve got 567 unread items. And that’s fine.

RSS is not email!

I’ve been in the position that Katie and David describe: of feeling overwhelmed by the sheer volume of unread items. And I know others have, too. So let me share something I’ve learned sooner:

There’s nothing special about reaching Inbox Zero in your feed reader.

It’s not noble nor enlightened to get to the bottom of your “unread” list.

Your 👏  feed 👏 reader 👏 is 👏 not 👏 an 👏 email 👏 client. 👏

The idea of Inbox Zero as applied to your email inbox is about productivity. Any message in your email might be something that requires urgent action, and you won’t know until you filter through and categorise .

But your RSS reader doesn’t (shouldn’t?) be there to add to your to-do list. Your RSS reader is a list of things you might like to read. In an ideal world, reaching “RSS Zero” would mean that you’ve seen everything on the Internet that you might enjoy. That’s not enlightened; that’s sad!

Google Reader's "Congratulations, you've reached the End of the Internet." Easter Egg screen, shown when all your feeds are empty.
Google Reader understood this, although the word “congratulations” was misplaced.

Use RSS for joy

My RSS reader is a place of joy, never of stress. I’ve tried to boil down the principles that makes it so, and here they are:

  1. Zero is not the target.
    The numbers are to inspire about how much there is “out there” for you, not to enumerate how much work need have to do.
  2. Group your feeds by importance.
    Your feed reader probably lets you group (folder, tag…) your feeds, so you can easily check-in on what you care about and leave other feeds for a rainy day.2 This is good.
  3. Don’t read every article.
    Your feed reader gives you the convenience of keeping content in one place, but you’re not obligated to read every single one. If something doesn’t interest you, mark it as read and move on. No judgement.
  4. Keep things for later.
    Something you want to read, but not now? Find a way to “save for later” to get it out of your main feed so you. Don’t have to scroll past it every day! Star it or tag it3 or push it to your link-saving or note-taking app. I use a link shortener which then feeds back into my feed reader into a “for later” group!
  5. Let topical content expire.
    Have topical/time-dependent feeds (general news media, some social media etc.)? Have reader “purge” unread articles after a time. I have my subscription to BBC News headlines expire after 5 days: if I’ve taken that long to read a headline, it might as well disappear.4
  6. Use your feed reader deliberately.
    You don’t need popup notifications (a new article’s probably already up to an hour stale by the time it hits your reader). We’re all already slaves to notifications! Visit your reader when it suits you. I start and end every day in mine; most days I hit it again a couple of other times. I don’t need a notification: there’s always new content. The reader keeps track of what I’ve not looked at.
  7. It’s not just about text.
    Don’t limit your feed reader to just text. Podcasts are nothing more than RSS feeds with attached audio files; you can keep track in your reader if you like. Most video platforms let you subscribe to a feed of new videos on a channel or playlist basis, so you can e.g. get notified about YouTube channel updates without having to fight with The Algorithm. Features like XPath Scraping in FreshRSS let you subscribe to services that don’t even have feeds: to watch the listings of dogs on local shelter websites when you’re looking to adopt, for example.
  8. Do your reading in your reader.
    Your reader respects your preferences: colour scheme, font size, article ordering, etc. It doesn’t nag you with newsletter signup popups, cookie notices, or ads. Make the most of that. Some RSS feeds try to disincentivise this by providing only summary content, but a good feed reader can work around this for you, fetching actual content in the background.5
  9. Use offline time to catch up on your reading.
    Some of the best readers support offline mode. I find this fantastic when I’m on an aeroplane, because I can catch up on all of the interesting articles I’d not had time to yet while grounded, and my reading will get synchronised when I touch down and disable flight mode.
  10. Make your reader work for you.
    A feed reader is a tool that works for you. If it’s causing you pain, switch to a different tool6, or reconfigure the one you’ve got. And if the way you find joy from RSS is different from me, that’s fine: this is a personal tool, and we don’t have to have the same answer.

And if you’d like to put those tips in your RSS reader to digest later or at your own pace, you can:  here’s an RSS feed containing (only) these RSS tips!

Footnotes

1 You’d  be forgiven for thinking that RSS was my favourite topic, given that so-far-this-year I’ve written about improving WordPress’s feeds, about mathematical quirks in FreshRSS, on using XPath scraping as an RSS alternative (twice), and the joy of getting notified when a vlog channel is ressurected (thanks to RSS). I swear I have other interests.

2 If your feed reader doesn’t support any kind of grouping, get a better reader.

3 If your feed reader doesn’t support any kind of marking/favouriting/tagging of articles, get a better reader.

4 If your feed reader doesn’t support customisable expiry times… well that’s not too unusual, but you might want to consider getting a better reader.

5 FreshRSS calls the feature that fetches actual post content from the resulting page “Article CSS selector on original website”, which is a bit of a mouthful, but you can see what it’s doing. If your feed reader doesn’t support fetching full content… well, it’s probably not that big a deal, but it’s a good nice-to-have if you’re shopping around for a reader, in my opinion.

6 There’s so much choice in feed readers, and migrating between them is (usually) very easy, so everybody can find the best choice for them. Feedly, Inoreader, and The Old Reader are popular, free, and easy-to-use if you’re looking to get started. I prefer a selfhosted tool so I use the amazing FreshRSS (having migrated from Tiny Tiny RSS). Here’s some more tips on getting started. You might prefer a desktop or mobile tool, or even something exotic: part of the beauty of RSS feeds is they’re open and interoperable, so if for example you love using Slack, you can use Slack to push feed updates to you and get almost all the features you need to do everything in my list, including grouping (using channels) and saving for later (using Slackbot/”remind me about this”). Slack’s a perfectly acceptable feed reader for some people!

Screenshot: Google Reader Notifier popup advises of "461 unread items".× A white man with dark hair, wearing jeans and a t-shirt, moves to push over a stack of carboard boxes, each smaller than the one beneath it. From bottom to top, the boxes are labelled: stress, email client, mobile pings, doomscrolling, social media silos... and the very top, very smallest box, which glows with sunbeams emitted from it, reads "rss reader".× FreshRSS sidebar, showing 567 unread items (of which 1 are comics, 2 are friends, 186 are communities, 1 are distractions, 278 are geeky, 1 is "me", 57 are youtube, 13 are strangers, 1 is software, 7 are rss club, 29 are podcasts, and 3 are polyamory. A further 107 are marked as favourites. The "friends" and "rss club" categories are showing warning triangles.× Google Reader's "Congratulations, you've reached the End of the Internet." Easter Egg screen, shown when all your feeds are empty.×

Better WordPress RSS Feeds

I’ve made a handful of tweaks to my RSS feed which I feel improves upon WordPress’s default implementation, at least in my use-case.1 In case any of these improvements help you, too, here’s a list of them:

Post Kinds in Titles

Since 2020, I’ve decorated post titles by prefixing them with the “kind” of post they are (courtesy of the Post Kinds plugin). I’ve already written about how I do it, if you’re interested.

Screenshot showing a Weekly Digest email from DanQ.me, with two Notes and a Repost clearly-identified.
Identifying post kinds is particularly useful for people who subscribe by email (the emails are generated off the RSS feed either daily or weekly: subscriber’s choice), who might want to see articles and videos but not care about for example checkins and reposts.

RSS Only posts

A minority of my posts are – initially, at least – publicised only via my RSS feed (and places that are directly fed by it, like email subscribers). I use a tag to identify posts to be hidden in this way. I’ve written about my implementation before, but I’ve since made a couple of additional improvements:

  1. Suppressing the tag from tag clouds, to make it harder to accidentally discover these posts by tag-surfing,
  2. Tweaking the title of such posts when they appear in feeds (using the same technique as above), so that readers know when they’re seeing “exclusive” content, and
  3. Setting a X-Robots-Tag: noindex, nofollow HTTP header when viewing such tag or a post, to discourage search engines (code for this not shown below because it’s so very specific to my theme that it’s probably no use to anybody else!).
// 1. Suppress the "rss club" tag from tag clouds/the full tag list
function rss_club_suppress_tags_from_display( string $tag_list, string $before, string $sep, string $after, int $post_id ): string {
  foreach(['rss-club'] as $tag_to_suppress){
    $regex = sprintf( '/<li>[^<]*?<a [^>]*?href="[^"]*?\/%s\/"[^>]*?>.*?<\/a>[^<]*?<\/li>/', $tag_to_suppress );
    $tag_list = preg_replace( $regex, '', $tag_list );
  }
  return $tag_list;
}
add_filter( 'the_tags', 'rss_club_suppress_tags_from_display', 10, 5 );

// 2. In feeds, tweak title if it's an RSS exclusive
function rss_club_add_rss_only_to_rss_post_title( $title ){
  $post_tag_slugs = array_map(function($tag){ return $tag->slug; }, wp_get_post_tags( get_the_ID() ));
  if ( ! in_array( 'rss-club', $post_tag_slugs ) ) return $title; // if we don't have an rss-club tag, drop out here
  return trim( "{$title} [RSS Exclusive!]" );
  return $title;
}
add_filter( 'the_title_rss', 'rss_club_add_rss_only_to_rss_post_title', 6 );

Adding a stylesheet

Adding a stylesheet to your feeds can make them much friendlier to beginner users (which helps drive adoption) without making them much less-convenient for people who know how to use feeds already. Darek Kay and Terence Eden both wrote great articles about this just earlier this year, but I think my implementation goes a step further.

Screenshot of DanQ.me's RSS feed as viewed in Firefox, showing a "Q" logo and three recent posts.
I started with Matt Webb‘s pretty-feed-v3.xsl by Matt Webb (as popularised by AboutFeeds.com) and built from there.

In addition to adding some “Q” branding, I made tweaks to make it work seamlessly with both my RSS and Atom feeds by using two <xsl:for-each> blocks and exploiting the fact that the two standards don’t overlap in their root namespaces. Here’s my full XSLT; you need to override your feed template as Terence describes to use it, but mine can be applied to both RSS and Atom.2

I’ve still got more I’d like to do with this, for example to take advantage of the thumbnail images I attach to posts. On which note…

Thumbnail images

When I first started offering email subscription options I used Mailchimp’s RSS-to-email service, which was… okay, but not great, and I didn’t like the privacy implications that came along with it. Mailchimp support adding thumbnails to your email template from your feed, but WordPress themes don’t by-default provide the appropriate metadata to allow them to do that. So I installed Jordy Meow‘s RSS Featured Image plugin which did it for me.

<item>
        <title>[Checkin] Geohashing expedition 2023-07-27 51 -1</title>
        <link>https://danq.me/2023/07/27/geohashing-expedition-2023-07-27-51-1/</link>

        ...

        <media:content url="/_q23u/2023/07/20230727_141710-1024x576.jpg" medium="image" />
        <media:description>Dan, wearing a grey Three Rings hoodie, carrying French Bulldog Demmy, standing on a path with trees in the background.</media:description>
</item>
Media attachments for RSS feeds are perhaps most-popular for podcasts, but they’re also great for post thumbnail images.

During my little redesign earlier this year I decided to go two steps further: (1) ditching the plugin and implementing the functionality directly into my theme (it’s really not very much code!), and (2) adding not only a <media:content medium="image" url="..." /> element but also a <media:description> providing the default alt-text for that image. I don’t know if any feed readers (correctly) handle this accessibility-improving feature, but my stylesheet above will, some day!

Here’s how that’s done:

function rss_insert_namespace_for_featured_image() {
  echo "xmlns:media=\"http://search.yahoo.com/mrss/\"\n";
}

function rss_insert_featured_image( $comments ) {
  global $post;
  $image_id = get_post_thumbnail_id( $post->ID );
  if( ! $image_id ) return;
  $image = get_the_post_thumbnail_url( $post->ID, 'large' );
  $image_url = esc_url( $image );
  $image_alt = esc_html( get_post_meta( $image_id, '_wp_attachment_image_alt', true ) );
  $image_title = esc_html( get_the_title( $image_id ) );
  $image_description = empty( $image_alt ) ? $image_title : $image_alt;
  if ( !empty( $image ) ) {
    echo <<<EOF
      <media:content url="{$image_url}" medium="image" />
      <media:description>{$image_description}</media:description>
    EOF;
  }
}

add_action( 'rss2_ns', 'rss_insert_namespace_for_featured_image' );
add_action( 'rss2_item', 'rss_insert_featured_image' );

So there we have it: a little digital gardening, and four improvements to WordPress’s default feeds.

RSS may not be as hip as it once was, but little improvements can help new users find their way into this (enlightened?) way to consume the Web.

If you’re using RSS to follow my blog, great! If it’s not for you, perhaps pick your favourite alternative way to get updates, from options including email, Telegram, the Fediverse (e.g. Mastodon), and more…

Update 4 September 2023: More-recently, I’ve improved WordPress RSS feeds by preventing them from automatically converting emoji into images.

Footnotes

1 The changes apply to the Atom feed too, for anybody of such an inclination. Just assume that if I say RSS I’m including Atom, okay?

2 The experience of writing this transformation/stylesheet also gave me yet another opportunity to remember how much I hate working with XSLTs. This time around, in addition to the normal namespace issues and headscratching syntax, I had to deal with the fact that I initially tried to use a feature from XSLT version 2.0 (a 22-year-old version) only to discover that all major web browsers still only support version 1.0 (specified last millenium)!

Screenshot showing a Weekly Digest email from DanQ.me, with two Notes and a Repost clearly-identified.× Screenshot of DanQ.me's RSS feed as viewed in Firefox, showing a "Q" logo and three recent posts.×

Note #21487

Toast popup, reading: FreshRSS: new articles! There are -26 new articles to read on FreshRSS. (unread: 1148) via rss.fox.q-t-a.uk

I have minus 26 new articles in my RSS reader! Either I’m a time traveller, or there’s a wraparound bug when you neglect your unreads for long enough.

Both seem equally likely, if I’m honest.

Toast popup, reading: FreshRSS: new articles! There are -26 new articles to read on FreshRSS. (unread: 1148) via rss.fox.q-t-a.uk×

New Far Side in FreshRSS

I got some great feedback to yesterday’s post about using FreshRSS + XPath to subscribe to Forward, including helpful comments from FreshRSS developer Alexandre Alapetite and from somebody who appreciated it and my Far Side “Daily Dose” recipe and wondered if it was possible to get the new Far Side content in FreshRSS too.

Wait, there’s new Far Side content? Yup: it turns out Gary Larson’s dusted off his pen and started drawing again. That’s awesome! But the last thing I want is to have to go to the website once every few… what: days? weeks? months? He’s not syndicated any more so he’s not got a deadline to work to! If only there were some way to have my feed reader, y’know, do it for me and let me know whenever he draws something new.

Screenshot showing new content from The Far Side in my FreshRSS reader.
It turns out, there is.

Here’s my setup for getting Larson’s new funnies right where I want them:

  • Feed URL: https://www.thefarside.com/new-stuff/1
    This isn’t a valid address for any of the new stuff, but always seems to redirect to somewhere that is, so that’s nice.
  • XPath for finding news items: //div[@class="swiper-slide"]
    Turns out all the “recent” new stuff gets loaded in the HTML and then JavaScript turns it into a slider etc.; some of the CSS classes change when the JavaScript runs so I needed to View Source rather than use my browser’s inspector to find everything.
  • Item title: concat("Far Side #", descendant::button[@aria-label="Share"]/@data-shareable-item)
    Ugh. The easiest place I could find a “clean” comic ID number was in a data- attribute of the “share” button, where it’s presumably used for engagement tracking. Still, whatever works right?
  • Item content: descendant::figcaption
    When Larson captions a comic, the caption is important.
  • Item link (URL) and item unique ID: concat("https://www.thefarside.com", ./@data-path)
    The URLs work as direct links to the content, and because they’re unique, they make a reasonable unique ID too (so long as their numbering scheme is internally-consistent, this should stop a re-run of new content popping up in your feed reader if the same comic comes around again).
  • Item thumbnail: concat("https://fox.q-t-a.uk/referer-faker.php?pw=YOUR-SECRET-PASSWORD-GOES-HERE&referer=https://www.thefarside.com/&url=", descendant::img[@data-src]/@data-src)
    The Far Side uses Referer: headers as an anti-hotlinking measure, which prevents us easily loading the images directly in an RSS reader. I use this tiny PHP script as a proxy to mitigate that. If you don’t have such a proxy set up, you could simply omit the “Item thumbnail” and “Item content” fields and click the link to go to the original page.
  • Item date: normalize-space(descendant::div[@class="tfs-comic-new__meta"]/*[1])
    The date is spread through two separate text nodes, so we get the content of their wrapper and use normalize-space to tidy the whitespace up. The date format then looks like “Wednesday, March 29, 2023”, which we can parse using a custom date/time format string:
  • Custom date/time format: l, F j, Y

I promise I’ll stop writing about how awesome FreshRSS + XPath is someday. Today isn’t that day.

Meanwhile: if you used to use a feed reader but gave up when the Web started to become hostile to them and big social media systems started to wall you in, you should really consider picking one up again. The stuff I write about is complex edge-cases that most folks don’t need to think about in order to benefit from RSS… but it’s super convenient to have the things you care about online (news, blogs, social media, videos, newsletters, comics, search trends…) collated and sorted for you… without interference from algorithms that want to push “sticky” content, without invasive tracking or advertisements (or cookie banners or privacy popups), without something “disappearing” simply because you put off reading it for a few days.

Screenshot showing new content from The Far Side in my FreshRSS reader.×

Subscribing to Forward using FreshRSS’s XPath Scraping

As I’ve mentioned before, I’m a fan of Tailsteak‘s Forward comic. I’m not a fan of the author’s weird aversion to RSS, so I hacked a way around it first using an exploit in webcomic reader app Comic Chameleon (accidentally getting access to comics weeks in advance of their publication as a side-effect) and later by using my own tool RSSey.

But now I’m able to use my favourite feed reader FreshRSS to scrape websites directly – like I’ve done for The Far Side – I should switch to using this approach to subscribe to Forward, too:

Screenshot showing RSS feed items: recent Forward episodes including their numbers, titles, and publication dates.
The goal: date-ordered, numbered, titled episodes of Forward in my feed reader.

Here’s the settings I came up with –

  • Feed URL: http://forwardcomic.com/list.php
  • Type of feed source: HTML + XPath (Web scraping)
  • XPath for finding news items: //a[starts-with(@href,'archive.php')]
  • Item title: .
  • Item link (URL): ./@href
  • Item date: ./following-sibling::text()[1]
  • Custom date/time format: - Y.m.d
Annotated screenshot showing how each XPath directive maps to each part of the page. The item selector finds each hyperlink that begins with "archive.php" (notably missing the most-recent comic at any given time, which is found at index.php), and the date is found in the text node that immediately follows it, in a slightly-unusual variation on ISO8601.
The comic pages themselves do a great thing for accessibility by including a complete transcript of each. But the listing page, which is basically a series of <a>s separated by <br>s rather than a <ul> and <li>s, for example, leaves something to be desired (and makes it harder to scrape, too!).

I continue to love this “killer feature” of FreshRSS, but I’m beginning to see how it could go further – I wish I had the free time to contribute to its development!

I’d love to see a mechanism for exporting/importing feed configurations like this so that I could share them more-easily, for example. I’d also be delighted if I could expand on my XPath rules to load pages referenced by the results and get data from them, too, e.g. so I could use an image found by XPath on the “item link” page as the thumbnail image! These are things RSSey could do for me, but FreshRSS can’t… yet!

Screenshot showing RSS feed items: recent Forward episodes including their numbers, titles, and publication dates.× Annotated screenshot showing how each XPath directive maps to each part of the page. The item selector finds each hyperlink that begins with "archive.php" (notably missing the most-recent comic at any given time, which is found at index.php), and the date is found in the text node that immediately follows it, in a slightly-unusual variation on ISO8601.×

Satoru Iwata’s first commercial game has a secret

This article is a repost promoting content originally published elsewhere. See more things Dan's reposted.

Codex (YouTube)

This was a delightful vlog. It really adds personality to what might otherwise have been a story only about technology and history.

I subscribed to Codex’s vlog like… four years ago? He went dark soon afterwards, but thanks to the magic of RSS, I got notified as soon as he came back from his hiatus.

Email newsletters via RSS

I love feeds!

Maybe you’ve heard already, but I love RSS.

I love it so much that I retrofit sites without feeds into it for the convenience of my favourite reader FreshRSS: working around (for example) the lack of feeds in The Far Side (twice), in friends’ blogs, and in my URL shortener. Whether tracking my progress binging webcomic history, subscribing to YouTube channels, or filtering-out sports news, feeds are the centre of my digital life.

Illustration showing a web application with an RSS feed; the RSS feed is sending data to my RSS reader (represented by FreshRSS's icon).

 

There’s been a bit of a resurgence lately of sites whose only subscription option is email, or – worse yet – who provide certain “exclusive” content only to email subscribers.

I don’t want to go giving an actual email address to every damn service, because:

  • It’s not great for privacy, even when (as usual) I use a unique alias for each sender.
  • It’s usually harder to unsubscribe than I’d like, and rarely consistent: you need to find a recent message, click a link, sometimes that’s enough or sometimes you need to uncheck a box or click a button, or sometimes you’ll get another email with something to click in it…
  • I rarely want to be notified the very second a new issue is published; email is necessarily more “pushy” than I like a subscription to be.
  • I don’t want to use my email Inbox to keep track of which articles I’ve read/am still going to read: that’s what a feed reader is for! (It also provides tagging, bookmarking, filtering, standardised and bulk unsubscribing tools, etc.)

So what do I do? Well…

Illustration showing a web application using MailChimp to send an email newsletter to OpenTrashMail, to which FreshRSS is subscribed.

I already operate an OpenTrashMail instance for one-shot throwaway email addresses (which I highly recommend). And OpenTrashMail provides a rich RSS feed. Sooo…

How I subscribe to newsletters (in my feed reader)

If I want to subscribe to your newsletter, here’s what I do:

  1. Put an email address (I usually just bash the keyboard to make a random one, then put @-a-domain-I-control on the end, where that domain is handled by OpenTrashMail) in to subscribe.
  2. Put https://my-opentrashmail-server/rss/the-email-address-I-gave-you/rss.xml into my feed reader.
  3. That’s all. There is no step 3.

Now I get your newsletter alongside all my other subscriptions. If I want to unsubscribe I just tell my feed reader to stop polling the RSS feed (You don’t even get to find out that I’ve unsubscribed; you’re now just dropping emails into an unmonitored box, but of course I can resubscribe and pick up from where I left off if I ever want to).

Obviously this approach isn’t suitable for personalised content or sites for which your email address is used for authentication, because anybody who can guess the random email address can get the feed! But it’s ideal for those companies who’ll ocassionally provide vouchers in exchange for being able to send you other stuff to your Inbox, because you can simply pipe their content to your feed reader, then add a filter to drop anything that doesn’t contain the magic keyword: regular vouchers, none of the spam. Or for blogs that provide bonus content to email subscribers, you can get the bonus content in the same way as the regular content, right there in a folder of your reader. It’s pretty awesome.

If you don’t already have and wouldn’t benefit from running OpenTrashMail (or another trashmail system with feed support) it’s probably not worth setting one up just for this purpose. But otherwise, I can certainly recommend it.

The Far Side in FreshRSS

A few yeras ago, I wanted to subscribe to The Far Side‘s “Daily Dose” via my RSS reader. The Far Side doesn’t have an RSS feed, so I implemented a proxy/middleware to bridge the two.

Browser debugger running document.evaluate('//li[@class="blog__post-preview"]', document).iterateNext() on Beverley's weblog and getting the first blog entry.
If you’re looking for a more-general instruction on using XPath scraping in FreshRSS, this isn’t it.
The release of version 1.20.0 of my favourite RSS reader FreshRSS provided a new mechanism for subscribing to content from sites that didn’t provide feeds: XPath scraping. I demonstrated the use of this to subscribe to my friend Beverley‘s blog, but this week I figured it was time to have a go at retiring my middleware and subscribing directly to The Far Side from FreshRSS.

It turns out that FreshRSS’s XPath Scraping is almost enough to achieve exactly what I want. The big problem is that the image server on The Far Side website tries to prevent hotlinking by checking the Referer: header on requests, so we need a proxy to spoof that. I threw together a quick PHP program to act as a proxy (if you don’t have this, you’ll have to click-through to read each comic), then configured my FreshRSS feed as follows:

FreshRSS "HTML + XPath" configuration page, configured as described below.

  • Feed URL: https://www.thefarside.com/
    The “Daily Dose” gets published to The Far Side‘s homepage each day.
  • XPath for finding new items: //div[@class="card tfs-comic js-comic"]
    Finds each comic on the page. This is probably a little over-specific and brittle; I should probably switch to using the contains function at some point. I subsequently have to use parent:: and ancestor:: selectors which is usually a sign that your screen-scraping is suboptimal, but in this case it’s necessary because it’s only at this deep level that we start seeing really specific classes.
  • Item title: concat("Far Side #", parent::div/@data-id)
    The comics don’t have titles (“The one with the cow”?), but these seem to have unique IDs in the data-id attribute of the parent <div>, so I’m using those as a reference.
  • Item content: descendant::div[@class="card-body"]
    Within each item, the <div class="card-body"> contains the comic and its text. The comic itself can’t be loaded this way for two reasons: (1) the <img src="..."> just points to a placeholder (the site uses JavaScript-powered lazy-loading, ugh – the actual source is in the data-src attribute), and (2) as mentioned above, there’s anti-hotlink protection we need to work around.
  • Item link: descendant::input[@data-copy-item]/@value
    Each comic does have a unique link which you can access by clicking the “share” button under it. This makes a hidden text <input> appear, which we can identify by the presence of the data-copy-item attribute. The contents of this textbox is the sharing URL for the comic.
  • Item thumbnail: concat("https://example.com/referer-faker.php?pw=YOUR-SECRET-PASSWORD-GOES-HERE&referer=https://www.thefarside.com/&url=", descendant::div[@class="tfs-comic__image"]/img/@data-src)
    Here’s where I hook into my special proxy server, which spoofs the Referer: header to work around the anti-hotlinking code. If you wanted you might be able to come up with an alternative solution using a custom JavaScript loaded into your FreshRSS instance (there’s a plugin for that!), perhaps to load an iframe of the sharing URL? Or you can host a copy of my proxy server yourself (you can’t use mine, it’s got a password and that password isn’t YOUR-SECRET-PASSWORD-GOES-HERE!)
  • Item date: ancestor::div[@class="tfs-page__full tfs-page__full--md"]/descendant::h3
    There’s nothing associating each comic with the date it appeared in the Daily Dose, so we have to ascend up to the top level of the page to find the date from the heading.
  • Item unique ID: parent::div/@data-id
    Giving FreshRSS a unique ID can help it stop showing duplicates. We use the unique ID we discovered earlier; this way, if the Daily Dose does a re-run of something it already did since I subscribed, I won’t be shown it again. Omit this if you want to see reruns.
Far Side comic #12326, from 23 November 2022, shown in FreshRSS. The comic shows two bulls dressed in trenchcoats and hats browsing a china shop; one staff member says to the other "I got a bad feeling about this, Harriet."
Hurrah; once again I can laugh at repeats of Gary Larson’s best work alongside my other morning feeds.

There’s a moral to this story: when you make your website deliberately hard to consume, fewer people will access it in the way you want! The Far Side‘s website is actively hostile to users (JavaScript lazy-loading, anti-right click scripts, hotlink protection, incorrect MIME types, no feeds etc.), and an inevitable consequence of that is that people like me will find and share workarounds to that hostility.

If you’re ad-supported or collect webstats and want to keep traffic “on your site” on this side of 2004, you should make it as easy as possible for people to subscribe to content. Consider The Oatmeal or Oglaf, for example, which offer RSS feeds that include only a partial thumbnail of each comic and a link through to the full thing. I don’t feel the need to screen-scrape those sites because they’ve given me a subscription option that works, and I routinely click-through to both of them to enjoy their latest content!

Conversely, the Far Side‘s aggressive anti-subscription technology ultimately means that there are fewer actual visitors to their website… because folks like me work to circumvent them.

And now you know how I did so.

Update: want the new content that’s being published to The Far Side in FreshRSS, too? I’ve got a recipe for that!

FreshRSS "HTML + XPath" configuration page, configured as described below.×