Pushing RSS to WhatsApp (for free?)

I’m always keen to experiment with new ways to subscribe to my blog. Obviously RSS is the best and everybody who can use it should. But some people, for their own reasons1, prefer to learn about updates to their favourite sites via the Fediverse, or Facebook, or Telegram, or… I don’t know… LiveJournal or something (yes, those are all places you can follow this blog, if you really wanted to).

But I’m pretty sure there are some people who’d rather receive updates to my blog via WhatsApp. And now, they can. Here’s how I set up an RSS-to-WhatsApp gateway, in case you want to run one of your own2.

Diagram showing how GitHub Actions request and receive an RSS feed, push an update to Whapi, which pushes the update to WhatsApp.

Prerequisites

You will need:

  1. A GitHub account – a free one is fine
  2. A Whapi account connected to your WhatsApp account3 – when you set up an account you’ll get a free trial; when it ends you need to find the link to say that you want to carry on with the free tier (or upgrade to the paid tier if you expect to send more messages than the free tier’s limit)
  3. A WhatsApp channel to which you want to push your RSS feed: I’d recommend that you make a newsletter (from the Updates tab in WhatsApp, press the kebab menu then Create Channel) rather than a traditional group. Groups are designed for multiple people to talk and discuss and everybody can see one another’s identity, but a newsletter keeps everybody’s identity private and only grants the administrator(s) permission to post updates.
A WhatsApp 'Subscription Channel'/'Newsletter' group for DanQ.me, showing some recent posts, on a mobile screen.
You probably want to use the kind of channel that’s for one-to-many ‘push’ communication, not a discussion group.

Instructions

  1. Fork the Dan-Q/rss-to-whapi.cloud repository into your own GitHub account.
  2. In Settings > Secrets and Variables > Actions, add two new Repository Secrets:
    • WHATSAPP_API_TOKEN: set to the token on your Whapi dashboard
    • WHATSAPP_CHANNEL: set to your newsletter ID (will look like 123456789012345678@newsletter) or group ID (will look like 123456789012345678@g.us): you can get this from the Newsletters or Groups section of Whapi by executing a test GET /newsletters or GET /groups request4.
  3. Make a feeds.json file (a feeds.json.example is provided as a guide) containing the URLs of the RSS feeds you’d like to subscribe to.
  4. Do a test run: from the Actions tab select the “Process feeds” action and click “Run workflow”. If it finishes successfully (and you get the WhatsApp message), you’re done! If it fails, click on the failed action and drill into the failed task to see the error message and correct accordingly.

By default, the processor will run on-demand and every 30 minutes, but you can modify that in .github/workflows/process-feeds.yml. It’s configured to send the single oldest un-sent item in any of the RSS feeds it’s subscribed to, on each run (it tracks which ones it’s sent already by their guids, in a "seen": [...] array in feeds.json): sending a single link per run ensures that WhatsApp’s link previews work as expected. At that rate, you could theoretically run it once every 10 minutes and never hit the 150-messages-per-day limit of Whapi’s free tier5, but you’ll want to work out your own optimal rate based on the anticipated update frequency of your feeds and the number of RSS-to-WhatsApp channels you’re running.
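
For illustration, feeds.json ends up looking something like this once the processor has run a few times (the exact structure is defined by feeds.json.example in the repository; the key names and URLs below are just my illustrative guesses):

[
  {
    "url": "https://example.com/blog/feed.xml",
    "seen": [
      "https://example.com/blog/?p=123",
      "https://example.com/blog/?p=124"
    ]
  },
  {
    "url": "https://example.org/news.rss",
    "seen": []
  }
]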

You can, of course, run it on your own infrastructure in a similar way. Just check out the repository to a local system with Ruby 3.2+ installed, run bundle to install the dependencies, then set up a cron job or some other automation to run ./process_feeds.rb. You could, for example, hook this into your RSS feed publishing pipeline so that it checks for new feed items right after a new post goes live.
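
If you go down the cron route, an entry along these lines would mirror the default 30-minute schedule (a sketch only: adjust the path to wherever you checked out the repository, and configure your Whapi credentials in whatever way the repository expects when running outside of GitHub Actions):

# Every 30 minutes, process the feeds and log the results
*/30 * * * * cd ~/rss-to-whapi.cloud && ./process_feeds.rb >> ~/process-feeds.log 2>&1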

Footnotes

1 Their own incomprehensible, illogical, weird reasons.

2 I hope that the title gives it away, but you can do this completely for free. So long as you keep your fork of the GitHub repository open-source then you can run GitHub Actions for free, and so long as you’re pushing out no more than 150 updates per day to no more than 5 different channels in a month then you can do it within Whapi’s free tier: that’s probably fine for a personal blogger, and there’s a reasonable pricing structure (plus some value-added extras) for companies that want to use this same workflow as part of a grander WhatsApp offering.

3 Setting this up requires giving Whapi access to your WhatsApp account. If you don’t like the security implications of that, you could get a cheap eSIM, set that up with WhatsApp, and use that account: if you do this, just remember to “warm up” your new WhatsApp account with some conversations with yourself so it doesn’t look so much like a spammer! Also note that the way Whapi works “uses up” one of the ~4 devices on which you can simultaneously use WhatsApp Web/WhatsApp Desktop etc.

4 Prefer the command-line? So long as you’ve got curl and jq then you can get a list of your newsletters (or groups) and their IDs with curl -H 'Authorization: Bearer YOUR_API_TOKEN' -H 'accept: application/json' https://gate.whapi.cloud/newsletters?count=100 | jq '.newsletters[] | { id: .id, name: .name }' or curl -H 'Authorization: Bearer YOUR_API_TOKEN' -H 'accept: application/json' https://gate.whapi.cloud/groups?count=100 | jq '.groups[] | { id: .id, name: .name }', respectively.

5 Going beyond the free tier would require sending one message, on average, every 9 minutes and 36 seconds.

I Wrote the Same Code Six Times!

I was updating my CV earlier this week in anticipation of applying for a handful of interesting-looking roles1 and I was considering quite how many different tech stacks I claim significant experience in, nowadays.

There are languages I’ve been writing in every single week for the last 15+ years, of course, like PHP, Ruby, and JavaScript. And my underlying fundamentals are solid.

But is it really fair for me to be able to claim that I can code in Java, Go, or Python: languages that I’ve not used commercially within the last 5-10 years?

Animated GIF showing Sublime Text editor flicking through six different languages, with equivalent code showing in each.
What kind of developer writes the same program six times… for a tech test they haven’t even been asked to do? If you guessed “Dan”, you’d be correct!

Obviously, I couldn’t just let that question lie2. Let’s find out!


The Test

I fished around on Glassdoor for a bit to find a medium-sized single-sitting tech test, and found a couple of different briefs that I mashed together to create this:

In an object-oriented manner, implement an LRU (Least-Recently Used) cache:

  • The size of the cache is specified at instantiation.
  • Arbitrary objects can be put into the cache, along with a retrieval key in the form of a string. Using the same string, you can get the objects back.
  • If a put operation would increase the number of objects in the cache beyond the size limit, the cached object that was least-recently accessed (by either a put or get operation) is removed to make room for it.
  • putting a duplicate key into the cache should update the associated object (and make this item most-recently accessed).
  • Both the get and put operations should resolve within constant (O(1)) time.
  • Add automated tests to support the functionality.

My plan was to implement a solution to this challenge, in as many of the languages mentioned on my CV as possible in a single sitting.
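
To make the expected behaviour concrete, here’s roughly how such a cache should behave, expressed in Ruby (the Cache class and its API here are placeholders of my own; the brief doesn’t prescribe one):

cache = Cache.new(2)          # a cache that can hold at most two objects
cache.put("a", "Object A")
cache.put("b", "Object B")
cache.get("a")                # => "Object A"; "a" is now the most-recently accessed key
cache.put("c", "Object C")    # cache is full, so "b" (least-recently accessed) gets evicted
cache.get("b")                # => nil (or however a miss is signalled)
cache.put("a", "Object A2")   # duplicate key: updates the stored object, marks "a" most-recent
cache.get("a")                # => "Object A2"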

But first, a little Data Structures & Algorithms theory:

The Theory

Simple case with O(n) complexity

The simplest way to implement such a cache might be as follows:

  • Use a linear data structure like an array or linked list to store cached items.
  • On get, iterate through the list to try to find the matching item.
    • If found: move it to the head of the list, then return it.
  • On put, first check if it already exists in the list as with get:
    • If it already exists, update it and move it to the head of the list.
    • Otherwise, insert it as a new item at the head of the list.
      • If this would increase the size of the list beyond the permitted limit, pop and discard the item at the tail of the list.
Illustration of a simple Cache class with a linked list data structure containing three items, referenced from the class by its head item. It has 'get' and 'put' methods.
It’s simple, elegant and totally the kind of thing I’d accept if I were recruiting for a junior or graduate developer. But we can do better.
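
Before we do, here’s what that naive approach looks like as a minimal Ruby sketch (class and method names are placeholders of my own; this isn’t the repository’s code):

class NaiveCache
  def initialize(size)
    @size  = size
    @items = []   # most-recently accessed first; each entry is a [key, object] pair
  end

  def get(key)
    index = @items.index { |k, _| k == key }    # O(n) scan of the list
    return nil unless index
    entry = @items.delete_at(index)
    @items.unshift(entry)                       # move to the head of the list
    entry[1]
  end

  def put(key, object)
    index = @items.index { |k, _| k == key }
    @items.delete_at(index) if index            # remove any existing entry for this key
    @items.unshift([key, object])               # insert at the head of the list
    @items.pop if @items.length > @size         # discard the least-recently accessed item
    object
  end
end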

The problem with this approach is that it fails the requirement that the methods “should resolve within constant (O(1)) time”3.

Of particular concern is the fact that any operation which accesses an item might need to re-sort the list to put the just-accessed item at the top, and even finding that item in the first place means walking the list: both take time proportional to the size of the cache4. Let’s try another design:

Achieving O(1) time complexity

Here’s another way to implement the cache:

  • Retain cache items in a doubly-linked list, with a pointer to both the head and tail
  • Add a hash map (or similar language-specific structure) for fast lookups by cache key
  • On get, check the hash map to see if the item exists.
    • If so, return it and promote it to the head (as described below).
  • On put, check the hash map to see if the item exists.
    • If so, promote it to the head (as described below).
    • If not, insert it at the head by:
      • Updating the prev of the current head item and then pointing the head to the new item (which will have the old head item as its next), and
      • Adding it to the hash map.
      • If the number of items in the hash map would exceed the limit, remove the tail item from the hash map, point the tail at the tail item’s prev, and unlink the expired tail item from the new tail item’s next.
  • To promote an item to the head of the list:
    1. Follow the item’s prev and next to find its siblings and link them to one another (removes the item from the list).
    2. Point the promoted item’s next to the current head, and the current head‘s prev to the promoted item.
    3. Point the head of the list at the promoted item.
Illustration of a more-sophisticated Cache class with a doubly-linked list data structure containing three items, referenced from the class by its head and tail item. It has 'get' and 'put' methods. It also has a separate hash map indexing each cached item by its key.
Looking at a plate of pointer-spaghetti makes me strangely hungry.
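
Untangled into a minimal Ruby sketch (again with placeholder names, and again not the code from the repository), that design looks something like this:

class LruCache
  # Each cached item is a node in a doubly-linked list, plus an entry in a hash map.
  Node = Struct.new(:key, :value, :prev, :next_node)

  def initialize(size)
    @size = size
    @map  = {}    # key => Node, for O(1) lookups
    @head = nil   # most-recently accessed
    @tail = nil   # least-recently accessed
  end

  def get(key)
    node = @map[key]
    return nil unless node
    promote(node)
    node.value
  end

  def put(key, value)
    node = @map[key]
    if node
      node.value = value       # duplicate key: update in place...
      promote(node)            # ...and mark as most-recently accessed
    else
      node = Node.new(key, value)
      @map[key] = node
      attach_at_head(node)
      evict_tail if @map.length > @size
    end
    value
  end

  private

  # Unlink a node from its neighbours and re-attach it at the head of the list.
  def promote(node)
    return if node.equal?(@head)
    node.prev.next_node = node.next_node if node.prev
    node.next_node.prev = node.prev if node.next_node
    @tail = node.prev if node.equal?(@tail)
    node.prev = nil
    attach_at_head(node)
  end

  def attach_at_head(node)
    node.next_node = @head
    @head.prev = node if @head
    @head = node
    @tail ||= node
  end

  # Drop the least-recently accessed item from both the list and the map.
  def evict_tail
    @map.delete(@tail.key)
    @tail = @tail.prev
    @tail.next_node = nil if @tail
  end
end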

It’s important to realise that this alternative implementation isn’t better. It’s just different: the “right” solution depends on the use-case5.

The Implementation

That’s enough analysis and design. Time to write some code.

GitHub repo 'Languages' panel, showing a project that includes Java (17.2%), Go (16.8%), Python (16.4%), Ruby (15.7%), TypeScript (13.7%), PHP (11.5%), and 'Other' (8.7%).
Turns out that if you use enough different languages in your project, GitHub begins to look like it wants to draw a rainbow.

Picking a handful of the more-useful languages on my CV6, I opted to implement in:

  • Ruby (with RSpec for testing and Rubocop for linting)
  • PHP (with PHPUnit for testing)
  • TypeScript (running on Node, with Jest for testing)
  • Java (with JUnit for testing)
  • Go (which isn’t really an object-oriented language but acts a bit like one, amirite?)
  • Python (probably my weakest language in this set, but which actually ended up with quite a tidy solution)

Naturally, I open-sourced everything if you’d like to see for yourself. It all works, although if you’re actually in need of such a cache for your project you’ll probably find an alternative that’s at least as good (and more-likely to be maintained!) in a third-party library somewhere!

What did I learn?

This was actually pretty fun! I might continue to expand my repo by doing the same challenge with a few of the other languages I’ve used professionally at some point or another7.

And there are a few takeaways I got from this experience:

Lesson #1: programming more languages can make you better at all of them

As I went along, one language at a time, I ended up realising improvements that I could make to earlier iterations.

For example, when I came to the TypeScript implementation, I decided to use generics so that the developer can specify what kind of objects they want to store in the cache, rather than just a generic Object, and benefit from better type-safety. That’s when I remembered that Java supports generics, too, so I went back and used them there as well.

In the same way as speaking multiple (human) languages or studying linguistics can help unlock new ways of thinking about your communication, being able to think in terms of multiple different programming languages helps you spot new opportunities. When in 2020 PHP 8 added nullsafe operators, union types, and named arguments, I remember feeling confident using them from day one because those features were already familiar to me from Ruby8, TypeScript9, and Python10, respectively.

Lesson #2: even when I’m rusty, I can rely on my fundamentals

I’ve applied for a handful of jobs now, but if one of them invited me to a pairing session in a language I’m rusty on (like Java!) I might feel intimidated.

But it turns out I shouldn’t need to be! With my solid fundamentals and a handful of other languages under my belt, I understand when I need to step away from the code editor and hit the API documentation. Turns out, I’m in a good position to demo any of my language skills.

I remember when I was first learning Go, I wanted to make use of a particular language feature that I didn’t know whether it had. But because I’d used that feature in Ruby, I knew what to search for in Go’s documentation to see if it was supported (it wasn’t) and if so, what the syntax was11.

Lesson #3: structural rules are harder to gearshift than syntactic ones

Switching between six different languages while writing the same application was occasionally challenging, but not in the ways I expected.

I’ve had plenty of experience switching programming languages mid-train-of-thought before. Sometimes you just have to flit between the frontend and backend of your application!

But this time around I discovered: changes in structure are apparently harder for my brain than changes in syntax. E.g.:

  • Switching in and out of Python’s indentation caught me out at least once (might’ve been better if I took the time to install the language’s tools into my text editor first!).
  • Switching from a language without enforced semicolon line ends (e.g. Ruby, Go) to one with them (e.g. Java, PHP) had me make the compiler sad several times.
  • This gets even tougher when not writing the language but writing about the language: my first pass at the documentation for the Go version somehow ended up with Ruby/Python-style #-comments instead of Go/Java/TypeScript-style //-comments; whoops!

I’m guessing that the part of my memory that looks after a language’s keywords, how a method header is structured, and which equals sign to use for assignment versus comparison… is stored in a different part of my brain than the bit that keeps track of how a language is laid-out?12

Okay, time for a new job

I reckon it’s time I got back into work, so I’m going to have a look around and see if there’s any roles out there that look exciting to me.

If you know anybody who’s looking for a UK-based, remote-first, senior+, full-stack web developer with 25+ years’ experience and more languages than you can shake a stick at… point them at my CV, would you?

Footnotes

1 I suspect that when most software engineers look for a new job, they filter to the languages and frameworks they feel they’re strongest at. I do a little of that, I suppose, but I’m far more-motivated by culture, sector, product and environment than I am by the shape of your stack, and I’m versatile enough that technology specifics can almost come second. So long as you’re not asking me to write VB.NET.

2 It’s sort-of a parallel to how I decided to check the other week that my Gutenberg experience was sufficiently strong that I could write standard ReactJS, too.

3 I was pleased to find a tech test that actually called for an understanding of algorithm growth/scaling rates, so I could steal this requirement for my own experiment! I fear that sometimes, in their drive to be pragmatic and representative of “real work”, recruiters overlook the value of a comprehension of computer science fundamentals.

4 Even if an algorithm takes the approach of creating a new list with the inserted/modified item at the top, that’s still just a very-specific case of insertion sort when you think about it, right?

5 The second design will be slower at writing but faster at reading, and will scale better as the cache gets larger. That sounds great for a read-often/write-rarely cache, but your situation may differ.

6 Okay, my language selection was pretty arbitrary. But if I’d also come up with implementations in Perl, and C#, and Elixir, and whatever else… I’d have been writing code all day!

7 So long as I’m willing to be flexible about the “object-oriented” requirement, there are even more options available to me. Probably the language that I last wrote longest ago would be Pascal: I wonder how much of that I remember?

8 Ruby’s safe navigation/”lonely” operator has done the same thing as PHP’s nullsafe operator since 2015.

9 TypeScript got union types back in 2015, and apart from them being more-strictly-enforced they’re basically identical to PHP’s.

10 Did you know that Python has had keyword arguments since its very first public release, way back in 1994? How did it take so many other interpreted languages so long to catch up?

11 The feature was the three-way comparison or “spaceship operator”, in case you were wondering.

12 I wonder if anybody’s ever laid a programmer in an MRI machine while they code? I’d be really interested to see if different bits of the brain light up when coding in functional programming languages than in procedural ones, for example!


BBC News RSS… your way!

It turns out my series of efforts to improve the BBC News RSS feeds are more-popular than I thought. People keep asking for variants of them, and it’s probably time I stopped hosting the resulting feeds on my NAS (which does a good job, but it’s in a highly-kickable place right under my desk).

Screenshot of BBC News RSS Feeds (that don't suck!).
The new site isn’t pretty. But it works.

So I’ve launched BBC-Feeds.DanQ.dev. On a 20-minute schedule, it generates both UK and World editions of the BBC News feeds, filtered to remove iPlayer, Sounds, app “nudges”, duplicates, and other junk, and optionally with the sports news filtered out too.

The entire thing is open source under an ultra-permissive license, so you can run your own copy if you don’t want to use mine.

Enjoy!

BBC News RSS… with the sport?

There’s now a much, much better version of this. Go use that instead: bbc-feeds.danq.dev.

Earlier today, somebody called Allan commented on the latest in my series of several blog posts about how I mutilate manipulate the RSS feeds of BBC News to work around their many (and increasing) shortcomings, specifically:

  1. Their inclusion of non-news content such as plugs for iPlayer and their apps,
  2. Their repeating of identical news stories with marginally-different GUIDs, and
  3. All of the sports news, which I don’t care about one jot.

Well, it turns out that some people want #3: the sport. But still don’t want the other two.

FreshRSS screenshot with many unread items, but focussing on a feed called "BBC News (with sport)" and showing a story titled: 'How England Golf's yellow cards are tackling blight of slow play'
Some people actually want to read this crap, apparently.

I shan’t be subscribing to this RSS feed, and I can’t promise I’ll fix it if it gets broken. But if “without the crap, but with the sports” is the way you like your BBC News RSS feed, I’ve got you covered:

So there you go, Allan, and anybody in a similar position. I hope that fulfils your need for sports news… without the crap.


BBC News… without the crap

Update: nowadays, the best place to get this feed and more like it is at bbc-feeds.danq.dev.

Did I mention recently that I love RSS? That it brings me great joy? That I start and finish almost every day in my feed reader? Probably.

I used to have a single minor niggle with the BBC News RSS feed: that it included sports news, which I didn’t care about. So I wrote a script that downloaded it, stripped sports news, and re-exported the feed for me to subscribe to. Magic.

RSS reader showing duplicate copies of the news story "Barbie 2? 'We'd love to,' says Warner Bros boss", and an entry from BBC Sounds.
Lately my BBC News feed has caused me some annoyance and frustration.

But lately – presumably as a result of technical changes at the Beeb’s side – this feed has found two fresh ways to annoy me:

  1. The feed now re-publishes a story if it gets re-promoted to the front page, but with a different <guid> (it appears to get a #0 after it when first published, a #1 the second time, and so on). In a typical day the feed reader might scoop up new stories about once an hour, and by the time I get to reading them the exact same story might appear in my reader multiple times. Ugh.
  2. They’ve started adding iPlayer and BBC Sounds content to the BBC News feed. I don’t follow BBC News in my feed reader because I want to watch or listen to things. If you do, that’s fine, but I don’t, and I’d rather filter this content out.

Luckily, I already have a recipe for improving this feed, thanks to my prior work. Let’s look at my newly-revised script (also available on GitHub):

#!/usr/bin/env ruby
require 'bundler/inline'
# # Sample crontab:
# # Every 20 minutes, run the script and log the results
# */20 * * * * ~/bbc-news-rss-filter-sport-out.rb > ~/bbc-news-rss-filter-sport-out.log 2>&1
# Dependencies:
# * open-uri - load remote URL content easily
# * nokogiri - parse/filter XML
gemfile do
  source 'https://rubygems.org'
  gem 'nokogiri'
end
require 'open-uri'
# Regular expression describing the GUIDs to reject from the resulting RSS feed
# We want to drop everything from the "sport" section of the website, also any iPlayer/Sounds links
REJECT_GUIDS_MATCHING = /^https:\/\/www\.bbc\.co\.uk\/(sport|iplayer|sounds)\//
# Load and filter the original RSS
rss = Nokogiri::XML(URI.open('https://feeds.bbci.co.uk/news/rss.xml?edition=uk'))
rss.css('item').select{|item| item.css('guid').text =~ REJECT_GUIDS_MATCHING }.each(&:unlink)

# Strip the anchors off the <guid>s: BBC News "republishes" stories by using guids with #0, #1, #2 etc, which results in duplicates in feed readers
rss.css('guid').each{|g|g.content=g.content.gsub(/#.*$/,'')}

File.open( '/www/bbc-news-no-sport.xml', 'w' ){ |f| f.puts(rss.to_s) }
It’s amazing what you can do with Nokogiri and a half dozen lines of Ruby.

That revised script removes from the feed anything whose <guid> suggests it’s sports news or from BBC Sounds or iPlayer, and also strips any “anchor” part of the <guid> before re-exporting the feed. Much better. (Strictly speaking, this can result in a technically-invalid feed by introducing duplicates, but your feed reader oughta be smart enough to compensate for and ignore that: mine certainly is!)

You’re free to take and adapt the script to your own needs, or – if you don’t mind being tied to my opinions about what should be in BBC News’ RSS feed – just subscribe to my copy at: https://fox.q-t-a.uk/bbc-news-no-sport.xml

Update: nowadays, the best place to get this feed and more like it is at bbc-feeds.danq.dev.


UK Strikes in .ics Format

My work colleague Simon was looking for a way to add all of the upcoming UK strike action to their calendar, presumably so they know when not to try to catch a bus or require an ambulance or maybe just so they’d know to whom they should be giving support on any particular day. Thom was able to suggest a few places to see lists of strikes, such as this BBC News page and the comprehensive strikecalendar.co.uk, but neither provided a handy machine-readable feed.

Screenshot showing a Thunderbird calendar populated with strikes on every day in February.
Gosh, there’s a lot of strikes going on. ✊

If only they knew somebody who loves an excuse to throw a screen-scraper together. Oh wait, that’s me!

I threw together a 36-line Ruby program that extracts all the data from strikecalendar.co.uk and outputs an .ics file. I guess if you wanted you could set it up to automatically update the file a couple of times a day and host it at a URL that people can subscribe to; that’s an exercise left for the reader.
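
For the curious, the general shape of such a screen-scraper is something like this (a sketch only: the URL path, CSS selectors, and field names below are hypothetical stand-ins rather than strikecalendar.co.uk’s actual markup, and the real 36-line script differs):

require 'open-uri'
require 'nokogiri'
require 'date'

# Hypothetical selectors: the real site's markup will differ.
doc = Nokogiri::HTML(URI.open('https://www.strikecalendar.co.uk/'))

strikes = doc.css('.strike').map do |row|
  {
    date: Date.parse(row.at_css('.date').text),
    who:  row.at_css('.union').text.strip
  }
end

# Write a minimal iCalendar file: one all-day VEVENT per strike.
File.open('strikes.ics', 'w') do |ics|
  ics.puts 'BEGIN:VCALENDAR'
  ics.puts 'VERSION:2.0'
  ics.puts 'PRODID:-//uk-strikes-scraper//EN'
  strikes.each do |strike|
    ics.puts 'BEGIN:VEVENT'
    ics.puts "UID:#{strike[:date]}-#{strike[:who].downcase.gsub(/\W+/, '-')}@example.invalid"
    ics.puts "DTSTART;VALUE=DATE:#{strike[:date].strftime('%Y%m%d')}"
    ics.puts "SUMMARY:#{strike[:who]} strike"
    ics.puts 'END:VEVENT'
  end
  ics.puts 'END:VCALENDAR'
end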

If you just want a one-off import based on the state-of-play right now, though, you can save this .ics file to your computer and import it to your calendar. Simple.


Nginx Caching for Passenger Applications

Suppose you’re running an application on a Passenger + Nginx powered server and you want to add caching.

Perhaps your application has a dynamic, public endpoint but the contents don’t change super-frequently or it isn’t critically-important that the user always gets up-to-the-second accuracy, and you’d like to improve performance with microcaching. How would you do that?

Where you’re at

Diagram showing the Internet connecting to an Nginx+Passenger webserver, connecting to an application written for Ruby, Python, or NodeJS.
Not pictured: the rest of the Internet.

Your configuration might look something like this:

server {
  # listen, server_name, ssl, logging etc. directives go here
  # ...

  root               /your/application;
  passenger_enabled  on;
}

What you’re looking for is proxy_cache and its sister directives, but you can’t just insert them here because while Passenger does act like an upstream proxy (with parallels like e.g. passenger_pass_header which mirrors the behaviour of proxy_pass_header), it doesn’t provide any of the functions you need to implement proxy caching of non-static files.

Where you need to be

Instead, what you need to do is define a second server, mount Passenger in that, and then proxy to that second server. E.g.:

# Set up a cache
proxy_cache_path /tmp/cache/my-app-cache keys_zone=MyAppCache:10m levels=1:2 inactive=600s max_size=100m;

# Define the actual webserver that listens for Internet traffic:
server {
  # listen, server_name, ssl, logging etc. directives go here
  # ...

  # You can configure different rules by location etc., but here's a simple microcache:
  location / {
    proxy_pass http://127.0.0.1:4863; # Proxy all traffic to the application server defined below
    proxy_cache           MyAppCache; # Use the cache defined above
    proxy_cache_valid     200 3s;     # Treat HTTP 200 responses as valid; cache them for 3 seconds
    proxy_cache_use_stale updating;   # (Optional) send outdated response while background-updating cache
    proxy_cache_lock      on;         # (Optional) only allow one process to update cache at once
  }
}

# (Local-only) application server on an arbitrary port number to act as the upstream proxy:
server {
  listen 127.0.0.1:4863;

  root               /your/application;
  passenger_enabled  on;
}

The two key changes are:

  • Passenger moves to a second server block, localhost-only, on an arbitrary port number (doesn’t need HTTPS, of course, but if your application detects/”expects” HTTPS you might need to tweak your headers).
  • Your main server block proxies to the second as its upstream, and you can add whatever caching directives you like.

Obviously you’ll need to be smarter if you host a mixture of public and private content (e.g. send Vary: headers from your application) and if you want different cache durations on different addresses or types of content, but there are already great guides to help with that. I only wrote this post because I spent some time searching for (nonexistent!) passenger_cache_ etc. rules and wanted to save the next person from the same trouble!


LABS Comic RSS Archive

Yesterday I recommended that you go read Aaron Uglum’s webcomic LABS which had just completed its final strip. I’m a big fan of “completed” webcomics – they feel binge-able in the same way as a complete Netflix series does! – but Spencer quickly pointed out that it’s annoying for us enlightened modern RSS users who hook RSS up to everything to have to binge completed comics in a different way to reading ongoing ones: what he wanted was an RSS feed covering the entire history of LABS.

LABS comic adapted to show The Robot literally "feeding" RSS
With apologies to Aaron Uglum who I hope won’t mind me adapting his comic in this way.

So naturally (after the intense heatwave woke me early this morning anyway) I made one: complete RSS feed of LABS. And, of course, I open-sourced the code I used to generate it so that others can jumpstart their projects to make static RSS feeds from completed webcomics, too.

Even if you’re not going to read it via this medium, you should go read LABS.

BBC News… without the sport

There’s now a much, much better version of this. Go use that instead: bbc-feeds.danq.dev.

I love RSS, but it’s a minor niggle for me that if I subscribe to any of the BBC News RSS feeds I invariably get all the sports news, too. Which’d be fine if I gave even the slightest care about the world of sports, but I don’t.

Sports on the BBC News site
Down with Things Like This!

It only takes a couple of seconds to skim past the sports stories that clog up my feed reader, but because I like to scratch my own itches, I came up with a solution. It’s more-heavyweight perhaps than it needs to be, but it does the job. If you’re just looking for a BBC News (UK) feed but with sports filtered out, you’re welcome to share mine: https://fox.q-t-a.uk/bbc-news-no-sport.xml (which supersedes the older https://f001.backblazeb2.com/file/Dan--Q--Public/bbc-news-nosport.rss).

If you’d like to see how I did it so you can host it yourself or adapt it for some similar purpose, the code’s below or on GitHub:

#!/usr/bin/env ruby
# # Sample crontab:
# # At 41 minutes past each hour, run the script and log the results
# 41 * * * * ~/bbc-news-rss-filter-sport-out.rb > ~/bbc-news-rss-filter-sport-out.log 2>&1
# Dependencies:
# * open-uri - load remote URL content easily
# * nokogiri - parse/filter XML
# * b2       - command line tools, described below
require 'bundler/inline'
gemfile do
  source 'https://rubygems.org'
  gem 'nokogiri'
end
require 'open-uri'
# Regular expression describing the GUIDs to reject from the resulting RSS feed
# We want to drop everything from the "sport" section of the website
REJECT_GUIDS_MATCHING = /^https:\/\/www\.bbc\.co\.uk\/sport\//
# Assumption: you're set up with a Backblaze B2 account with a bucket to which
# you'd like to upload the resulting RSS file, and you've configured the 'b2'
# command-line tool (https://www.backblaze.com/b2/docs/b2_authorize_account.html)
B2_BUCKET = 'YOUR-BUCKET-NAME-GOES-HERE'
B2_FILENAME = 'bbc-news-nosport.rss'
# Load and filter the original RSS
rss = Nokogiri::XML(URI.open('https://feeds.bbci.co.uk/news/rss.xml?edition=uk'))
rss.css('item').select{|item| item.css('guid').text =~ REJECT_GUIDS_MATCHING }.each(&:unlink)
begin
  # Output resulting filtered RSS into a temporary file
  temp_file = Tempfile.new
  temp_file.write(rss.to_s)
  temp_file.close

  # Upload filtered RSS to a Backblaze B2 bucket
  result = `b2 upload_file --noProgress --contentType application/rss+xml #{B2_BUCKET} #{temp_file.path} #{B2_FILENAME}`
  puts Time.now
  puts result.split("\n").select{|line| line =~ /^URL by file name:/}.join("\n")
ensure
  # Tidy up after ourselves by ensuring we delete the temporary file
  temp_file.close
  temp_file.unlink
end

bbc-news-rss-filter-sport-out.rb

When executed, this Ruby code:

  1. Fetches the original BBC news (UK) RSS feed and parses it as XML using Nokogiri
  2. Filters it to remove all entries whose GUID matches a particular regular expression (removing all of those from the “sport” section of the site)
  3. Outputs the resulting feed into a temporary file
  4. Uploads the temporary file to a bucket in Backblaze’s “B2” repository (think: a better-value competitor to S3); the bucket I’m using is publicly-accessible so anybody’s RSS reader can subscribe to the feed

I like the versatility of the approach I’ve used here and its ability to perform arbitrary mutations on the feed. And I’m a big fan of Nokogiri. In some ways, this could be considered a lower-impact, less real-time version of my tool RSSey. Aside from the fact that it won’t (easily) handle websites that require Javascript, this approach could probably be used in exactly the same ways as RSSey, and with significantly less set-up: I might look into whether its functionality can be made more-generic so I can start using it in more places.


Linda Liukas, Hello Ruby and the magic of coding

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

Linda Liukas’s best-selling Hello Ruby books teach children that computers are fun and coding can be a magical experience.

See the original article to watch a great video interview with Linda Liukas. Linda is the founder of Rails Girls and author of a number of books encouraging children to learn computer programming (which I’m hoping to show copies of to ours, when they’re a tiny bit older). I’ve mentioned before how important I feel an elementary understanding of programming concepts is to children.

The Ruby Story

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

By 2005, Ruby had become more popular, but it was still not a mainstream programming language. That changed with the release of Ruby on Rails. Ruby on Rails was the “killer app” for Ruby, and it did more than any other project to popularize Ruby. After the release of Ruby on Rails, interest in Ruby shot up across the board, as measured by the TIOBE language index:

It’s sometimes joked that the only programs anybody writes in Ruby are Ruby-on-Rails web applications. That makes it sound as if Ruby on Rails completely took over the Ruby community, which is only partly true. While Ruby has certainly come to be known as that language people write Rails apps in, Rails owes as much to Ruby as Ruby owes to Rails.

As an early adopter of Ruby (and Rails, when it later came along) I’ve always found that it brings me a level of joy I’ve experienced in very few other languages (and never as much). Every time I write Ruby, it takes me back to being six years old and hacking BASIC on my family’s microcomputer. Ruby, more than any other language I’ve come across, achieves the combination of instant satisfaction, minimal surprises, and solid-but-flexible object orientation. There’s so much to love about Ruby from a technical perspective, but for me: my love of it is emotional.


Why it is just lazy to bad-mouth Ruby on Rails

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

It’s inevitable these days: we will see an article proclaiming the demise of Ruby on Rails every once in a while. It’s the easiest click bait, like this one from TNW. Now, you may say “another Ruby fanboy.” That’s fair, but a terrible argument, as it’s a poor and common argumentum ad hominem. And on the subject of fallacies, the click-bait article above is wrong exactly because it falls for a blatant post hoc ergo propter hoc fallacy plus some more confirmation bias, which we are all guilty of falling for all the time.

I’m not saying that the author wrote fallacies on purpose. Unfortunately, it’s just too easy to fall for fallacies. Especially when everybody has an intrinsic desire to confirm one’s biases. Even trying to be careful, I end up doing that as well…

Rails is f*cking boring! I love it.

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

Together with a friend I recently built Dropshare Cloud. We offer online storage for the file and screenshot sharing app Dropshare for macOS/iOS. After trying out Django for getting started (we both had some experience using Django) I decided to rewrite the codebase in Rails. My past experience developing in Rails made the process quick — and boring…