BBC News… without the sport

I love RSS, but it’s a minor niggle for me that if I subscribe to any of the BBC News RSS feeds I invariably get all the sports news, too. Which’d be fine if I gave even the slightest care about the world of sports, but I don’t.

Sports on the BBC News site
Down with Things Like This!

It only takes a couple of seconds to skim past the sports stories that clog up my feed reader, but because I like to scratch my own itches, I came up with a solution. It’s more-heavyweight perhaps than it needs to be, but it does the job. If you’re just looking for a BBC News (UK) feed but with sports filtered out you’re welcome to share mine: https://f001.backblazeb2.com/file/Dan–Q–Public/bbc-news-nosport.rss.

If you’d like to see how I did it so you can host it yourself or adapt it for some similar purpose, the code’s below or on GitHub:

#!/usr/bin/env ruby

# # Sample crontab:
# # At 41 minutes past each hour, run the script and log the results
# 41 * * * * ~/bbc-news-rss-filter-sport-out.rb > ~/bbc-news-rss-filter-sport-out.log 2>&1

# Dependencies:
# * open-uri - load remote URL content easily
# * nokogiri - parse/filter XML
# * b2       - command line tools, described below
require 'bundler/inline'
gemfile do
  source 'https://rubygems.org'
  gem 'nokogiri'
end
require 'open-uri'

# Regular expression describing the GUIDs to reject from the resulting RSS feed
# We want to drop everything from the "sport" section of the website
REJECT_GUIDS_MATCHING = /^https:\/\/www\.bbc\.co\.uk\/sport\//

# Assumption: you're set up with a Backblaze B2 account with a bucket to which
# you'd like to upload the resulting RSS file, and you've configured the 'b2'
# command-line tool (https://www.backblaze.com/b2/docs/b2_authorize_account.html)
B2_BUCKET = 'YOUR-BUCKET-NAME-GOES-HERE'
B2_FILENAME = 'bbc-news-nosport.rss'

# Load and filter the original RSS
rss = Nokogiri::XML(open('https://feeds.bbci.co.uk/news/rss.xml?edition=uk'))
rss.css('item').select{|item| item.css('guid').text =~ REJECT_GUIDS_MATCHING }.each(&:unlink)

begin
  # Output resulting filtered RSS into a temporary file
  temp_file = Tempfile.new
  temp_file.write(rss.to_s)
  temp_file.close

  # Upload filtered RSS to a Backblaze B2 bucket
  result = `b2 upload_file --noProgress --contentType application/rss+xml #{B2_BUCKET} #{temp_file.path} #{B2_FILENAME}`
  puts Time.now
  puts result.split("\n").select{|line| line =~ /^URL by file name:/}.join("\n")
ensure
  # Tidy up after ourselves by ensuring we delete the temporary file
  temp_file.close
  temp_file.unlink
end

bbc-news-rss-filter-sport-out.rb

When executed, this Ruby code:

  1. Fetches the original BBC news (UK) RSS feed and parses it as XML using Nokogiri
  2. Filters it to remove all entries whose GUID matches a particular regular expression (removing all of those from the “sport” section of the site)
  3. Outputs the resulting feed into a temporary file
  4. Uploads the temporary file to a bucket in Backblaze‘s “B2” repository (think: a better-value competitor S3); the bucket I’m using is publicly-accessible so anybody’s RSS reader can subscribe to the feed

I like the versatility of the approach I’ve used here and its ability to perform arbitrary mutations on the feed. And I’m a big fan of Nokogiri. In some ways, this could be considered a lower-impact, less real-time version of my tool RSSey. Aside from the fact that it won’t (easily) handle websites that require Javascript, this approach could probably be used in exactly the same ways as RSSey, and with significantly less set-up: I might look into whether its functionality can be made more-generic so I can start using it in more places.

2 replies to BBC News… without the sport

Leave a Reply

Your email address will not be published. Required fields are marked *