BBC News… without the crap

Did I mention recently that I love RSS? That it brings me great joy? That I start and finish almost every day in my feed reader? Probably.

I used to have a single minor niggle with the BBC News RSS feed: that it included sports news, which I didn’t care about. So I wrote a script that downloaded it, stripped sports news, and re-exported the feed for me to subscribe to. Magic.

[Image: RSS reader showing duplicate copies of the news story "Barbie 2? 'We'd love to,' says Warner Bros boss", and an entry from BBC Sounds. Caption: Lately my BBC News feed has caused me some annoyance and frustration.]

But lately – presumably as a result of technical changes at the Beeb’s side – this feed has found two fresh ways to annoy me:

  1. The feed now re-publishes a story if it gets re-promoted to the front page, but with a different <guid> (it appears to get a #0 after it when first published, a #1 the second time, and so on). In a typical day the feed reader might scoop up new stories about once an hour, and by the time I get to reading them the exact same story might appear in my reader multiple times. Ugh.
  2. They’ve started adding iPlayer and BBC Sounds content to the BBC News feed. I don’t follow BBC News in my feed reader because I want to watch or listen to things. If you do, that’s fine, but I don’t, and I’d rather filter this content out.
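
To make the first problem concrete, here's a minimal sketch of how those suffixed <guid>s collapse back to a single story once the anchor is dropped. The sample URLs are invented for illustration (they only mimic the shape described above), and canonical_guid and group_republished are hypothetical helper names, not anything from the BBC or from the script below.

```ruby
# Invented sample GUIDs, shaped like the ones described above:
# the same story URL with #0, #1, ... appended on each re-promotion.
SAMPLE_GUIDS = [
  'https://www.bbc.co.uk/news/example-story-1#0',
  'https://www.bbc.co.uk/news/example-story-1#1',
  'https://www.bbc.co.uk/news/example-story-2#0',
  'https://www.bbc.co.uk/sounds/play/example#0'
].freeze

# Drop the "#..." anchor so every re-publication shares one identifier.
# (Hypothetical helper name.)
def canonical_guid(guid)
  guid.sub(/#.*\z/, '')
end

# Group the raw GUIDs by their anchor-free form to see which ones collide.
# (Hypothetical helper name.)
def group_republished(guids)
  guids.group_by { |guid| canonical_guid(guid) }
end

group_republished(SAMPLE_GUIDS).each do |canonical, raw|
  puts "#{canonical} seen #{raw.size}x"
end
```

Any group with more than one raw GUID is a story that a feed reader would show more than once.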

Luckily, I already have a recipe for improving this feed, thanks to my prior work. Let’s look at my newly-revised script (also available on GitHub):

#!/usr/bin/env ruby
require 'bundler/inline'

# # Sample crontab:
# # Every 20 minutes, run the script and log the results
# */20 * * * * ~/bbc-news-rss-filter-sport-out.rb > ~/bbc-news-rss-filter-sport-out.log 2>&1

# Dependencies:
# * open-uri - load remote URL content easily
# * nokogiri - parse/filter XML
gemfile do
  source 'https://rubygems.org'
  gem 'nokogiri'
end
require 'open-uri'

# Regular expression describing the GUIDs to reject from the resulting RSS feed
# We want to drop everything from the "sport" section of the website, also any iPlayer/Sounds links
REJECT_GUIDS_MATCHING = /^https:\/\/www\.bbc\.co\.uk\/(sport|iplayer|sounds)\//

# Load and filter the original RSS
# (URI.open is needed on Ruby 3.0+, where a bare open() no longer fetches URLs)
rss = Nokogiri::XML(URI.open('https://feeds.bbci.co.uk/news/rss.xml?edition=uk'))
rss.css('item').select{|item| item.css('guid').text =~ REJECT_GUIDS_MATCHING }.each(&:unlink)

# Strip the anchors off the <guid>s: BBC News "republishes" stories by using guids with #0, #1, #2 etc, which results in duplicates in feed readers
rss.css('guid').each { |g| g.content = g.content.gsub(/#.*$/, '') }

# Save the filtered feed
File.open('/www/bbc-news-no-sport.xml', 'w') { |f| f.puts(rss.to_s) }

It’s amazing what you can do with Nokogiri and a half dozen lines of Ruby.

That revised script removes from the feed anything whose <guid> suggests it’s sports news or from BBC Sounds or iPlayer, and also strips any “anchor” part of the <guid> before re-exporting the feed. Much better. (Strictly speaking, this can result in a technically-invalid feed by introducing duplicates, but your feed reader oughta be smart enough to compensate for and ignore that: mine certainly is!)
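
If your reader doesn't cope with repeated <guid>s, one option is to drop the later copies before writing the file. Here's a rough sketch of that idea, which is not part of the script above. It uses REXML (which ships with standard Ruby installs) rather than Nokogiri so it runs without installing anything; the sample feed is invented, and dedupe_items! is a hypothetical helper name. It assumes that keeping the first copy of each story in feed order is acceptable.

```ruby
require 'rexml/document'
require 'set'

# Invented sample feed: "Story A" appears twice, as #0 and #1.
SAMPLE_RSS = <<~XML
  <rss version="2.0">
    <channel>
      <title>Sample feed</title>
      <item><title>Story A</title><guid>https://www.bbc.co.uk/news/example-a#0</guid></item>
      <item><title>Story B</title><guid>https://www.bbc.co.uk/news/example-b#0</guid></item>
      <item><title>Story A</title><guid>https://www.bbc.co.uk/news/example-a#1</guid></item>
    </channel>
  </rss>
XML

# Strip the "#..." anchor from each <guid>, then remove any <item> whose
# stripped guid has already been seen. The first copy of each story wins.
# (Hypothetical helper name.)
def dedupe_items!(doc)
  seen = Set.new
  REXML::XPath.match(doc, '//item').each do |item|
    guid = item.elements['guid']
    next if guid.nil?

    guid.text = guid.text.to_s.sub(/#.*\z/, '')
    item.remove unless seen.add?(guid.text)
  end
  doc
end

deduped = dedupe_items!(REXML::Document.new(SAMPLE_RSS))
REXML::XPath.match(deduped, '//item/guid').each { |g| puts g.text }
```

The same loop translates to Nokogiri fairly directly (rss.css('item'), item.at_css('guid') and item.unlink), should you want to fold it into the script.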

You’re free to take and adapt the script to your own needs, or – if you don’t mind being tied to my opinions about what should be in BBC News’ RSS feed – just subscribe to my copy at: https://fox.q-t-a.uk/bbc-news-no-sport.xml

6 comments

  1. Turk says:

    Thank you very much for this.

  2. Jules says:

    I am so glad that there are other people out there frustrated by this! In the last few weeks it seems to have got 50% worse. I generally read the feed once per day – and it has gone from ~90 posts per 24hrs to 170 – all crap.

    I sent a message to the BBC – not that it would do any good… I had not thought of creating an edited feed, and I’m not sure I can be bothered, so I will use your feed – hopefully you keep it up.

  3. CDS says:

    Good work! Glad to know there are more of us affected by this. Maybe enough of us can feed back to the BBC about the problems their changes have caused.

    Thanks!

  4. Josh Dawson says:

    You absolute legend! This has saved a lot of annoyance

  5. James White says:

    Interesting to come across this! At my employer we use the BBC News RSS feeds for digital signage purposes, so technically consumed by a web app rather than the traditional RSS feed reader, but similar parsing requirements.

    There are a few quirks to the feeds, some of which you’ve noted, but there are a few more if you use the data as raw XML.

    – The items of news aren’t in pubDate order in the feed, I assume RSS readers handle this.
    – BBC inserts a “Download now” item for their app, which I guess is fine, but ideally wants to be excluded.
    – The GUID issue as described.

    An alternative fix for the GUID problem is to map the items into an array keyed by the item link URL and use that as a unique key, which removes duplicates of the same URL. It seems a bit random when stories repeat: an item whose GUID ends in #9 hasn’t necessarily been repeated 9 times, which is why I decided to use the URL. Again, as I’m not using an RSS reader as the client, there are different parsing requirements.

    Interesting all the same, it’s always nice when you come across someone who’s found the same things!

  6. Dan Q says:

    Had to update the code again today, after the Beeb decided to start injecting ads for iPlayer into their “news” feed. Fixed now!
