BBC News… without the sport

I love RSS, but it’s a minor niggle for me that if I subscribe to any of the BBC News RSS feeds I invariably get all the sports news, too. Which’d be fine if I gave even the slightest care about the world of sports, but I don’t.

Sports on the BBC News site
Down with Things Like This!

It only takes a couple of seconds to skim past the sports stories that clog up my feed reader, but because I like to scratch my own itches, I came up with a solution. It’s perhaps more heavyweight than it needs to be, but it does the job. If you’re just looking for a BBC News (UK) feed but with sports filtered out, you’re welcome to share mine: https://fox.q-t-a.uk/bbc-news-no-sport.xml (originally published at https://f001.backblazeb2.com/file/Dan–Q–Public/bbc-news-nosport.rss).

If you’d like to see how I did it so you can host it yourself or adapt it for some similar purpose, the code’s below or on GitHub:

#!/usr/bin/env ruby

# # Sample crontab:
# # At 41 minutes past each hour, run the script and log the results
# 41 * * * * ~/bbc-news-rss-filter-sport-out.rb > ~/bbc-news-rss-filter-sport-out.log 2>&1

# Dependencies:
# * open-uri - load remote URL content easily
# * nokogiri - parse/filter XML
# * b2       - command line tools, described below
require 'bundler/inline'
gemfile do
  source 'https://rubygems.org'
  gem 'nokogiri'
end
require 'open-uri'
require 'tempfile'

# Regular expression describing the GUIDs to reject from the resulting RSS feed
# We want to drop everything from the "sport" section of the website
REJECT_GUIDS_MATCHING = /^https:\/\/www\.bbc\.co\.uk\/sport\//

# Assumption: you're set up with a Backblaze B2 account with a bucket to which
# you'd like to upload the resulting RSS file, and you've configured the 'b2'
# command-line tool (https://www.backblaze.com/b2/docs/b2_authorize_account.html)
B2_BUCKET = 'YOUR-BUCKET-NAME-GOES-HERE'
B2_FILENAME = 'bbc-news-nosport.rss'

# Load and filter the original RSS
rss = Nokogiri::XML(URI.open('https://feeds.bbci.co.uk/news/rss.xml?edition=uk'))
rss.css('item').select{|item| item.css('guid').text =~ REJECT_GUIDS_MATCHING }.each(&:unlink)

begin
  # Output resulting filtered RSS into a temporary file
  temp_file = Tempfile.new
  temp_file.write(rss.to_s)
  temp_file.close

  # Upload filtered RSS to a Backblaze B2 bucket
  result = `b2 upload_file --noProgress --contentType application/rss+xml #{B2_BUCKET} #{temp_file.path} #{B2_FILENAME}`
  puts Time.now
  puts result.split("\n").select{|line| line =~ /^URL by file name:/}.join("\n")
ensure
  # Tidy up after ourselves by ensuring we delete the temporary file
  temp_file.close
  temp_file.unlink
end

bbc-news-rss-filter-sport-out.rb

When executed, this Ruby code:

  1. Fetches the original BBC News (UK) RSS feed and parses it as XML using Nokogiri
  2. Filters it to remove all entries whose GUID matches a particular regular expression (removing all of those from the “sport” section of the site)
  3. Outputs the resulting feed into a temporary file
  4. Uploads the temporary file to a bucket in Backblaze’s “B2” repository (think: a better-value competitor to S3); the bucket I’m using is publicly accessible so anybody’s RSS reader can subscribe to the feed
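
The filtering in step 2 can be tried offline against a small hand-written feed. The sketch below is an illustration rather than the script above: it swaps Nokogiri for REXML (which ships with Ruby) so that it runs without installing any gems, and both the sample feed and the `filter_feed` helper name are made up for the example.

```ruby
require 'rexml/document'

# Same idea as the script above: reject GUIDs under the "sport" section
REJECT_GUIDS_MATCHING = %r{^https://www\.bbc\.co\.uk/sport/}

# Hypothetical sample feed, hand-written for this example (not real BBC data)
SAMPLE_FEED = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <rss version="2.0">
    <channel>
      <title>Sample feed</title>
      <item>
        <title>A news story</title>
        <guid>https://www.bbc.co.uk/news/example-1</guid>
      </item>
      <item>
        <title>A sports story</title>
        <guid>https://www.bbc.co.uk/sport/example-2</guid>
      </item>
    </channel>
  </rss>
XML

# Hypothetical helper: returns the feed XML with matching items removed
def filter_feed(xml, reject_pattern)
  doc = REXML::Document.new(xml)
  # get_elements returns an Array, so removing items while iterating is safe
  doc.get_elements('//item').each do |item|
    guid = item.elements['guid']
    item.remove if guid && guid.text =~ reject_pattern
  end
  doc.to_s
end

filtered = filter_feed(SAMPLE_FEED, REJECT_GUIDS_MATCHING)
puts filtered
```

Only the first item should survive. Writing the result to a temporary file and uploading it (steps 3 and 4) would follow exactly as in the full script.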

I like the versatility of the approach I’ve used here and its ability to perform arbitrary mutations on the feed. And I’m a big fan of Nokogiri. In some ways, this could be considered a lower-impact, less real-time version of my tool RSSey. Aside from the fact that it won’t (easily) handle websites that require JavaScript, this approach could probably be used in exactly the same ways as RSSey, and with significantly less set-up: I might look into whether its functionality can be made more generic so I can start using it in more places.
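
As a rough idea of what “more generic” might look like, here is a minimal sketch in which the feed is passed through a list of rules, each free to drop or rewrite an item. It is speculative and not part of the script above: the `mutate_feed` helper, the `:drop` convention, and both example rules are invented for illustration, and it uses REXML instead of Nokogiri so that it runs with no gems installed.

```ruby
require 'rexml/document'

# Hypothetical helper: applies each rule to every <item> in the feed.
# A rule is any callable taking an item element; returning :drop removes the item.
def mutate_feed(xml, rules)
  doc = REXML::Document.new(xml)
  doc.get_elements('//item').each do |item|
    rules.each do |rule|
      if rule.call(item) == :drop
        item.remove
        break # no point running further rules on a removed item
      end
    end
  end
  doc.to_s
end

# Example rule: drop anything whose GUID is in the "sport" section
drop_sport = lambda do |item|
  guid = item.elements['guid']
  :drop if guid && guid.text.to_s.start_with?('https://www.bbc.co.uk/sport/')
end

# Example rule: rewrite titles in place (an "arbitrary mutation")
tag_titles = lambda do |item|
  title = item.elements['title']
  title.text = "[BBC] #{title.text}" if title
  nil
end

# Hand-written sample feed (not real BBC data)
feed = <<~XML
  <rss version="2.0"><channel>
    <item><title>A news story</title><guid>https://www.bbc.co.uk/news/example-1</guid></item>
    <item><title>A sports story</title><guid>https://www.bbc.co.uk/sport/example-2</guid></item>
  </channel></rss>
XML

result = mutate_feed(feed, [drop_sport, tag_titles])
puts result
```

Keeping rules as plain callables means a new filter or rewrite is just another lambda in the list, which is one possible route to reusing the same script for other feeds.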


Protecting Yourself from Identity Theft

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

The reality is that your sensitive data has likely already been stolen, multiple times. Cybercriminals have your credit card information. They have your social security number and your mother’s maiden name. They have your address and phone number. They obtained the data by hacking any one of the hundreds of companies you entrust with the data — and you have no visibility into those companies’ security practices, and no recourse when they lose your data.

Given this, your best option is to turn your efforts toward trying to make sure that your data isn’t used against you. Enable two-factor authentication for all important accounts whenever possible. Don’t reuse passwords for anything important — and get a password manager to remember them all.

Bruce speaks my mind. Emphasis mine.

My TED Video on the Future of Work

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

I was thrilled to participate in TED’s new video series, The Way We Work, and not surprisingly I made the case that distributed work is where everything is headed.

Like Automattic (Matt’s company), Three Rings has also long been ahead of the curve from a “recruit talent from wherever it is, let people work from wherever they are” perspective. Until I recently read more than I had previously about the way that Automattic “works”, I was uncertain about the scalability of Three Rings’ model. Does it work for a commercial company (rather than a volunteer-run non-profit like Three Rings)? Does it work when you make the jump from dozens of staff to hundreds? It’s reassuring to see that yes, this kind of approach certainly can work, and to get some context on how it does (in Automattic’s case, at least). Nice video, Matt!