rss – Dan Q

BBC Sports News (without the crap)

For the last few years I’ve been running a proxy of the BBC News RSS feeds (https://bbc-feeds.danq.dev) that strips out duplicate content, non-news content, and (optionally) sports news.

This weekend, for the first time, somebody asked if it could produce an edition that included only the sports content. Which turned out to be slightly more difficult, because it’s the kind of scope-creep that my “uninterested-in-sports-news” brain couldn’t conceive that anybody would want! But I got there in the end.

If anybody’s looking for their fill of “BBC News Feeds… But Better!”, give it a look.

Pushing RSS to WhatsApp (for free?)

I’m always keen to experiment with new ways to subscribe to my blog. Obviously RSS is the best and everybody who can use it should. But some people, for their own reasons¹, prefer to learn about updates to their favourite sites via the Fediverse, or Facebook, or Telegram, or… I don’t know… LiveJournal or something (yes, those are all places you can follow this blog, if you really wanted to).

But I’m pretty sure there are some people who’d rather receive updates to my blog via WhatsApp. And now, they can. Here’s how I set up an RSS-to-WhatsApp gateway, in case you want to run one of your own².

Prerequisites

You will need:

A GitHub account – a free one is fine
A Whapi account connected to your WhatsApp account³ – when you set up an account you’ll get a free trial; when it ends you need to find the link to say that you want to carry on with the free tier (or upgrade to the paid tier if you expect to send more messages than the free tier’s limit)
A WhatsApp channel to which you want to push your RSS feed: I’d recommend that you make a newsletter (from the Updates tab in WhatsApp, press the kebab menu then Create Channel) rather than a traditional group: groups are designed for multiple people to talk and discuss and everybody can see one another’s identity, but a newsletter keeps everybody’s identity private and only allows the administrator(s) permission to post updates.

Instructions

Fork the Dan-Q/rss-to-whapi.cloud repository into your own GitHub account.
In Settings > Secrets and Variables > Actions, add two new Repository Secrets:
- WHATSAPP_API_TOKEN: set to the token on your Whapi dashboard
- WHATSAPP_CHANNEL: set to your newsletter ID (will look like 123456789012345678@newsletter) or group ID (will look like 123456789012345678@g.us): you can get this from the Newsletters or Groups section of Whapi by executing a test GET /newsletters or GET /groups request⁴.
Make a feeds.json file (a feeds.json.example is provided as a guide) containing the URLs of the RSS feeds you’d like to subscribe to.
Do a test run: from the Actions tab select the “Process feeds” action and click “Run workflow”. If it finishes successfully (and you get the WhatsApp message), you’re done! If it fails, click on the failed action and drill-in to the failed task to see the error message and correct accordingly.

By default, the processor will run on-demand and every 30 minutes, but you can modify that in .github/workflows/process-feeds.yml. It’s configured to send the single oldest un-sent item in any of the RSS feeds it’s subscribed to, on each run (it tracks which ones it’s sent already by their guids, in a "seen": [...] array in feeds.json): sending a single link per run ensures that WhatsApp’s link previews work as expected. At that rate, you could theoretically run it once every 10 minutes and never hit the 150-messages-per-day limit of Whapi’s free tier⁵, but you’ll want to work out your own optimal rate based on the anticipated update frequency of your feeds and the number of RSS-to-WhatsApp channels you’re running.
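The one-item-per-run behaviour described above can be sketched roughly as follows. This is a minimal Python illustration and not the repository’s actual Ruby code: only the "seen" array of guids comes from the description above, while the item fields (guid, published, link), the function names, and the send callback are hypothetical.

```python
def next_unsent(items, seen):
    """Return the oldest item whose guid is not yet in `seen`, or None."""
    unsent = [item for item in items if item["guid"] not in seen]
    # Assumes ISO 8601 timestamps in one timezone, which sort
    # chronologically when compared as plain strings.
    return min(unsent, key=lambda item: item["published"], default=None)


def process_once(state, items, send):
    """Send at most one item per run, then record its guid as seen."""
    item = next_unsent(items, state["seen"])
    if item is None:
        return None  # nothing new to send on this run
    send(item["link"])  # hypothetical callback that posts to WhatsApp
    state["seen"].append(item["guid"])
    return item
```

Calling process_once on a schedule drains a backlog one link at a time, which is the property that keeps each message down to a single previewable link.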

You can, of course, run it on your own infrastructure in a similar way. Just check out the repository to your local system with Ruby 3.2+ running, run bundle to install the dependencies, then set up a cron job or some other automation to run ./process_feeds.rb. Doing this could be used to hook it up to your RSS feed updating pipeline, for example, to check for new feed items right after a new post is published.

Footnotes

¹ Their own incomprehensible, illogical, weird reasons.

² I hope that the title gives it away, but you can do this completely for free. So long as you keep your fork of the GitHub repository open-source then you can run GitHub Actions for free, and so long as you’re pushing out no more than 150 updates per day to no more than 5 different channels in a month then you can do it within Whapi’s free tier: that’s probably fine for a personal blogger, and there’s a reasonable pricing structure (plus some value-added extras) for companies that want to use this same workflow as part of a grander WhatsApp offering.

³ Setting this up requires giving Whapi access to your WhatsApp account. If you don’t like the security implications of that, you could get a cheap eSIM, set that up with WhatsApp, and use that account: if you do this, just remember to “warm up” your new WhatsApp account with some conversations with yourself so it doesn’t look so much like a spammer! Also note that the way Whapi works “uses up” one of the ~4 devices on which you can simultaneously use WhatsApp Web/WhatsApp Desktop etc.

⁴ Prefer the command-line? So long as you’ve got curl and jq then you can get a list of your newsletters (or groups) and their IDs with curl -H 'Authorization: Bearer YOUR_API_TOKEN' -H 'accept: application/json' 'https://gate.whapi.cloud/newsletters?count=100' | jq '.newsletters[] | { id: .id, name: .name }' or curl -H 'Authorization: Bearer YOUR_API_TOKEN' -H 'accept: application/json' 'https://gate.whapi.cloud/groups?count=100' | jq '.groups[] | { id: .id, name: .name }', respectively.

⁵ Going beyond the free tier would require sending one message, on average, every 9 minutes and 36 seconds.
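The arithmetic behind that figure, and behind the “once every 10 minutes” claim, can be checked with a few lines of Python. The 150-per-day limit is the one quoted above; the function names are just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60
FREE_TIER_DAILY_LIMIT = 150  # messages per day, as quoted above


def runs_per_day(interval_minutes):
    """How many scheduled runs fit into a day at a fixed interval."""
    return (24 * 60) // interval_minutes


def break_even_interval(limit=FREE_TIER_DAILY_LIMIT):
    """Average gap between messages that exactly uses up the limit.

    Returned as a (minutes, seconds) tuple.
    """
    return divmod(SECONDS_PER_DAY // limit, 60)


print(runs_per_day(30))       # 48 runs at the default schedule
print(runs_per_day(10))       # 144 runs, still under the limit of 150
print(break_even_interval())  # (9, 36)
```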

Ten Pointless Facts About Me

This has been doing the rounds; I last saw it on Kev’s blog. I like that the social blogosphere’s doing this kind of fun activity again, these days¹.

1. Do you floss your teeth?

Umm… sometimes? Not as often as I should. Don’t tell my dentist!

Usually at least once a month, never more than once a week. I really took to heart some advice that if you’re using a fluoridated mouthwash then you shouldn’t do it close to when you brush your teeth (or you counteract the benefits), so my routine is that… when I remember and can be bothered to floss… I’ll floss and mouthwash, but like in the middle of the day.

And since I moved my bedroom (and bathroom) one floor further up our house, it’s harder to find the motivation to do so! So I’m probably flossing less. The unanticipated knock-on effect of extending your house!

2. Tea, coffee, or water?

I love a coffee to start a workday, but I have to be careful how much I consume because caffeine hits me pretty hard, even after a concentrated effort over the last 10 years or so to gradually increase my tolerance. I can manage a couple of mugs in the morning and be fine, now, but three coffees… or any in the mid-afternoon onwards… and I’m at risk of throwing off my ability to sleep later².

I keep a bottle of water wherever I work to try to encourage myself to hydrate, because I’ve got medical evidence to show that I don’t drink enough water! It sometimes works.

3. Footwear preference?

Basic trainers for everyday use; comfortable boots for hiking; slippers for when I’m working. Nothing special.

I wear holes in footwear (and everything else I wear) faster than anybody I know, so nowadays I go for good-value comfort over any other considerations when buying shoes.

A French Bulldog looks-on guiltily at a hand holding the remains of a pair of slippers that have been thoroughly shredded. — One time it was the dog’s fault that my footwear fell apart, but usually they do so by themselves.

4. Favourite dessert?

Varies, but if we’re eating out, I’m probably going to be ordering the most-chocolatey dessert on the menu.

5. The first thing you do when you wake up?

The very first thing I do when I wake up is check how long it is before I need to get up, and make a decision about when I’m going to do so. I almost never need my alarm to wake me: I routinely wake up half an hour or so before my alarm would go off, most mornings. But exactly how early I wake directly impacts what I do next. If I’m well-rested and it’s early enough, I’ll plan on getting up and doing something productive: an early start to work, or some voluntary work for Three Rings, or some correspondence. If it’s close to the time I need to get up I’ll more-often just stay in bed and spend longer doing the actual answer I should give…

…because the “real” answer is probably: pick up my phone, and open up FreshRSS – almost always the first and last thing I do online in a day! I’ll skim the news and blogosphere and “set aside” for later anything I’d like to re-read or look at later on.

6. Age you’d like to stick at?

Honestly, I’m good where I am, thanks.

Sure, I was fitter and healthier in my 20s, and I had more free time in my early 30s… and there are certainly things I miss and get nostalgic about in any era of my life. But conversely: it took me a long, long time to “get my shit together” to the level I have now, and I wouldn’t want to have to go through all of the various bits of self-growth, therapy, etc. all over again!

So… sure, I’d be happy to transplant my intellect into 20-year-old me and take advantage of my higher energy level of the time for an extra decade or so³. But I wouldn’t go back even a decade if it meant that I had to go relearn and go through everything from that decade another time, no thanks!

7. How many hats do you own?

Four. Ish.

They are:

A bandana. Actually, I own maybe half a dozen bandanas, mostly in Pride rainbow colours. Bandanas are amazingly versatile: they fold small which suits my love of travelling light these last few years, they can function as headgear, dust mask, neckerchief, flannel, etc.⁴, and they do a pretty good job of keeping my head cool and protecting my growing bald spot from the fierce rays of the summer sun.
A “geek” hat. Okay, I’ve actually got three of these, too, in slightly different designs. When they first started appearing at Oxford Geek Nights, I just kept winning them! I’m not a huge fan of caps, so mostly the kids wear them… although I do put one on when I’m collecting takeaway food so I can get away with just putting e.g. “geek hat” in the “name” field, rather than my name⁵.
A warm hat that comes out only when the weather is incredibly cold, or when I’m skiing. As I was reminded while skiing on my recent trip to Finland, I should probably switch to wearing a helmet when I ski, but I’ve been skiing for three to four decades without one and I find the habit hard to break.⁶
A woolly hat that I was given by a previous employer at a meetup in Mexico last year. I wore it a couple of times last winter but it’s otherwise not seen much use.

8. Describe the last photo you took?

The last photo I took was of myself wearing a “geek” hat. You’ve seen it, it’s above!

But the one before that was this picture of an extremely large bottle of champagne, with a banana for scale, that was delivered to my house earlier today:

A six-litre bottle of champagne, wrapped in bubble wrap and surrounded by packing peanuts, in a wooden transport case, with a banana resting atop it. — A 6-litre champagne bottle is properly-termed a Methuselah, after Noah’s grandad I guess.

Ruth and JTA celebrate their anniversary every few years with the “next size up” of champagne bottle, and this is the one they’re up to. This year, merely asking me to help them drink it probably won’t be sufficient (that’d still be two litres each!) so we’re probably going to have to get some friends over.

I took the photo to send to Ruth to reassure her that the bottle had arrived safely, after the previous attempt went… less well. I added the banana “for scale” before sharing the photo with some other friends, too.

A wooden case containing a completely smashed 6-litre champagne bottle. — The previous delivery… didn’t go so well. 😱

9. Worst TV show?

PAW Patrol. No doubt.

You know all those 1980s kids TV shows that basically existed for no other purpose than as a marketing vehicle for a range of toys? I’m talking He-Man (and She-Ra), Transformers, G.I. Joe, Care Bears, M.A.S.K., Rainbow Brite, and My Little Pony. Well, those shows look good compared to PAW Patrol.

3D render of a boy and six dogs (each dressed as a representative of a different service) - the PAW Patrol. Ugh. — Six pups, each endowed with exactly one personality trait⁷ but a plethora of accessories and vehicles which expands every season so that no matter how many toys you’ve got, you’re always behind the curve.

I was delighted when our kids graduated from PAW Patrol to My Little Pony: Friendship is Magic because it’s an enormously better show (the songs kick ass, too) and we could finally shake off the hollow, pointless, internally-inconsistent advertisement that is PAW Patrol.

10. As a child, what was your aspiration for adulthood?

This is the single most-boring thing about me, and I’ve doubtless talked about it before. At some point between the age of about six and eight years old, I decided that I wanted to grow up and become… a computer programmer.

And then I designed the entirety of the rest of my education around that goal. I learned a variety of languages and paradigms under my own steam while setting myself up for a GCSE in IT, and then A-Levels in Maths and Computing, and then a Degree in Computer Science, and by the time I’d done all of that I was already working in the industry: self-actualised by 21.

Like I said: boring!

Your turn!

You should give this pointless quiz a go too. Ping/Webmention me if you do (or comment below, I suppose); I’d love to read what you write.

Footnotes

¹ They’re internet memes, in the traditional sense, but sadly people usually use “meme” nowadays exclusively to describe image memes, and not other kinds of memetic Internet content. Just another example of our changing Internet language, which I’ve written about before. Sometimes they were silly quizzes (wanna know what Meat Loaf song I am?); sometimes they were about you and your friends. But images, they weren’t: that came later.

² Or else I’ll get a proper jittery heart-flutter going!

³ I wouldn’t necessarily even miss the always-on, in-your-pocket, high-speed Internet of today: the Internet was pretty great back then, too!

⁴ Obviously an intergalactic hitch-hiker should include a bandana, perhaps as well as an equally-versatile towel, in their toolkit.

⁵ It’s not about privacy, although that’s a fringe benefit I suppose: mostly it’s about getting my food quicker! If I walk into Dominos wearing a geek hat and they’ve got pizza on the counter with a label on it that says it’s for “geek hat”, they’ll just hand it over, no questions, and I’m in-and-out in seconds.

⁶ JTA observed that similar excuses were used by people who resisted the rollout of mandatory seatbelt usage in cars, so possibly I’m the “bad guy” here.

⁷ From left to right, the single personality traits for each of the pups are (a) doesn’t like water, (b) is female, (c) likes naps, (d) is allergic to cats, (e) is clumsy, and (f) is completely fucking pointless.

Unread 4.5.2

Last month, my friend Gareth observed that the numbered lists in my blog posts “looked wrong” in his feed reader. I checked, and I decided I was following the standards correctly and it must have been his app that was misbehaving.

So he contacted the authors of Unread, his feed reader, and they fixed it. Pretty fast, I’ve got to say. And I was amused to see that I’m clearly now a test case because my name’s in their release notes!

Feed Readers Beat Doomscrolling

The news has, in general, been pretty terrible lately.

Like many folks, I’ve worked to narrow the focus of the things that I’m willing to care deeply about, because caring about many things is just too difficult when, y’know, nazis are trying to destroy them all.

I’ve got friends who’ve stopped consuming news media entirely. I’ve not felt the need to go so far, and I think the reason is that I already have a moderately-disciplined relationship with news. It’s relatively easy for me to regulate how much I’m exposed to all the crap news in the world and stay focussed and forward-looking.

The secret is that I get virtually all of my news… through my feed reader (some of it pre-filtered, e.g. my de-crappified BBC News feeds).

FreshRSS screenshot showing a variety of feeds categorised as Communities, Distractions, Geeky, YouTube, News, Strangers, etc. Posts from yesterday and today are visible. — I use FreshRSS and I love it. But really: *any* feed reader can improve your relationship with the Web.

Without a feed reader, I can see how I might feel the need to “check the news” several times a day. Pick up my phone to check the time… glance at the news while I’m there… you know how to play that game, right?

But with a feed reader, I can treat my different groups of feeds like… periodicals. The news media I subscribe to get collated in my feed reader and I can read them once, maybe twice per day, just like a daily newspaper. If an article remains unread for several days then, unless I say otherwise, it’s configured to be quietly archived.

My current events are less like a firehose (or sewage pipe), and more like a bottle of (filtered) water.

Categorising my feeds means that I can see what my friends are doing almost-immediately, but I don’t have to be disturbed by anything else unless I want to be. Try getting that from a siloed social network!

Maybe sometimes I see a new breaking news story… perhaps 12 hours after you do. Is that such a big deal? In exchange, I get to apply filters of any kind I like to the news I read, and I get to read it as a “bundle”, missing (or not missing) as much or as little as I like.

On a scale from “healthy media consumption” to “endless doomscrolling”, proper use of a feed reader is way towards the healthy end.

If you stopped using feeds when Google tried to kill them, maybe it’s time to think again. The ecosystem’s alive and well, and having a one-stop place where you can enjoy the parts of the Web that are most-important to you, personally, in an ad-free, tracker-free, algorithmic-filtering-free space that you can make your very own… brings a special kind of peace that I can highly recommend.

Note #25736

After “fixing” BBC News’ RSS feeds I noticed that I was seeing less news (and, somehow, stressing less over everything happening in the USA). Turns out that in switching myself to my new system I’d subscribed to the UK edition, whereas previously I’d been on the Full edition. I’ve corrected it now in my RSS reader, but it was an interesting couple of days.

tl;dr: I accidentally stopped reading international news and I was less stressed

Anyway: if you’re not already using my improved BBC News RSS feeds, they’re at: https://bbc-feeds.danq.dev

BBC News RSS… your way!

It turns out my series of efforts to improve the BBC News RSS feeds are more-popular than I thought. People keep asking for variants of them, and it’s probably time I stopped hosting the resulting feeds on my NAS (which does a good job, but it’s in a highly-kickable place right under my desk).

Screenshot of BBC News RSS Feeds (that don't suck!). — The new site isn’t pretty. But it works.

So I’ve launched BBC-Feeds.DanQ.dev. On a 20-minute schedule, it generates both UK and World editions of the BBC News feeds, filtered to remove iPlayer, Sounds, app “nudges”, duplicates, and other junk, and optionally with the sports news filtered out too.
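For a feel of what that filtering involves, here’s a minimal Python sketch. It is not the code that runs the site: the URL markers used to spot junk and sport items, and the choice to treat two items as duplicates when their links match after stripping any query string or fragment, are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

JUNK_MARKERS = ("/iplayer/", "/sounds/")  # assumed URL patterns for non-news
SPORT_MARKER = "/sport/"                  # assumed URL pattern for sport


def clean_feed(rss_xml, drop_sport=True):
    """Return the feed with junk, duplicates and (optionally) sport removed."""
    root = ET.fromstring(rss_xml)
    channel = root.find("channel")
    seen_links = set()
    # Iterate over a copy because items are removed from the channel as we go.
    for item in list(channel.findall("item")):
        link = (item.findtext("link") or "").split("#")[0].split("?")[0]
        is_junk = any(marker in link for marker in JUNK_MARKERS)
        is_sport = drop_sport and SPORT_MARKER in link
        if is_junk or is_sport or link in seen_links:
            channel.remove(item)
        else:
            seen_links.add(link)
    return ET.tostring(root, encoding="unicode")
```

Passing drop_sport=False gives the “without the crap, but with the sports” flavour from the same function.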

The entire thing is open source under an ultra-permissive license, so you can run your own copy if you don’t want to use mine.

Enjoy!

BBC News RSS… with the sport?

There’s now a much, much better version of this. Go use that instead: bbc-feeds.danq.dev.

Earlier today, somebody called Allan commented on the latest in my series of several blog posts about how I ~~mutilate~~ manipulate the RSS feeds of BBC News to work around their (many, and increasingly so) various shortcomings, specifically:

Their inclusion of non-news content such as plugs for iPlayer and their apps,
Their repeating of identical news stories with marginally-different GUIDs, and
All of the sports news, which I don’t care about one jot.

Well, it turns out that some people want #3: the sport. But still don’t want the other two.

I shan’t be subscribing to this RSS feed, and I can’t promise I’ll fix it if it gets broken. But if “without the crap, but with the sports” is the way you like your BBC News RSS feed, I’ve got you covered:

So there you go, Allan, and anybody in a similar position. I hope that fulfils your need for sports news… without the crap.

Guinness in the Bath

It’s been a long day of driving around Ireland, scrambling through forests, navigating to a hashpoint, exploring a medieval castle, dodging the rain, finding a series of geocaches, getting lost up a hill in the dark, and generally having a kickass time with one of my very favourite people on this earth: my mum.

And now it’s time for a long soak in a hot bath with a pint of the black stuff and my RSS reader for company. A perfect finish.

XPath Scraping AdamKoszary.co.uk

Adam Koszary – whom I worked alongside at the Bodleian – the social media specialist who brought the “absolute unit” meme to the masses, started blogging earlier (again?) this year. Yay!

But he’s completely neglected to put an RSS feed on his new blog. Boo!

Dan, wearing a VR headset, sits in an office environment, watched by Adam. — People who saw Adam and I work together might have questioned the degree to which it counted as “work”, but that’s another story.

I’ve talked at length about how I use FreshRSS’s “XPath Scraping” feature (for Bev’s blog, Far Side, Forward, new Far Side, and Vmail, among others), but earlier this week somebody left a comment to ask me more about how I test and debug my XPath scrapers. Given that I now need to add one for Adam’s blog, I’m in a wonderful position to walk you through it!

Setting up and debugging your FreshRSS XPath Scraper

Okay, so here’s Adam’s blog. I’ve checked, and there’s no RSS feed¹, so it’s time to start planning my XPath Scraper. The first thing I want to do is to find some way of identifying the “posts” on the page. Sometimes people use solid, logical id="..." and class="..." attributes, but I’m going to need to use my browser’s “Inspect Element” tool to check:

The next thing that’s worth checking is that the content you’re inspecting is delivered with the page, and not loaded later using JavaScript. FreshRSS’s XPath Scraper works with the raw HTML/XML that’s delivered to it; it doesn’t execute any JavaScript², so I use “View Source”³ and quickly search to see that the content I’m looking for is there, too.

Now it’s time to try and write some XPath queries. Luckily, your browser is here to help! If you pop up your debug console, you’ll discover that you’ve probably got a predefined function, $x(...), to which you can pass a string containing an XPath query and get back a list of the matching elements.

First, I’ll try getting all of the links inside the #posts section by running $x( '//*[@id="posts"]//a' ) –

In my first attempt, I discovered that I got not only all the posts… but also the “tags” at the top. That’s no good. Inspecting the URLs of each, I noticed that the post URLs all contained /posts/, so I filtered my query down to $x( '//*[@id="posts"]//a[contains(@href, "/posts/")]' ) which gave me the expected number of results. That gives me //*[@id="posts"]//a[contains(@href, "/posts/")] as the XPath query for “news items”:

Obviously, this link points to the full post, so that tells me I can put ./@href as the “item link” attribute in FreshRSS.

Next, it’s time to see what other metadata I can extract from each post to help FreshRSS along:

Inspecting the post titles shows that they’re <h3>s. Running $x( '//*[@id="posts"]//a[contains(@href, "/posts/")]//h3' ) gets them. Within FreshRSS, everything “within” a post is referenced relative to the post, so I convert this to descendant::h3 for my “XPath (relative to item) for Item Title:” attribute.

Inspecting within the post summary content, it’s… not great for scraping. The elements’ class names don’t correspond to what the content is⁴: it looks like Adam’s using a utility class library⁵.

Everything within the <a> that we’ve found is wrapped in a <div class="flex-grow">. But within that, I can see that the date is directly inside a <p>, whereas the summary content is inside a <p> within a <div class="mb-2">. I don’t want my code to be too fragile, and I think it’s more-likely that Adam will change the class names than the structure, so I’ll tie my queries to the structure. That gives me descendant::div/p for the date and descendant::div/div/p for the “content”. All that remains is to tell FreshRSS that Adam’s using F j, Y as his date format (long month name, space, short day number, comma, space, long year number) so it knows how to parse those dates, and the feed’s good.

If it’s wrong and I need to change anything in FreshRSS, the “Reload Articles” button can be used to force it to re-load the most-recent X posts. Useful if you need to tweak things. In my case, I’ve also set the “Article CSS selector on original website” field to article so that the full post text can be pulled into my reader rather than having to visit the actual site. Then I’m done!
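If you’d like to rehearse the same queries outside the browser, here’s a rough Python sketch using only the standard library. The sample markup is invented to match the structure described above (it is not Adam’s real HTML), and because ElementTree’s XPath subset has no contains() function or descendant:: axis, the /posts/ filter is applied in Python and the relative queries are written with .// instead. The strptime pattern %B %d, %Y is the nearest Python equivalent of PHP’s F j, Y.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical, well-formed stand-in for the page; the real markup will differ.
SAMPLE = """
<html><body>
<section id="posts">
  <a href="/tags/museums/">museums</a>
  <a href="/posts/first-post/">
    <div class="flex-grow">
      <h3>First post</h3>
      <p>March 5, 2024</p>
      <div class="mb-2"><p>A short summary.</p></div>
    </div>
  </a>
</section>
</body></html>
"""


def scrape(html):
    """Extract link, title, date and summary for each post link."""
    root = ET.fromstring(html)
    items = []
    for a in root.findall(".//*[@id='posts']//a"):
        # Stand-in for the contains(@href, "/posts/") predicate.
        if "/posts/" not in a.get("href", ""):
            continue
        items.append({
            "link": a.get("href"),
            "title": a.findtext(".//h3"),
            "date": datetime.strptime(a.findtext(".//div/p"), "%B %d, %Y"),
            "content": a.findtext(".//div/div/p"),
        })
    return items


for post in scrape(SAMPLE):
    print(post["date"].date(), post["title"], post["link"])
```

Real-world HTML is rarely well-formed XML, so a parser built for HTML would be needed for anything beyond a quick experiment like this.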

Takeaways

Use Inspect Element to find the elements you want to scrape for.
Use $x( ... ) to test your XPath expressions.
Remember that most of FreshRSS’s fields ask for expressions relative to the news item and adapt accordingly.
If you make a mistake, use “Reload Articles” to pull them again.

Footnotes

¹ Boo again!

² If you need a scraper that executes JavaScript, you need something more-sophisticated. I used to use my very own RSSey for this purpose but nowadays XPath Scraping is sufficient so I don’t bother any more, but RSSey might be a good starting point for you if you really need that kind of power!

³ If you’ve not had the chance to think about it before: View Source shows you the actual HTML code that was delivered from the web server to your browser. This then gets interpreted by the browser to generate the DOM, which might result in changes to it: for example, invalid elements might be removed, ambiguous markup will have an interpretation applied, and so on. The DOM might further change as a result of JavaScript code, browser plugins, and whatever else. When you Inspect Element, you’re looking at the DOM (represented “as if” it were HTML), not the actual underlying HTML.

⁴ The date isn’t in a <time> element nor does it have a class like .post--date or similar.

⁵ I’ll spare you my thoughts on utility class libraries for now, but they’re… not positive. I can see why people use them, and I’ve even used them myself before… but I don’t think they’re a good thing.

Fixing BBC News… Yet Again

The Beeb continue to keep adding more and more non-news content to the BBC News RSS feed (like this ad for the iPlayer app!), so I’ve once again had to update my script to “fix” the feed so that it only contains, y’know, news.

Vmail via FreshRSS

It’s time for… Dan Shares Yet Another FreshRSS XPath Scraping Recipe!

Vmail

I’m a huge fan of the XPath scraping feature of FreshRSS, my favourite feed reader (and one of the most important applications in my digital ecosystem). I’ve previously demonstrated how to use the feature to subscribe to Forward, reruns of The Far Side, and new The Far Side content, despite none of those sites having “official” feeds.

Vmail is cool. It’s vole.wtf’s (of ARCC etc. fame) community newsletter, and it’s as batshit crazy as you’d expect if you were to get the kinds of people who enjoy that site and asked them all to chip in on a newsletter.

Totes bonkers.

But email’s not how I like to consume this kind of media. So obviously, I scraped it.

Recipe

Want to subscribe to Vmail using your own copy of FreshRSS? Here’s the settings you’re looking for –

Type of feed source: HTML + XPath (Web scraping)
XPath for finding news items: //table/tbody/tr
It’s just a table with each row being a newsletter; simple!
XPath for item title: descendant::a
XPath for item content: .
XPath for item link (URL): descendant::a/@href
XPath for item date: descendant::td[1]
Custom date/time format: d M *y
The dates are in a format that’s like 01 May ’24 – two-digit days with leading zeros, three-letter months, and a two-digit year preceded by a curly quote, separated by spaces. That curly quote screws up PHP’s date parser, so we have to give it a hint.
XPath for unique item ID: descendant::th
Optional, but each issue’s got its own unique ID already anyway; we might as well use it!
Article CSS selector on original website: #vmail
Optional, but recommended: this option lets you read the entire content of each newsletter without leaving FreshRSS.
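To see why the date format needs that hint, here’s the same parsing problem as a small Python sketch. This is only an analogy to FreshRSS’s PHP-based parsing: where the recipe’s format skips over the stray character, this version matches the curly quote literally. The function name is made up for the example, and %b assumes English month abbreviations.

```python
from datetime import datetime


def parse_vmail_date(text):
    """Parse dates like 01 May ’24: day, short month, curly quote, 2-digit year."""
    # \u2019 is the right single quotation mark that precedes the year.
    return datetime.strptime(text.strip(), "%d %b \u2019%y")


print(parse_vmail_date("01 May \u201924").date())  # 2024-05-01
```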

So yeah, FreshRSS continues to be amazing. And lately it’s helped me keep on top of the amazing/crazy of vole.wtf too.

Somewhat-Effective Spam Filters

I’ve tried a variety of unusual strategies to combat email spam over the years.

Here are some of them (each rated in terms of the geekiness of its implementation and its efficacy), in case you’d like to try any yourself. They’re all still in use in some form or another:

Spam filters

Geekiness: 1/10
Efficacy: 5/10

Your email provider or your email software probably provides some spam filters, and they’re probably pretty good. I use Proton’s and, when I’m at my desk, Thunderbird’s. Double-bagging your spam filter only slightly reduces the amount of spam that gets through, but increases your false-positive rate, so some non-spam gets mis-filed.

A particular problem is people who email me for help after changing their name on FreeDeedPoll.org.uk, probably because they’re not only “new” unsolicited contacts to me but because by definition many of them have strange and unusual names (which is why they’re emailing me for help in the first place).

Frankly, spam filters are probably enough for many people. Spam filtering is in general much better today than it was a decade or two ago. But skim the other suggestions in case they’re of interest to you.

Unique email addresses

Geekiness: 3/10
Efficacy: 8/10

If you give a different email address to every service you deal with, then if one of them misuses it (starts spamming you, sells your data, gets hacked, whatever), you can just block that one address. All the addresses come to the same inbox, for your convenience. Using a catch-all means that you can come up with addresses on-the-fly: you can even fill a paper form with a unique email address associated with the company whose form it is.

On many email providers, including the ever-popular GMail, you can do this using plus-sign notation. But if you want to take your unique addresses to the next level and you have your own domain name (which you should), then you can simply redirect all email addresses on that domain to the same inbox. If Bob’s Building Supplies wants your email address, give them bobs@yourname.com, which works even if Bob’s website erroneously doesn’t accept email addresses with plus signs in them.

This method actually works for catching people misusing your details. On one occasion, I helped a band identify that their mailing list had been hacked. On another, I caught a dodgy entrepreneur who used the email address I gave to one of his businesses without my consent to send marketing information for a different one of his businesses. As a bonus, you can set up your filtering/tagging/whatever based on the incoming address, rather than the sender, for the most accurate finding, prioritisation, and blocking.

Also, it makes it easy to have multiple accounts with any of those services that try to use the uniqueness of email addresses to prevent you from doing so. That’s great if, like me, you want to be in each of three different Facebook groups but don’t want to give Facebook any information (not even that you exist at the intersection of those groups).

Signed unique email addresses

Geekiness: 10/10
Efficacy: 2/10

Unique email addresses introduce two new issues: (1) if an attacker discovers that your Dreamwidth account has the email address dreamwidth@yourname.com, they can probably guess your LinkedIn email, and (2) attackers will shotgun “likely” addresses at your domain anyway, e.g. admin@yourname.com, management@yourname.com, etc., which can mean that when something gets through you get a dozen copies of it before your spam filter sits up and takes notice.

What if you could assign unique email addresses to companies but append a signature to each that verified that it was legitimate? I came up with a way to do this and implemented it as a spam filter, and made a mobile-friendly webapp to help generate the necessary signatures. Here’s what it looked like:

The domain directs all emails at that domain to the same inbox.
If the email address is on a pre-established list of valid addresses, that’s fine.
Otherwise, the email address must match the form of:
- A string (the company name), followed by
- A hyphen, followed by
- A hash generated using the mechanism described below, then
- The @-sign and domain name as usual

The hashing algorithm is as follows: concatenate a secret password that only you know with a colon then the “company name” string, run it through SHA1, and truncate to the first eight characters. So if my password were swordfish1 and I were generating an address for Facebook, I’d go:

SHA1 ( swordfish1 : facebook) [ 0 ... 8 ] = 977046ce
Therefore, the email address is facebook-977046ce@myname.com
If any character of that email address is modified, it becomes invalid, preventing an attacker from deriving your other email addresses from a single point (and making it hard to derive them given multiple points)
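
The scheme above can be sketched in a few lines of Python. This is not the original implementation: the function names are made up for illustration, and it assumes the password, colon, and company name are joined with no spaces and encoded as UTF-8, so whether it reproduces the example hash depends on that assumption holding.

```python
import hashlib
import hmac


def signature(secret: str, company: str) -> str:
    """First eight hex characters of SHA1("<secret>:<company>")."""
    digest = hashlib.sha1(f"{secret}:{company}".encode("utf-8")).hexdigest()
    return digest[:8]


def signed_address(secret: str, company: str, domain: str) -> str:
    """Build company-signature@domain."""
    return f"{company}-{signature(secret, company)}@{domain}"


def is_valid(address: str, secret: str, domain: str, allowlist=frozenset()) -> bool:
    """Accept pre-approved addresses, or ones carrying a correct signature."""
    address = address.lower()
    if address in allowlist:
        return True
    local, _, address_domain = address.rpartition("@")
    if address_domain != domain.lower():
        return False
    # Split on the last hyphen so company names may contain hyphens.
    company, hyphen, given = local.rpartition("-")
    if not hyphen or not company:
        return False
    expected = signature(secret, company)
    # Constant-time comparison; bytes avoid errors on non-ASCII input.
    return hmac.compare_digest(given.encode("utf-8"), expected.encode("utf-8"))
```

Usage would be along the lines of `signed_address("swordfish1", "facebook", "myname.com")` to mint an address and `is_valid(...)` inside the spam filter to check incoming recipients.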

I implemented the code, but it soon became apparent that this was overkill and I was targeting the wrong behaviours. It was a fun exercise, but ultimately pointless. This is the one method on this page that I don’t still use.

Honeypots

Geekiness: 8/10
Efficacy: ?/10

A honeypot is a “trap” email address. Anybody who emails it gets aggressively marked as a spammer to help ensure that any other messages they send – even to valid email addresses – also get marked as spam.

I litter honeypots all over the place (you might find hidden email addresses on my web pages, along with text telling humans not to use them), but my biggest source of honeypots is formerly-valid unique addresses, or “guessed” catch-all addresses, which already attract spam or are otherwise compromised!
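
In outline, the honeypot logic amounts to something like this Python sketch. The class and its in-memory sets are purely illustrative assumptions; a real filter would persist its list of flagged senders and probably weigh other signals too.

```python
class HoneypotFilter:
    """Flag any sender who has ever written to a honeypot address."""

    def __init__(self, honeypots):
        self.honeypots = {address.lower() for address in honeypots}
        self.flagged_senders = set()

    def is_spam(self, sender, recipients):
        sender = sender.lower()
        # Writing to a trap address taints the sender from now on.
        if any(r.lower() in self.honeypots for r in recipients):
            self.flagged_senders.add(sender)
        # Tainted senders are spam even when writing to valid addresses.
        return sender in self.flagged_senders
```

Retiring a compromised unique address then just means adding it to the `honeypots` set.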

I couldn’t tell you how effective it is without looking at my spam filter’s logs, and since the most-effective of my filters is now outsourced to Proton, I don’t have easy access to that. But it certainly feels very satisfying on the occasions that I get to add a new address to the honeypot list.

Instant throwaways

Geekiness: 5/10
Efficacy: 6/10

OpenTrashmail is an excellent throwaway email server that you can deploy in seconds with Docker, point some MX records at, and be all set! A throwaway email server gives you an infinite number of unique email addresses, like other solutions described above, but with the benefit that you never have to see what gets sent to them.

If you offer me a coupon in exchange for my email address, it’s a throwaway email address I’ll give you. I’ll make one up on the spot with one of my (several) trashmail domains at the end of it, like justgivemethedamncoupon@danstrashmailserver.com. I can just type that email address into OpenTrashmail to see what you sent me, but then I’ll never check it again so you can spam it to your heart’s content.

As a bonus, OpenTrashmail provides RSS feeds of inboxes, so I can subscribe to any email-based service using my feed reader, and then unsubscribe just as easily (without even having to tell the owner).

Summary

With the exception of whatever filters your provider or software comes with, most of these options aren’t suitable for regular folks. But you’re only a domain name (assuming you don’t have one already) away from being able to give unique email addresses to everybody you deal with, and that’s genuinely a game-changer all by itself and well worth considering, in my opinion.

link rel=”blogroll”

Dave Winer kindly let me know about a proposed standard for linking to OPML blogrolls. Given that I added a page containing my blogroll last year, it was easy enough for me to add a tiny bit of code to the header to add support for automatic detection of my blogroll.

<link rel="blogroll" type="text/xml" href="/blogroll.xml" title="Dan Q's blogroll">

Now all we need is some tools that can do such detection!

(You’ll note I’ve added a title attribute: as I discovered the other day, some browsers including ELinks will show all <link>s of unknown rel="..." at the top of the page and I wanted this one to make sense!)
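
For anyone tempted to build such a tool, detection could start from something like this Python sketch, which uses only the standard library. The class and function names are invented for the example, and fetching the page and parsing the OPML it points to are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class BlogrollFinder(HTMLParser):
    """Collect (url, title) pairs from <link rel="blogroll"> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.blogrolls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel is a space-separated list of tokens, compared case-insensitively.
        rels = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "blogroll" in rels and href:
            url = urljoin(self.base_url, href)  # resolve relative hrefs
            self.blogrolls.append((url, attributes.get("title")))


def find_blogrolls(html, base_url):
    finder = BlogrollFinder(base_url)
    finder.feed(html)
    return finder.blogrolls
```

Feeding it a page’s HTML plus the URL it was fetched from returns the absolute URL of each advertised blogroll, along with its title if one was given.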

5 Cool Apps for your Unraid NAS

I’ve got a (now four-year-old) Unraid NAS called Fox and I’m a huge fan. I particularly love the fact that Unraid can work not only as a NAS, but also as a fully-fledged Docker appliance, enabling me to easily install and maintain all manner of applications.

A cube-shaped black computer sits next to a battery pack on a laminated floor. A sign has been left atop it, reading "Caution: Generator connected to this installation." — There isn’t really a generator attached to Fox, just a UPS battery backup. The sign was liberated from our shonky home electrical system.

I was chatting this week to a colleague who was considering getting a similar setup, and he seemed to be taking notes of things he might like to install, once he’s got one. So I figured I’d round up five of my favourite things to install on an Unraid NAS that:

Don’t require any third-party accounts (low dependencies),
Don’t need any kind of high-powered hardware (low specs), and
Provide value with very little set up (low learning curve).

Here we go:

Syncthing

I’ve been raving about Syncthing for years. If I had an “everyday carry” list of applications, it’d be high on that list.

Here’s the skinny: you install Syncthing on several devices, then give each the identification key of another to pair them. Now you can add folders on each and “share” them with the others, and the two are kept in-sync. There’s lots of options for power users, but just as a starting point you can use this to:

Manage the photos on your phone and push copies to your desktop whenever you’re home (like your favourite cloud photo sync service, but selfhosted).
Keep your Obsidian notes in-sync between all your devices (normally costs $4/month).¹
Get a copy of the documents from all your devices onto your NAS, for backup purposes (note that sync’ing alone, even with versioning enabled, is not a good backup: the idea is that you run an actual backup from your NAS!).

Huginn

You know IFTTT? Zapier? Services that help you to “automate” things based on inputs and outputs. Huginn’s like that, but selfhosted. Also: more-powerful.

The learning curve is steeper than anything else on this list, and I almost didn’t include it for that reason alone. But once you’ve learned your way around its idiosyncrasies and dipped your toe into the more-advanced JavaScript-powered magic it can do, you really begin to unlock its potential.

It couples well with Home Assistant, if that’s your jam. But even without it, you can find yourself automating things you never expected to.

FreshRSS

I’ve written a lot about how and why FreshRSS continues to be my favourite RSS reader. But you know what’s even better than an awesome RSS reader? An awesome selfhosted RSS reader!

Many of these suggested apps benefit from you exposing them to the open Web rather than just running them on your LAN, and an RSS reader is probably the best example (you want to read your news feeds when you’re out and about, right?). What you need for that is a reverse proxy, and there are lots of guides to doing it super-easily, even if you’re not on a static IP address.² Alternatively you can just VPN in to your home: your router might be able to arrange this, or else Unraid can do it for you!

Open Trashmail

You know how sometimes you need to give somebody your email address but you don’t actually want to? Like: sure, I’d like you to email me a verification code for this download, but I don’t trust you not to spam me later! What you need is a disposable email address.³

You just need to install Open Trashmail, point the MX records of a few domain names or subdomains (you’ve got some spare domain names lying around, right? if not; they’re pretty cheap…) at it, and it will now accept email to any address on those domains. You can make up addresses off the top of your head, even away from an Internet connection when using a paper-based form, and they work. You can check them later if you want to… or ignore them forever.

Couple it with an RSS reader, or Huginn, or Slack, and you can get a notification or take some action when an email arrives!

Need to give that escape room your email address to get a copy of your “team photo”? Give them a throwaway, pick up the picture when you get home, and then forget you ever gave it to them.
Company gives you a freebie on your birthday if you sign up to their mailing list? Sign up 366 times with them and write a Huginn workflow that puts “today’s” promo code into your Obsidian notetaking app (Sync’d over Syncthing) but filters out everything else.
Suspect some organisation is selling your email address on to third parties? Give them a unique email address that you only give to them and catch them in a honeypot.

YOURLS

Finally: a URL shortener. The Internet’s got lots of them, but they’re all at the mercy of somebody else (potentially somebody in a country that might not be very-friendly with yours…).

Plus, it’s just kinda cool to be able to brand your shortlinks with your own name, right? If you follow only one link from this post, let it be to watch this video that helps explain why this is important: danq.link/url-shortener-highlights.

I run many, many other Docker containers and virtual machines on my NAS. These five aren’t even the “top five” that I use… they’re just five that are great starters because they’re easy and pack a lot of joy into their learning curve.

And if your NAS can’t do all the above… consider Unraid for your next NAS!

Footnotes

¹ I wrote the beginnings of this post on my phone while in the Channel Tunnel and then carried on using my desktop computer once I was home. Sync is magic.

² I can’t share or recommend one reverse proxy guide in particular because I set my own up because I can configure Nginx in my sleep, but I did a quick search and found several that all look good so I imagine you can do the same. You don’t have to do it on day one, though!

³ Obviously there are lots of approaches to on-demand disposable email addresses, including the venerable “plus sign in a GMail address” trick, but Open Trashmail is just… better for many cases.