I got some great feedback on yesterday’s post about using FreshRSS + XPath to subscribe to Forward, including helpful comments from FreshRSS developer Alexandre Alapetite. Somebody else, who appreciated both that post and my Far Side “Daily Dose” recipe, wondered if it was possible to get the new Far Side content in FreshRSS too.
Wait, there’s new Far Side content? Yup: it turns out Gary Larson’s dusted off his pen and started drawing again. That’s awesome! But the last thing I want is to have to go to the website once every few… what: days? weeks? months? He’s not syndicated any more so he’s not got a deadline to work to! If only there were some way to have my feed reader, y’know, do it for me and let me know whenever he draws something new.
Here’s my setup for getting Larson’s new funnies right where I want them:
- Feed URL:
  https://www.thefarside.com/new-stuff/1
  This isn’t a valid address for any of the new stuff, but always seems to redirect to somewhere that is, so that’s nice.
- XPath for finding news items:
  //div[@class="swiper-slide"]
  Turns out all the “recent” new stuff gets loaded in the HTML and then JavaScript turns it into a slider etc.; some of the CSS classes change when the JavaScript runs so I needed to View Source rather than use my browser’s inspector to find everything.
- Item title:
  concat("Far Side #", descendant::button[@aria-label="Share"]/@data-shareable-item)
  Ugh. The easiest place I could find a “clean” comic ID number was in a data- attribute of the “share” button, where it’s presumably used for engagement tracking. Still, whatever works right?
- Item content:
  descendant::figcaption
  When Larson captions a comic, the caption is important.
- Item link (URL) and item unique ID:
  concat("https://www.thefarside.com", ./@data-path)
  The URLs work as direct links to the content, and because they’re unique, they make a reasonable unique ID too (so long as their numbering scheme is internally-consistent, this should stop a re-run of new content popping up in your feed reader if the same comic comes around again).
- Item thumbnail:
  concat("https://fox.q-t-a.uk/referer-faker.php?pw=YOUR-SECRET-PASSWORD-GOES-HERE&referer=https://www.thefarside.com/&url=", descendant::img[@data-src]/@data-src)
  The Far Side uses Referer: headers as an anti-hotlinking measure, which prevents us easily loading the images directly in an RSS reader. I use this tiny PHP script as a proxy to mitigate that. If you don’t have such a proxy set up, you could simply omit the “Item thumbnail” and “Item content” fields and click the link to go to the original page.
- Item date:
  normalize-space(descendant::div[@class="tfs-comic-new__meta"]/*[1])
  The date is spread through two separate text nodes, so we get the content of their wrapper and use normalize-space to tidy the whitespace up. The date format then looks like “Wednesday, March 29, 2023”, which we can parse using a custom date/time format string:
- Custom date/time format:
  l, F j, Y
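If you’d like to sanity-check what those expressions are doing outside of FreshRSS, here’s a rough Python sketch of the same extraction. It’s only an illustration under some loud assumptions: the markup in SAMPLE is invented to mirror the structure described above (it isn’t copied from the site, and the data-path and image URL are placeholders), and because Python’s standard library only understands a subset of XPath, the concat() and normalize-space() steps are done in plain Python instead. The real page is HTML rather than well-formed XML, so this parser wouldn’t cope with it directly, and the thumbnail proxy is left out entirely.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical fragment, invented to mirror the structure described above;
# the real page's markup may well differ.
SAMPLE = """
<div>
  <div class="swiper-slide" data-path="/new-stuff/123/example-comic">
    <div class="tfs-comic-new__meta">
      <div>
        Wednesday,
        March 29, 2023
      </div>
    </div>
    <img data-src="https://example.invalid/comic-123.jpg" />
    <figcaption>An example caption.</figcaption>
    <button aria-label="Share" data-shareable-item="123">Share</button>
  </div>
</div>
"""

BASE = "https://www.thefarside.com"


def extract_items(markup):
    """Return one dict per slide, roughly matching the FreshRSS fields."""
    root = ET.fromstring(markup)
    items = []
    # XPath for finding news items: //div[@class="swiper-slide"]
    for slide in root.findall(".//div[@class='swiper-slide']"):
        share = slide.find(".//button[@aria-label='Share']")
        caption = slide.find(".//figcaption")
        image = slide.find(".//img[@data-src]")
        # Item date: first child of the "meta" wrapper
        meta = slide.find(".//div[@class='tfs-comic-new__meta']/*[1]")
        # Stand-in for normalize-space(): collapse whitespace runs, trim ends
        date_text = " ".join("".join(meta.itertext()).split())
        items.append({
            # Stand-in for concat("Far Side #", .../@data-shareable-item)
            "title": "Far Side #" + share.get("data-shareable-item"),
            "content": "".join(caption.itertext()).strip(),
            # Stand-in for concat("https://www.thefarside.com", ./@data-path)
            "link": BASE + slide.get("data-path"),
            "image": image.get("data-src"),
            # PHP's "l, F j, Y" corresponds roughly to "%A, %B %d, %Y"
            "date": datetime.strptime(date_text, "%A, %B %d, %Y"),
        })
    return items


if __name__ == "__main__":
    for item in extract_items(SAMPLE):
        print(item["title"], item["link"], item["date"].date())
```

The date handling is the bit most worth testing by hand: if the whitespace isn’t collapsed first, the two text nodes arrive with a line break between them and the format string won’t match.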
I promise I’ll stop writing about how awesome FreshRSS + XPath is someday. Today isn’t that day.
Meanwhile: if you used to use a feed reader but gave up when the Web started to become hostile to them and big social media systems started to wall you in, you should really consider picking one up again. The stuff I write about is complex edge-cases that most folks don’t need to think about in order to benefit from RSS… but it’s super convenient to have the things you care about online (news, blogs, social media, videos, newsletters, comics, search trends…) collated and sorted for you… without interference from algorithms that want to push “sticky” content, without invasive tracking or advertisements (or cookie banners or privacy popups), without something “disappearing” simply because you put off reading it for a few days.
Heck yeah! Just set it up and it’s working great! Thanks!
Thanks for posting this. I’m a “regular” person who doesn’t code, but works with people who do. I love how RSS lets me consume on my own terms. It’s starting to look like self-hosting is the only way anyone will survive the social media implosion.
Oh how I wish that I could self-host FreshRSS!!!!!! Many indieweb circles I’ve seen have an instance of it but I’m too nervous about it going down to trust it. This is awesome though – maybe I need to finally give in and learn how to self-host so I can take advantage of RSS Guard’s features…
@Tyoma:
Why not start by running it on your own computer? Any regular LAMP/WAMP/MAMP setup guide (for Linux/Windows/macOS, respectively) should give you everything you need. Once you’ve managed to run it on your own computer, you’ll be in a more-confident place to consider trying to set it up on a different one.
Then, you can try selfhosting on something in your own house (that you don’t mind leaving “on” most of the time!), like a Raspberry Pi. Same deal! Just take one step at a time. Don’t worry about each step until the last one is done and you’re happy.
Then, once you’re confident it’s behaving right, you can reconfigure your router so that traffic comes in to your Raspberry Pi. Use an unusual port if you like. If you don’t have a static IP, look at a DDNS service. Then you’ll have everything you need to access your selfhosted RSS reader from anywhere in the world! Magical!
Just remember: one step at a time. Start with the simplest thing, then build from there.
Also: OMG, I’ve just worked out who you are: you’re tyoma.cool! I came across your site at some point and love it! Beautiful neo-retro-indieweb-awesomeness!
HAHA yes it’s me!! Wow I feel like a celebrity… Your guestbook comment really warmed me heart when I read it. Definitely makes me want to keep updating my site (not that I plan on stopping!!!)
And thank you for the advice!! I have to look into that and give it a shot. Wouldn’t you know it, I already do have a home server set up, but I still barely even know how to use it outside of the pre-configured packages that I just click a button to install. I worry a lot about messing something up if I start tinkering too much on there. Luckily I do have laptops lying around, free for all my tinkering!
Hello
Thanks for sharing your knowledge about FreshRSS and XPath, which is a combo I love.
But I fail on a tricky site (short url is https://doc.cerema.fr/SearchMinify/c629b1a8db79a89341a7cf9641f74162), which used to provide RSS feeds, but no more today… Your helpful post made me compare the inspector and View Source, and I understood that the content I’m interested in is loaded through JavaScript “after” the page has loaded. Do you know any workaround in this situation?
Ugh. JavaScript content is the worst.
Maybe look in your browser’s Network tab, see if the content gets loaded via an “XHR” request (either as HTML, JSON, or whatever), get the URL of THAT, and use it as your starting point.
Or else you’ll need to proxy through something that can execute the JS. I’ve previously used my own tool RSSey, which you can find on GitHub and might still work with a few tweaks. It basically spins up a copy of Chromium, goes to an address, runs the JS, then runs code you write to exfiltrate the content. There are probably better-maintained tools out there now that do a better job than mine did, though!
Good luck!
Thanks for providing help and pointing out the network tab and the proxy idea!
Unfortunately I couldn’t identify the correct url to use (only the generic url for the “service” used as a back-end), but I managed to grab my content via a proxy service available for free… not exactly what I was aiming for, but it works!