The Underground Blog

This article is a repost promoting content originally published elsewhere.

theunderground.blog is an experimental blog that is only available to read through a feed reader.

If you would like to read the latest posts, you can subscribe to the feed at https://theunderground.blog/feed.xml, using the feed reader of your choice.
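For the curious, that's more-or-less all a feed reader is doing behind the scenes. Here's a minimal sketch in Node.js of fetching that feed and listing its posts (using the third-party rss-parser package — my choice for illustration, not anything the site requires):

// Sketch only: fetch theunderground.blog's feed and print each post's date and title.
// Uses the third-party rss-parser package (npm install rss-parser).
const Parser = require('rss-parser');

(async () => {
  const feed = await (new Parser()).parseURL('https://theunderground.blog/feed.xml');
  feed.items.forEach( item => console.log(`${item.pubDate}: ${item.title}`) );
})();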

Chris first suggested this idea in the footnote of a post that talks about something I’ve been witnessing recently: that blogging seems to be having a renaissance¹. For a few years now I’ve been telling people that this is the second-best time to start a blog. The best time was, of course, ~20 years ago, but if you missed out the first time around (or let your blog die as big social media silos took over): now’s the time to join the growing resurgence!

Anyway, he only went and actually did it! The newest member of RSS Club is likely to be… an entire blog that’s only accessible via a feed reader².

There are two posts published so far, and if you want to read them you’ll need to subscribe to theunderground.blog using your feed reader. There are tips on that page for getting an easy-to-use one if you haven’t already.

Footnotes

¹ He also had interesting things to say about OPML, which is a topic close to my heart. I wonder if I ought to start sharing a partial OPML file of my subscriptions?

² Or by reading the source code, I suppose: on the open Web, that’s always an option. The Web is, indeed, magical.

Getting Twitter Avatars (without the Twitter API)

Among Twitter’s growing list of faults over the years are various examples of its increasing divergence from open Web standards and developer-friendly endpoints. Do you remember when you used to be able to subscribe to somebody’s feed by RSS? When you could see who follows somebody without first logging in? When they were still committed to progressive enhancement and didn’t make your browser download ~5MB of JavaScript or else not show any content whatsoever? Feels like a long time ago, now.

Lighthouse Performance score for Twitter's Twitter account page on mobile, scoring 50%.
For one of the 50 most-popular websites in the world, this score is frankly shameful.

But those complaints aside, the thing that bugged me most this week was how much harder they’ve made it to programmatically get access to things that are publicly accessible via web pages. Like avatars, for example!

If you’re a human and you want to see the avatar image associated with a given username, you can go to twitter.com/that-username and – after you’ve waited a bit for all of the mandatory JavaScript to download and run (I hope you’re not on a metered connection!) – you’ll see a picture of the user, assuming they’ve uploaded one and not made their profile private. Easy.

If you’re a computer and you want to get the avatar image, it used to be just as easy; just go to twitter.com/api/users/profile_image/that-username and you’d get the image. This was great if you wanted to e.g. show a Facebook-style facepile of images of people who’d retweeted your content.
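For illustration (the endpoint is long gone, so treat the details as approximate and the names as hypothetical): because that URL redirected straight to the image, building a facepile was as simple as pointing image tags at it.

// Sketch only: the old (now-removed) endpoint redirected straight to the
// avatar image, so you could hotlink it without any API keys at all.
const retweeters = ['alice', 'bob', 'carol']; // hypothetical usernames
document.querySelector('.facepile').innerHTML = retweeters.map( username =>
  `<img src="https://twitter.com/api/users/profile_image/${username}" alt="@${username}">`
).join(''); // assumes a <div class="facepile"> somewhere on the page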

But then Twitter removed that endpoint and required that computers log in to Twitter, so a clever developer made a service that fetched avatars for you if you went to e.g. twivatar.glitch.me/that-username.

But then Twitter killed that, too. Because despite what they claimed 5½ years ago, Twitter still clearly hates developers.

Dan Q's Twitter profile header showing his avatar image.
You want that image? Well you’ll need a Twitter account, a developer account, an OAuth token set, a stack of code…

Recently, I needed a one-off program to get the avatars associated with a few dozen Twitter usernames.

First, I tried the easy way: find a service that does the work for me. I’d used avatars.io before, but it has since died, presumably because (as I soon discovered) Twitter had made things unnecessarily hard for them.

Second, I started looking at the Twitter API documentation but it took me in the region of 30-60 seconds before I said “fuck that noise” and decided that the set-up overhead in doing things the official way simply wasn’t justified for my simple use case.

So I decided to just screen-scrape around the problem. If a human can just go to the web page and see the image, a computer pretending to be a human can do exactly the same. Let’s do this:

/* Copyright (c) 2021 Dan Q; released under the MIT License. */

const Puppeteer = require('puppeteer');

const getAvatar = async (twitterUsername) => {
  // Launch a headless browser and load the user's public profile page:
  const browser = await Puppeteer.launch({args: ['--no-sandbox', '--disable-setuid-sandbox']});
  const page = await browser.newPage();
  await page.goto(`https://twitter.com/${twitterUsername}`);
  // Wait for Twitter's client-side JavaScript to render the profile photo,
  // then read the avatar URL out of the DOM:
  await page.waitForSelector('a[href$="/photo"] img[src]');
  const url = await page.evaluate(()=>document.querySelector('a[href$="/photo"] img').src);
  await browser.close();
  console.log(`${twitterUsername}: ${url}`);
};

process.argv.slice(2).forEach( twitterUsername => getAvatar( twitterUsername.toLowerCase() ) );
The code is ludicrously simple. It took less time, energy, and code to write this than to follow Twitter’s “approved” procedure. You can download the code via Gist.

Obviously, using this code would violate Twitter’s terms of use for automation, so… don’t, I guess?

Given that I only needed to run it once, on a finite list of accounts, I maintain that my approach was probably kinder on their servers than just manually going to every page and saving the avatar from it. But if you set up a service that uses this approach then you’ll certainly piss off somebody at Twitter and history shows that they’ll take their displeasure out on you without warning.
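If you did, despite all that, need to run something like this over a longer list, the least-objectionable approach would be to work through it sequentially with a polite pause between page loads. A sketch (my addition, not part of the original script) that would replace the final forEach line above:

// Sketch only: process usernames one at a time, pausing between page loads,
// rather than launching a headless browser per username all at once.
const sleep = (ms) => new Promise( resolve => setTimeout(resolve, ms) );

(async () => {
  for (const twitterUsername of process.argv.slice(2)) {
    await getAvatar( twitterUsername.toLowerCase() );
    await sleep(5000); // five seconds between requests; adjust to taste
  }
})();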

$ node get-twitter-avatar.js alexsdutton richove geohashing TailsteakAD LilFierce1 ninjanails
alexsdutton: https://pbs.twimg.com/profile_images/740505937039986688/F9gUV0eK_200x200.jpg
lilfierce1: https://pbs.twimg.com/profile_images/1189417235313561600/AZ2eLjAg_200x200.jpg
richove: https://pbs.twimg.com/profile_images/1576438972/2011_My_picture4_200x200.jpeg
geohashing: https://pbs.twimg.com/profile_images/877137707939581952/POzWWV2d_200x200.jpg
ninjanails: https://pbs.twimg.com/profile_images/1146364466801577985/TvCfb49a_200x200.jpg
tailsteakad: https://pbs.twimg.com/profile_images/1118738807019278337/y5WWkLbF_200x200.jpg
This output shows the avatar URLs of half a dozen Twitter accounts. It took minutes to write the code and seconds to run, but if I’d done it the “right” way I’d still be unnecessarily wading through Twitter’s sprawling documentation.

But it works. It was fast and easy and I got what I was looking for.

And the moral of the story is: if you make an API and it’s terrible, don’t be surprised if people screen-scrape your service instead. (You can’t spell “scraping” without “API”, amirite?)


Reply to “I was permanently banned by Facebook”

Donncha Ó Caoimh said:

My Facebook account was permanently banned on Wednesday along with all the people who take care of the Cork Skeptics page. We’re still not sure why but it might have something to do with the Facebook algorithm used to detect far-right conspiracy groups.

If you have a Facebook account you should download your information too because it could happen to you too, even though you did nothing wrong. Go here and click the “Create File” button now.

Yeah, I know you won’t do it but you really should.

Great advice.

After I got banned from Facebook in 2011 (for using a “fake name”, which is actually my real name) I took a similar line of thinking: I can’t trust Facebook (or Twitter, or Instagram, or whoever else) to be responsible custodians of my content, so I shan’t. Now, virtually all content I create is hosted on my WordPress-powered blog, at my own domain, first and foremost… and syndicated copies may appear on various social media.

In a very few instances I go the other way around, producing content in silos and then copying it back to my blog: e.g. my geocaching/geohashing expeditions are posted first to their respective sites (because it’s easiest and most-practical to do that using their apps, especially “in the field”), but then they get imported into my blog using a custom plugin. If any of these sites closes, deletes my data, adds paid tiers I’m not happy with, or just bans me from my own account… I’m still set.

Backing up all your social content is a good strategy. Owning it all to begin with is an even better one, IMHO. See also: Indieweb.

Killed by Google – The Google Graveyard & Cemetery

This article is a repost promoting content originally published elsewhere.

Fusion Tables… Fabric… Inbox… Google+… goo.gl… Goggles… Site Search… Glass… Now… Code… Bump!… Gears… Desktop Search…

Just some of the projects and services that Google has offered and then killed; this site aims to catalogue them all. Some, like Wave, were given to the community (Wave lived on for a while as an Apache project but is now basically dead), but most, like Reader, were assassinated in a misguided attempt to drive traffic to other services (ultimately, Reader was killed perhaps to try to get people onto Google+, which was then also killed).

Google can’t be trusted to maintain the services of theirs that you depend upon (relevant XKCD?). That’s not a phenomenon unique to Google, of course: it’s perhaps just that they launch so many new and often-experimental services that they inevitably end up killing more of them than the many other providers who’ve likewise shut down silos that people depended upon.

How could things be better? For a start, Google could make a better commitment to open-source and developing standards rather than platforms. But if you don’t think you can trust them to do that – and you can’t – then the only solution for individuals is to use fewer Google products to break the Google-monoculture. Encourage the competition to weaken their position, and break free from silos in general where it’s possible to do so.

148+ projects and services dead. But hey, we’re getting Stadia so everything’s okay, right? <sigh>