The Bullshit Web

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

The Bullshit Web (pxlnv.com)

My home computer in 1998 had a 56K modem connected to our telephone line; we were allowed a maximum of thirty minutes of computer usage a day, because my parents — quite reasonably — did not want to have their telephone shut off for an evening at a time. I remember webpages loading slowly: ten […]

My home computer in 1998 had a 56K modem connected to our telephone line; we were allowed a maximum of thirty minutes of computer usage a day, because my parents — quite reasonably — did not want to have their telephone shut off for an evening at a time. I remember webpages loading slowly: ten to twenty seconds for a basic news article.

At the time, a few of my friends were getting cable internet. It was remarkable seeing the same pages load in just a few seconds, and I remember thinking about the kinds of the possibilities that would open up as the web kept getting faster.

And faster it got, of course. When I moved into my own apartment several years ago, I got to pick my plan and chose a massive fifty megabit per second broadband connection, which I have since upgraded.

So, with an internet connection faster than I could have thought possible in the late 1990s, what’s the score now? A story at the Hill took over nine seconds to load; at Politico, seventeen seconds; at CNN, over thirty seconds. This is the bullshit web.

But first, a short parenthetical: I’ve been writing posts in both long- and short-form about this stuff for a while, but I wanted to bring many threads together into a single document that may pretentiously be described as a theory of or, more practically, a guide to the bullshit web.

A second parenthetical: when I use the word “bullshit” in this article, it isn’t in a profane sense. It is much closer to Harry Frankfurt’s definition in “On Bullshit”:

It is just this lack of connection to a concern with truth — this indifference to how things really are — that I regard as of the essence of bullshit.

I also intend it to be used in much the same sense as the way it is used in David Graeber’s “On the Phenomenon of Bullshit Jobs”:

In the year 1930, John Maynard Keynes predicted that, by century’s end, technology would have advanced sufficiently that countries like Great Britain or the United States would have achieved a 15-hour work week. There’s every reason to believe he was right. In technological terms, we are quite capable of this. And yet it didn’t happen. Instead, technology has been marshaled, if anything, to figure out ways to make us all work more. In order to achieve this, jobs have had to be created that are, effectively, pointless. Huge swathes of people, in Europe and North America in particular, spend their entire working lives performing tasks they secretly believe do not really need to be performed. The moral and spiritual damage that comes from this situation is profound. It is a scar across our collective soul. Yet virtually no one talks about it.

[…]

These are what I propose to call ‘bullshit jobs’.

What is the equivalent on the web, then?

This, this, a thousand times this. As somebody who’s watched the Web grow both in complexity and delivery speed over the last quarter century, it apalls me that somewhere along the way complexity has started to win. I don’t want to have to download two dozen stylesheets and scripts before your page begins to render – doubly-so if those additional files serve no purpose, or at least no purpose discernable to the reader. Personally, the combination of uMatrix and Ghostery is all the adblocker I need (and I’m more-than-willing to add a little userscript to “fix” your site if it tries to sabotage my use of these technologies), but when for whatever reason I turn these plugins off I feel like the Web has taken a step backwards while I wasn’t looking.

Leak in Comic Chameleon (app API hacking)

I recently discovered a minor security vulnerability in mobile webcomic reading app Comic Chameleon, and I thought that it was interesting (and tame) enough to share as a learning example of (a) how to find security vulnerabilities in an app like this, and (b) more importantly, how to write an app like this without this kind of security vulnerability.

The nature of the vulnerability is that, for webcomics pushed directly into the platform by their authors, it’s possible to read comics (long) before they’re published. By way of proof, here’s a copy of the top-right 200 × 120 pixels of episode 54 of the (excellent) Forward Comic, which Imgur will confirm was uploaded on 2 July 2018: over three months ahead of its planned publication date.

Forward Comic 0054, due for publication in October
I’m not going to spoil this comic for you, but if you follow it then when October comes I think you’ll be pleased.

How to hack a web-backed app

Just to be clear, I didn’t set out to hack this app, but once I stumbled upon the vulnerability I wanted to make sure that I was able to collect enough information that I’d be able to explain to its author what was wrong and how to fix it. You’d be amazed how many systems I find security holes in almost-completely by accident. In fact, I’d just noticed that the application supported some webcomics that I follow but for which I hadn’t been able to find RSS feeds (and so I was selfdogfooding my own tool, RSSey, to “produce” RSS feeds for my reader by screen-scraping: not the most-elegant solution). But if this app could produce a list of issues of the comic, it must have some way of doing what I was trying to do, and I wanted to know what it was.

Comic Chameleon running on Android
Comic Chameleon brings a lot of comics into a single slick Android/iOS app. Some of them you’ll even have heard of!

The app, I figured, must “phone home” to some website – probably the app’s official website itself – to get the list of comics that it supports and details of where to get their feeds from, so I grabbed a copy of the app and started investigating. Because I figured I was probably looking for a URL, the first thing I did was to download the raw APK file (your favourite search engine can tell you how to do this), decompressed it (APK files are just ZIP files, really) and ran strings on it to search for likely-looking URLs:

Running strings on the Comic Chameleon APK contents
As predicted, there are several hard-coded addresses. And all over unencrypted HTTP, eww!

I tried visiting a few of the addresses but many of them seemed to be API endpoints that were expecting additional parameters. Probably, I figured, the strings I’d extracted were prefixes to which those parameters were attached. Rather than fuzz for the right parameters, I decided to watch what the app did: I spun up a simulated Android device using the official emulator (I could have used my own on a wireless network that I control, of course, but this was lazier) and ran my favourite packet sniffer to see what the application was requesting.

Wireshark output showing Comic Chameleon traffic.
The web addresses are even clearer, here, and include all of the parameters I need.

Now I had full web addresses with parameters. Comparing the parameters that appeared when I clicked different comics revealed that each comic in the “full list” was assigned a numeric ID which was used when requesting issues of that comic (along with an intermediate stage where the year of publication is requested).

Comic Chameleon comic list XML
Each comic is assigned an ID number, probably sequentially.

Interestingly, a number of comics were listed with the attribute s="no-show" and did not appear in the app: it looked like comics that weren’t yet being made available via the app were already being indexed and collected by its web component, and for some reason were being exposed via the XML API: presumably the developer had never considered that anybody but their app would look at the XML itself, but the thing about the Web is that if you put it on the Web, anybody can see it.

Still: at this point I assumed that I was about to find what I was looking for – some kind of machine-readable source (an RSS feed or something like one) for a webcomic or two. But when I looked at the XML API for one of those webcomics I discovered quite a bit more than I’d bargained on finding:

no-shows in the episode list produced by the web component of Comic Chameleon
Hey, what’s this? This feed includes titles for webcomics that haven’t been published yet, marked as ‘no-show’…

The first webcomic I looked at included the “official” web addresses and titles of each published comic… but also several not yet published ones. The unpublished ones were marked with s="no-show" to indicate to the app that they weren’t to be shown, but I could now see them. The “official” web addresses didn’t work for me, as I’d expected, but when I tried Comic Chameleon’s versions of the addresses, I found that I could see entire episodes of comics, up to three and a half months ahead of their expected publication date.

Whoops.

Naturally, I compiled all of my findings into an email and contacted the app developer with all of the details they’d need to fix it – in hacker terms, I’m one of the “good guys”! – but I wanted to share this particular example with you because (a) it’s not a very dangerous leak of data (a few webcomics a few weeks early and/or a way to evade a few ads isn’t going to kill anybody) and (b) it’s very illustrative of the kinds of mistakes that app developers are making a lot, these days, and it’s important to understand why so that you’re not among them. On to that in a moment.

Responsible disclosure

Because (I’d like to think) I’m one of the “good guys” in the security world, the first thing I did after the research above was to contact the author of the software. They didn’t seem to have a security.txt file, a disclosure policy, nor a profile on any of the major disclosure management sites, so I sent an email. Were the security issue more-severe, I’d have sent a preliminary email suggesting (and agreeing on a mechanism for) encrypted email, but given the low impact of this particular issue, I just explained the entire issue in the initial email: basically what you’ve read above, plus some tips on fixing the issue and an offer to help out.

"Hacking", apparently
This is what stock photo sites think “hacking” is. Well… this, pages full of green code, or hoodies.

I subscribe to the doctrine of responsible disclosure, which – in the event of more-significant vulnerabilities – means that after first contacting the developer of an insecure system and giving them time to fix it, it’s acceptable (in fact: in significant cases, it’s socially-responsible) to publish the details of the vulnerability. In this case, though, I think the whole experience makes an interesting learning example about ways in which you might begin to “black box” test an app for data leaks like this and – below – how to think about software development in a way that limits the risk of such vulnerabilities appearing in the first place.

The author of this software hasn’t given any answer to any of the emails I’ve sent over the last couple of weeks, so I’m assuming that they just plan to leave this particular leak in place. I reached out and contacted the author of Forward Comic, though, which turns out (coincidentally) to be probably the most-severely affected publication on the platform, so that he had the option of taking action before I published this blog post.

Lessons to learn

When developing an “app” (whether for the web or a desktop or mobile platform) that connects to an Internet service to collect data, here are the important things you really, really ought to do:

  1. Don’t publish any data that you don’t want the user to see.
  2. If the data isn’t for everybody, remember to authenticate the user.
  3. And for heaven’s sake use SSL, it’s not the 1990s any more.
Website message asking visitor to confirm that they're old enough.
It’s a good job that nobody on the Web would ever try to view something easily-available but which they shouldn’t, right? That’s why screens like this have always worked so well.

That first lesson’s the big one of course: if you don’t want something to be on the public Internet, don’t put it on the public Internet! The feeds I found simply shouldn’t have contained the “secret” information that they did, and the unpublished comics shouldn’t have been online at real web addresses. But aside from (or in addition to) not including these unpublished items in the data feeds, what else might our app developer have considered?

  • Encryption. There’s no excuse for not using HTTPS these days. This alone wouldn’t have prevented a deliberate effort to read the secret data, but it would help prevent it from happening accidentally (which is a risk right now), e.g. on a proxy server or while debugging something else on the same network link. It also protects the user from exposing their browsing habits (do you want everybody at that coffee shop to know what weird comics you read?) and from having content ‘injected’ (do you want the person at the next table of the coffee shop to be able to choose what you see when you ask for a comic?
  • Authentication (app). The app could work harder to prove that it’s genuinely the app when it contacts the website. No mechanism for doing this can ever be perfect, because the user hasa access to the app and can theoretically reverse-engineer it to fish the entire authentication strategy out of it, but some approaches are better than others. Sending a password (e.g. over Basic Authentication) is barely better than just using a complex web address, but using a client-side certiciate or an OTP algorithm would (in conjunction with encryption) foil many attackers.
  • Authentication (user). It’s a very-different model to the one currently used by the app, but requiring users to “sign up” to the service would reduce the risks and provide better mechanisms for tracking/blocking misusers, though the relative anonymity of the Internet doesn’t give this much strength and introduces various additional burdens both technical and legal upon the developer.

Fundamentally, of course, there’s nothing that an app developer can do to perfectly protect the data that is published to that app, because the app runs on a device that the user controls! That’s why the first lesson is the most important: if it shouldn’t be on the public Internet (yet), don’t put it on the public Internet.

Hopefully there’s a lesson for you somewhere too: about how to think about app security so that you don’t make a similar mistake, or about some of the ways in which you might test the security of an application (for example, as part of an internal audit), or, if nothing else, that you should go and read Forward, because it’s pretty cool.

Further reading

7 August 2018: I’ve now written a quick explanation about how to intercept HTTPS traffic from Android apps, for those that asked.

×

Writing A Calendar App In Rails Vs. PHP

Some time ago, I wrote a web-based calendar application in PHP, one of my favourite programming languages. This tool would produce a HTML tabular calendar for a four week period, Monday to Sunday, in which the current date (or a user-specified date) fell in the second week (so you’re looking at this week, last week, and two weeks in the future). The user-specified date, for various reasons, would be provided as the number of seconds since the epoch (1970). In addition, the user must be able to flick forwards and backwards through the calendar, “shifting” by one or four weeks each time.

Part of this algorithm, of course, was responsible for finding the timestamp (seconds since the epoch) of the beginning of “a week last Monday”, GMT. It went something like this (pseudocode):

1. Get a handle on the beginning of "today" with [specified time] modulus [number of seconds in day]
2. Go back in time a week by deducting [number of seconds in day] multiplied by [number of days in week] (you can see I'm a real programmer, because I set "number of days in week" as a constant, in case it ever gets changed)
3. Find the previous Monday by determining what day of the week this date is on (clever functions in PHP do this for me), then take [number of seconds in day] multiplied by [number of days after Monday we are] from this to get "a week last Monday"
4. Jump forwards or backwards a number of weeks specified by the user, if necessary. Easy.
5. Of course, this isn't perfect, because this "shift backwards a week and a few days" might have put us in to "last month", in which case the calendar needs to know to deduct one month and add [number of days in last month]
6. And if we just went "back in time" beyond January, we also need to deduct a year and add 11 months. Joy.

So; not the nicest bit of code in the world.

I’ve recently been learning to program in Ruby On Rails. Ruby is a comparatively young language which has become quite popular in Japan but has only had reasonable amounts of Westernised documentation for the last four years or so. I started looking into it early this year after reading an article that compared it to Python. Rails is a web application development framework that sits on top of Ruby and promises to be “quick and structured”, becoming the “best of both worlds” between web engineering in PHP (quick and sloppy) and in Java (slow and structured). Ruby is a properly object-oriented language – even your literals are objects – and Rails takes full advantage of this.

For example, here’s my interpretation in Rails of the same bit of code as above:

@week_last_monday = 7.days.ago.gmtime.monday + params[:weeks].to_i.weeks

An explanation:

  • @week_last_monday is just a variable in which I’m keeping the result of my operation.
  • 7.days might fool you. Yes, what I’m doing there is instantiating an Integer (7, actually a Fixint, but who cares), then calling the “days” function on it, which returns me an instance of Time which represents 7 days of time.
  • Calling the ago method on my Time object, which returns me another Time object, this time one which is equal to Time.now (the time right now) minus the amount of Time I already had (7 days). Basically, I now have a handle on “7 days ago”.
  • The only thing PHP had up on me here is that it’s gmdate() function had ensured I already had my date/time in GMT; here, I have to explicitly call gmtime to do the same thing.
  • And then I simply call monday on my resulting Time object to get a handle on the beginning of the previous Monday. That simple. 24 characters of fun.
  • + params[:weeks].to_i.weeks simply increments (or decrements) the Time I have by a number of weeks specified by the user (params[:weeks] gets the number of weeks specified, to_i converts it to an integer, and weeks, like days, creates a Time object from this. In Ruby, object definitions can even override operators like +, -, <, >, etc., as if they were methods (because they are), and so the author of the Time class made it simple to perform arithmetic upon times and dates.

This was the very point at which I feel in love with Ruby on Rails.

Reply #13108

Sian wrote:

Going to be registering a website thingy tonight to mess around with. Any hints/tips/advice from all you people who know about this stuff would be gratefully received. I am, after all, officially computer illiterate.

Register your domain name with somebody respectable (won’t rip you off or otherwise fuck up) like Easily, who’ll give you a domain name (whateveryoulike.co.uk) for as little as £9.99/2 years.

As far as hosting is concerned, I can’t say a bad word about the fantastic DreamHost, who now provide hosting for me, Paul, Claire, Matt (from SmartData), JTA & Ruth, Statto… etc. etc.

I’m not sure if it still works, but if you sign up for their Crazy Domain Insane offer ($9.95/month), paying for the first year up-front, and use the promo code “777”, they’ll give you the first YEAR for the price of the first month. Which is nice. And as it includes a free .com domain name of your choice, that’s pretty fab, too (saves you heaps of cash, no commitment to stay with them more than a year anyway, etc.). They’re pretty damn good.

Drop me an e-mail if you want any specific help/advice on such geekibits. Will see what I can do.

An Idea – How To Get Treeware Junk Mail Banned

Here’s a thought: a way to try to get unsolicited (treeware) junk mail banned –

Every time you receive a bit of junk mail, just go and put it back in the post box: it’s almost all franked mail, and so the post office will re-sort it and deliver it back to you. Put a tally on the reverse side, and add one to it each time you forward it to yourself. If enough people did it, I wonder how many recursions you’d need to put through the post office before the postal workers union petitioned the government to disallow the sending of unsolicited treeware junk mail.

Not sure if it’d work, but I think I’ll do it anyway, just out of curiosity about how high a tally I can get before the post office start refusing to re-deliver them. Heh.

Got my Dad’s web site done. Just waiting for the domain name registration to go through so I can deploy it.