The Bullshit Web

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

The Bullshit Web (pxlnv.com)

My home computer in 1998 had a 56K modem connected to our telephone line; we were allowed a maximum of thirty minutes of computer usage a day, because my parents — quite reasonably — did not want to have their telephone shut off for an evening at a time. I remember webpages loading slowly: ten to twenty seconds for a basic news article.

At the time, a few of my friends were getting cable internet. It was remarkable seeing the same pages load in just a few seconds, and I remember thinking about the kinds of possibilities that would open up as the web kept getting faster.

And faster it got, of course. When I moved into my own apartment several years ago, I got to pick my plan and chose a massive fifty megabit per second broadband connection, which I have since upgraded.

So, with an internet connection faster than I could have thought possible in the late 1990s, what’s the score now? A story at the Hill took over nine seconds to load; at Politico, seventeen seconds; at CNN, over thirty seconds. This is the bullshit web.

But first, a short parenthetical: I’ve been writing posts in both long- and short-form about this stuff for a while, but I wanted to bring many threads together into a single document that may pretentiously be described as a theory of or, more practically, a guide to the bullshit web.

A second parenthetical: when I use the word “bullshit” in this article, it isn’t in a profane sense. It is much closer to Harry Frankfurt’s definition in “On Bullshit”:

It is just this lack of connection to a concern with truth — this indifference to how things really are — that I regard as of the essence of bullshit.

I also intend it to be used in much the same sense as the way it is used in David Graeber’s “On the Phenomenon of Bullshit Jobs”:

In the year 1930, John Maynard Keynes predicted that, by century’s end, technology would have advanced sufficiently that countries like Great Britain or the United States would have achieved a 15-hour work week. There’s every reason to believe he was right. In technological terms, we are quite capable of this. And yet it didn’t happen. Instead, technology has been marshaled, if anything, to figure out ways to make us all work more. In order to achieve this, jobs have had to be created that are, effectively, pointless. Huge swathes of people, in Europe and North America in particular, spend their entire working lives performing tasks they secretly believe do not really need to be performed. The moral and spiritual damage that comes from this situation is profound. It is a scar across our collective soul. Yet virtually no one talks about it.

[…]

These are what I propose to call ‘bullshit jobs’.

What is the equivalent on the web, then?

This, this, a thousand times this. As somebody who’s watched the Web grow both in complexity and delivery speed over the last quarter century, it appals me that somewhere along the way complexity has started to win. I don’t want to have to download two dozen stylesheets and scripts before your page begins to render – doubly-so if those additional files serve no purpose, or at least no purpose discernible to the reader. Personally, the combination of uMatrix and Ghostery is all the adblocker I need (and I’m more-than-willing to add a little userscript to “fix” your site if it tries to sabotage my use of these technologies), but when for whatever reason I turn these plugins off I feel like the Web has taken a step backwards while I wasn’t looking.

Leak in Comic Chameleon (app API hacking)

I recently discovered a minor security vulnerability in mobile webcomic reading app Comic Chameleon, and I thought that it was interesting (and tame) enough to share as a learning example of (a) how to find security vulnerabilities in an app like this, and (b) more importantly, how to write an app like this without this kind of security vulnerability.

The nature of the vulnerability is that, for webcomics pushed directly into the platform by their authors, it’s possible to read comics (long) before they’re published. By way of proof, here’s a copy of the top-right 200 × 120 pixels of episode 54 of the (excellent) Forward Comic, which Imgur will confirm was uploaded on 2 July 2018: over three months ahead of its planned publication date.

Forward Comic 0054, due for publication in October
I’m not going to spoil this comic for you, but if you follow it then when October comes I think you’ll be pleased.

How to hack a web-backed app

Just to be clear, I didn’t set out to hack this app, but once I stumbled upon the vulnerability I wanted to make sure that I was able to collect enough information that I’d be able to explain to its author what was wrong and how to fix it. You’d be amazed how many systems I find security holes in almost-completely by accident. In fact, I’d just noticed that the application supported some webcomics that I follow but for which I hadn’t been able to find RSS feeds (and so I was selfdogfooding my own tool, RSSey, to “produce” RSS feeds for my reader by screen-scraping: not the most-elegant solution). But if this app could produce a list of issues of the comic, it must have some way of doing what I was trying to do, and I wanted to know what it was.

Comic Chameleon running on Android
Comic Chameleon brings a lot of comics into a single slick Android/iOS app. Some of them you’ll even have heard of!

The app, I figured, must “phone home” to some website – probably the app’s official website itself – to get the list of comics that it supports and details of where to get their feeds from, so I grabbed a copy of the app and started investigating. Because I figured I was probably looking for a URL, the first thing I did was download the raw APK file (your favourite search engine can tell you how to do this), decompress it (APK files are just ZIP files, really) and run strings on it to search for likely-looking URLs:

Running strings on the Comic Chameleon APK contents
As predicted, there are several hard-coded addresses. And all over unencrypted HTTP, eww!
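
If you want to try something similar yourself, here’s a minimal sketch in Python that does roughly the same job as strings: treat the APK as the ZIP file it is and pull out anything that looks like a URL. The filename and the regular expression are illustrative, not taken from the real app:

  import re
  import zipfile

  # An APK is just a ZIP archive, so its contents can be read directly.
  URL_PATTERN = re.compile(rb'https?://[\x21-\x7e]+')  # printable, non-whitespace characters

  def find_urls(apk_path):
      """Yield (member filename, URL) for every URL-looking string in the APK."""
      with zipfile.ZipFile(apk_path) as apk:
          for name in apk.namelist():
              for match in URL_PATTERN.finditer(apk.read(name)):
                  yield name, match.group().decode('ascii', errors='replace')

  if __name__ == '__main__':
      # 'comic-app.apk' is a placeholder filename, not the app's real package.
      for member, url in find_urls('comic-app.apk'):
          print(member, url)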

I tried visiting a few of the addresses but many of them seemed to be API endpoints that were expecting additional parameters. Probably, I figured, the strings I’d extracted were prefixes to which those parameters were attached. Rather than fuzz for the right parameters, I decided to watch what the app did: I spun up a simulated Android device using the official emulator (I could have used my own on a wireless network that I control, of course, but this was lazier) and ran my favourite packet sniffer to see what the application was requesting.

Wireshark output showing Comic Chameleon traffic.
The web addresses are even clearer, here, and include all of the parameters I need.

Now I had full web addresses with parameters. Comparing the parameters that appeared when I clicked different comics revealed that each comic in the “full list” was assigned a numeric ID which was used when requesting issues of that comic (along with an intermediate stage where the year of publication is requested).

Comic Chameleon comic list XML
Each comic is assigned an ID number, probably sequentially.

Interestingly, a number of comics were listed with the attribute s="no-show" and did not appear in the app: it looked like comics that weren’t yet being made available via the app were already being indexed and collected by its web component, and for some reason were being exposed via the XML API. Presumably the developer had never considered that anybody but their app would look at the XML itself, but the thing about the Web is that if you put it on the Web, anybody can see it.
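
To illustrate just how exposed that is, here’s a rough sketch of the sort of script anybody with the endpoint address could use to list the “hidden” comics for themselves. The URL, element name and attribute names are guesses based on the screenshots above rather than the app’s real schema:

  import urllib.request
  import xml.etree.ElementTree as ET

  # Hypothetical endpoint; the real (unencrypted HTTP) address isn't reproduced here.
  COMIC_LIST_URL = 'http://example.com/api/comics.xml'

  def list_comics(url=COMIC_LIST_URL):
      """Yield (id, title, hidden?) for each comic in the XML list."""
      with urllib.request.urlopen(url) as response:
          root = ET.parse(response).getroot()
      for comic in root.iter('comic'):           # element name is assumed
          yield comic.get('id'), comic.get('title'), comic.get('s') == 'no-show'

  if __name__ == '__main__':
      for comic_id, title, hidden in list_comics():
          print(comic_id, 'HIDDEN' if hidden else 'visible', title)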

Still: at this point I assumed that I was about to find what I was looking for – some kind of machine-readable source (an RSS feed or something like one) for a webcomic or two. But when I looked at the XML API for one of those webcomics I discovered quite a bit more than I’d bargained on finding:

no-shows in the episode list produced by the web component of Comic Chameleon
Hey, what’s this? This feed includes titles for webcomics that haven’t been published yet, marked as ‘no-show’…

The first webcomic I looked at included the “official” web addresses and titles of each published comic… but also several not yet published ones. The unpublished ones were marked with s="no-show" to indicate to the app that they weren’t to be shown, but I could now see them. The “official” web addresses didn’t work for me, as I’d expected, but when I tried Comic Chameleon’s versions of the addresses, I found that I could see entire episodes of comics, up to three and a half months ahead of their expected publication date.

Whoops.

Naturally, I compiled all of my findings into an email and contacted the app developer with all of the details they’d need to fix it – in hacker terms, I’m one of the “good guys”! – but I wanted to share this particular example with you because (a) it’s not a very dangerous leak of data (a few webcomics a few weeks early and/or a way to evade a few ads isn’t going to kill anybody) and (b) it’s very illustrative of the kinds of mistakes that app developers make a lot these days, and it’s important to understand why so that you’re not among them. On to that in a moment.

Responsible disclosure

Because (I’d like to think) I’m one of the “good guys” in the security world, the first thing I did after the research above was to contact the author of the software. They didn’t seem to have a security.txt file, a disclosure policy, nor a profile on any of the major disclosure management sites, so I sent an email. Were the security issue more-severe, I’d have sent a preliminary email suggesting (and agreeing on a mechanism for) encrypted email, but given the low impact of this particular issue, I just explained the entire issue in the initial email: basically what you’ve read above, plus some tips on fixing the issue and an offer to help out.

"Hacking", apparently
This is what stock photo sites think “hacking” is. Well… this, pages full of green code, or hoodies.

I subscribe to the doctrine of responsible disclosure, which – in the event of more-significant vulnerabilities – means that after first contacting the developer of an insecure system and giving them time to fix it, it’s acceptable (in fact: in significant cases, it’s socially-responsible) to publish the details of the vulnerability. In this case, though, I think the whole experience makes an interesting learning example about ways in which you might begin to “black box” test an app for data leaks like this and – below – how to think about software development in a way that limits the risk of such vulnerabilities appearing in the first place.

The author of this software hasn’t responded to any of the emails I’ve sent over the last couple of weeks, so I’m assuming that they just plan to leave this particular leak in place. I did reach out to the author of Forward Comic, though, which turns out (coincidentally) to be probably the most-severely affected publication on the platform, so that he had the option of taking action before I published this blog post.

Lessons to learn

When developing an “app” (whether for the web or a desktop or mobile platform) that connects to an Internet service to collect data, here are the important things you really, really ought to do:

  1. Don’t publish any data that you don’t want the user to see.
  2. If the data isn’t for everybody, remember to authenticate the user.
  3. And for heaven’s sake use SSL, it’s not the 1990s any more.
Website message asking visitor to confirm that they're old enough.
It’s a good job that nobody on the Web would ever try to view something easily-available but which they shouldn’t, right? That’s why screens like this have always worked so well.

That first lesson’s the big one of course: if you don’t want something to be on the public Internet, don’t put it on the public Internet! The feeds I found simply shouldn’t have contained the “secret” information that they did, and the unpublished comics shouldn’t have been online at real web addresses. But aside from (or in addition to) not including these unpublished items in the data feeds, what else might our app developer have considered?

  • Encryption. There’s no excuse for not using HTTPS these days. This alone wouldn’t have prevented a deliberate effort to read the secret data, but it would help prevent it from happening accidentally (which is a risk right now), e.g. on a proxy server or while debugging something else on the same network link. It also protects the user from exposing their browsing habits (do you want everybody at that coffee shop to know what weird comics you read?) and from having content ‘injected’ (do you want the person at the next table of the coffee shop to be able to choose what you see when you ask for a comic?)
  • Authentication (app). The app could work harder to prove that it’s genuinely the app when it contacts the website. No mechanism for doing this can ever be perfect, because the user has access to the app and can theoretically reverse-engineer it to fish the entire authentication strategy out of it, but some approaches are better than others. Sending a password (e.g. over Basic Authentication) is barely better than just using a complex web address, but using a client-side certificate or an OTP algorithm would (in conjunction with encryption) foil many attackers; there’s a minimal sketch of the latter approach below this list.
  • Authentication (user). It’s a very-different model to the one currently used by the app, but requiring users to “sign up” to the service would reduce the risks and provide better mechanisms for tracking/blocking misusers, though the relative anonymity of the Internet doesn’t give this much strength and introduces various additional burdens both technical and legal upon the developer.
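
To make that middle point more concrete, here’s a minimal sketch of the sort of time-based token an app and its API could share. It isn’t anything Comic Chameleon actually does, and because the secret ships inside the app it only raises the bar rather than providing real security, but combined with HTTPS it defeats casual URL-guessing:

  import hashlib
  import hmac
  import struct
  import time

  # Shared secret baked into the app and known to the server. Anyone who
  # decompiles the app can recover it, which is why this only raises the bar
  # rather than providing real security.
  SHARED_SECRET = b'example-secret-not-from-the-real-app'

  def one_time_token(secret=SHARED_SECRET, interval=30):
      """Return an HMAC-based token that changes every `interval` seconds."""
      counter = int(time.time() // interval)
      digest = hmac.new(secret, struct.pack('>Q', counter), hashlib.sha256)
      return digest.hexdigest()[:16]  # truncated to keep the request parameter short

  def token_is_valid(token, secret=SHARED_SECRET, interval=30, drift=1):
      """Server-side check, allowing a little clock drift in either direction."""
      now = int(time.time() // interval)
      for counter in range(now - drift, now + drift + 1):
          expected = hmac.new(secret, struct.pack('>Q', counter), hashlib.sha256).hexdigest()[:16]
          if hmac.compare_digest(expected, token):
              return True
      return False

  if __name__ == '__main__':
      token = one_time_token()
      print('token:', token, 'valid?', token_is_valid(token))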

Fundamentally, of course, there’s nothing that an app developer can do to perfectly protect the data that is published to that app, because the app runs on a device that the user controls! That’s why the first lesson is the most important: if it shouldn’t be on the public Internet (yet), don’t put it on the public Internet.

Hopefully there’s a lesson for you somewhere too: about how to think about app security so that you don’t make a similar mistake, or about some of the ways in which you might test the security of an application (for example, as part of an internal audit), or, if nothing else, that you should go and read Forward, because it’s pretty cool.

Further reading

7 August 2018: I’ve now written a quick explanation about how to intercept HTTPS traffic from Android apps, for those that asked.


JS Oxford Indieweb presentation – Polytechnic

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

JS Oxford Indieweb presentation by Garrett Coakley (polytechnic.co.uk)

Last night I attended the always excellent JS Oxford, and as well as having my mind expanded by both Jo and Ruth’s talks (Lemmings make an excellent analogy for multi-threading, who knew!), I gave a brief talk on the Indieweb movement.

If you’ve not heard of the Indieweb movement before, it’s a push to encourage people to claim their own bit of the web, for their identity and content, free from corporate platforms. It’s not about abandoning those platforms, but ensuring that you have control of your content if something goes wrong.

From the Indieweb site:

Your content is yours

When you post something on the web, it should belong to you, not a corporation. Too many companies have gone out of business and lost all of their users’ data. By joining the IndieWeb, your content stays yours and in your control.

You are better connected

Your articles and status messages can go to all services, not just one, allowing you to engage with everyone. Even replies and likes on other services can come back to your site so they’re all in one place.

I’ve been interested in the Indieweb for a while, after attending IndieWebCamp Brighton in 2016, and I’ve been slowly implementing Indieweb features on here ever since.

So far I’ve added rel="me" attributes to allow distributed verification, and to enable Indieauth support, h-card to establish identity, and h-entry for information discovery. Behind the scenes I’m looking at webmentions (Thanks to Perch’s first class support), and there’s the ever-eternal photo management thing I keep picking up and then running away from.

The great thing about the Indieweb is that you can implement as much or as little as you want, and it always gives you something to work on. It doesn’t matter where you start. The act of getting your own domain is the first step on a longer journey.

To that end I’m interested in organising an IndieWebCamp Oxford this year. If this sounds like something that interests you, then come find me in the Digital Oxford Slack, or on Twitter.

I’m so excited to see that there are others in Oxford who care about IndieWeb things! I’ve honestly fantasised about running an IndieWebCamp or Homebrew Website Club here myself, but let’s face it: that fantasy is more one of a world in which I had the free time for such a venture. So imagine my delight when somebody else offers to do the hard work!

AMPstinction

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

AMPstinction (adactio.com)

I’ve come to believe that the goal of any good framework should be to make itself unnecessary.
Brian said it explicitly of his PhoneGap project:
The ultimate purpose of PhoneGap is to cease to exist.
That makes total sense, especially if your code is a polyfill—those solutions are temporary by d…

When Google first unveiled AMP, its intentions weren’t clear to me. I hoped that it existed purely to make itself redundant:

As well as publishers creating AMP versions of their pages in order to appease Google, perhaps they will start to ask “Why can’t our regular pages be this fast?” By showing that there is life beyond big bloated invasive web pages, perhaps the AMP project will work as a demo of what the whole web could be.

Alas, as time has passed, that hope shows no signs of being fulfilled. If anything, I’ve noticed publishers using the existence of their AMP pages as a justification for just letting their “regular” pages put on weight.

Worse yet, the messaging from Google around AMP has shifted. Instead of pitching it as a format for creating parallel versions of your web pages, they’re now also extolling the virtues of having your AMP pages be the only version you publish:

In fact, AMP’s evolution has made it a viable solution to build entire websites.

On an episode of the Dev Mode podcast a while back, AMP was a hotly-debated topic. But even those defending AMP were doing so on the understanding that it was more a proof-of-concept than a long-term solution (and also that AMP is just for news stories—something else that Google are keen to change).

But now it’s clear that the Google AMP Project is being marketed more like a framework for the future: a collection of web components that prioritise performance.

You all know my feelings on AMP already, I’m sure. As Jeremy points out, our optimistic ideas that these problems might go away as AMP “made itself redundant” are turning out not to be true, and Google continues to abuse its monopoly on search to push its walled-garden further into the mainstream. Read his full article…

On This Day In 2005

I normally reserve my “on this day” posts to look back at my own archived content, but once in a while I get a moment of nostalgia for something of somebody else’s that “fell off the web”. And so I bring you something you probably haven’t seen in over a decade: Paul and Jon‘s Birmingham Egg.

Paul's lunch on this day 13 years ago: Birmingham Egg with Naan Bread
Is this honestly so different from the kind of crap that most of our circle of friends ate in 2005?

It was a simpler time: a time when YouTube was a new “fringe” site (which is probably why I don’t have a surviving copy of the original video) and not yet owned by Google, before Facebook was universally-available, and when original Web content remained decentralised (maybe we’re moving back in that direction, but I wouldn’t count on it…). And only a few days after issue 175 of the b3ta newsletter wrote:

* BIRMINGHAM EGG - Take 5 scotch eggs, cut in
  half and cover in masala sauce. Place in
  Balti dish and serve with naan and/or chips. 
  We'll send a b3ta t-shirt to anyone who cooks
  this up, eats it and makes a lovely little
  photo log / write up of their adventure.
Birmingham Egg preparation
Sure, this looks like the kind of thing that seems like a good idea when you’re a student.

Clearly-inspired, Paul said “Guess what we’re doing on Sunday?” and sure enough, he delivered. On this day 13 years ago and with the help of Jon, Liz, Siân, and Andy R, Paul whipped up the dish and presented his findings to the Internet: the original page is long-gone, but I’ve resurrected it for posterity. I don’t know if he ever got his promised free t-shirt, but he earned it: the page went briefly viral and brought joy to the world before being forgotten the following week when we all started arguing about whether 9 Songs was a good film or not.

It was a simpler time, when, having fewer responsibilities, we were able to do things like this “for the lulz”. But more than that, it was still at the tail-end of the era in which individuals putting absurd shit online was still a legitimate art form on the Web. Somewhere along the way, the Web got serious and siloed. It’s not all a bad thing, but it does mean that we’re publishing less weirdness than we were back then.


Web! What is it good for?

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

https://adactio.com/journal/9016 (adactio.com)

You can listen to an audio version of Web! What is it good for?

I have a blind spot. It’s the web.

I just can’t get excited about the prospect of building something for any particular operating system, be it desktop or mobile. I think about the potential lifespan of what would be built and end up asking myself “why bother?” If something isn’t on the web—and of the web—I find it hard to get excited about it. I’m somewhat jealous of people who can get equally excited about the web, native, hardware, print …in my mind, if it hasn’t got a URL, it’s missing some vital spark.

I know that this is a problem, but I can’t help it. At the very least, I have enough presence of mind to recognise it as being my problem.

My problem, too. There are worse problems to have.

My Blog Now Has a Content Security Policy – Here’s How I’ve Done It

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

My Blog Now Has a Content Security Policy – Here's How I've Done It (Troy Hunt)

I’ve long been a proponent of Content Security Policies (CSPs). I’ve used them to fix mixed content warnings on this blog after Disqus made a little mistake, you’ll see one adorning Have I Been Pwned (HIBP) and I even wrote a dedicated Pluralsight course on browser security headers. I’m a fan (which is why I also recently joined Report URI), and if you’re running a website, you should be too.

But it’s not all roses with CSPs and that’s partly due to what browsers will and will not let you do and partly due to what the platforms running our websites will and will not let you do. For example, this blog runs on Ghost Pro which is a managed SaaS platform. I can upload whatever theme I like, but I can’t control many aspects of how the platform actually executes, including how it handles response headers which is how a CSP is normally served by a site. Now I’m enormously supportive of running on managed platforms, but this is one of the limitations of doing so. I also can’t add custom headers via Cloudflare at “the edge”; I’m serving the HSTS header from there because there’s first class support for that in the GUI, but not for CSP either specifically in the GUI or via custom response headers. This will be achievable in the future via Cloudflare workers but for now, they have to come from the origin site.

However, you can add a CSP via meta tag and indeed that’s what I originally did with the upgrade-insecure-requests implementation I mentioned earlier when I fixed the Disqus issue. However – and this is where we start getting into browser limitations – you can’t use the report-uri directive in a meta tag. Now that doesn’t matter if all the CSP is doing is upgrading requests, but it matters a lot if you’re actually blocking content. That’s where the real value proposition of a CSP lies too; in its ability to block things that may have been maliciously inserted into a site. I’ve had enough experience with breaking the CSP on HIBP to know that reporting is absolutely invaluable and indeed when I’ve not paid attention to reports in the past, it’s literally cost me money.
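
For anyone who does control their own server, this is roughly what the header-based approach looks like; the policy below is a simplified placeholder rather than Troy’s actual policy, but it shows the report-uri directive that a meta tag can’t carry:

  from http.server import HTTPServer, SimpleHTTPRequestHandler

  # Simplified placeholder policy; a real one would enumerate script, style and image sources.
  CSP = ("default-src 'self'; "
         "upgrade-insecure-requests; "
         "report-uri https://example.report-uri.com/r/d/csp/enforce")

  class CSPHandler(SimpleHTTPRequestHandler):
      def end_headers(self):
          # A response header can carry directives (like report-uri) that a
          # <meta http-equiv="Content-Security-Policy"> tag cannot.
          self.send_header('Content-Security-Policy', CSP)
          super().end_headers()

  if __name__ == '__main__':
      HTTPServer(('localhost', 8000), CSPHandler).serve_forever()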

Improving URLs for AMP pages

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

Improving URLs for AMP pages (Accelerated Mobile Pages Project)

TL;DR: We are making changes to how AMP works in platforms such as Google Search that will enable linked pages to appear under publishers’ URLs instead of the google.com/amp URL space while maintaining the performance and privacy benefits of AMP Cache serving.

When we first launched AMP in Google Search we made a big trade-off: to achieve the user experience that users were telling us that they wanted, instant loading, we needed to start loading the page before the user clicked. As we detailed in a deep-dive blog post last year,  privacy reasons make it basically impossible to load the page from the publisher’s server. Publishers shouldn’t know what people are interested in until they actively go to their pages. Instead, AMP pages are loaded from the Google AMP Cache but with that behavior the URLs changed to include the google.com/amp/ URL prefix.

We are huge fans of meaningful URLs ourselves and recognize that this isn’t ideal. Many of y’all agree. It is certainly the #1 piece of feedback we hear about AMP. We sought to ensure that these URLs show up in as few places as possible. Over time our Google Search native apps on Android and iOS started defaulting to showing the publishers URLs and we worked with browser vendors to share the publisher’s URL of an article where possible. We couldn’t, however, fix the state of URLs where it matters most: on the web and the browser URL bar.

Regular readers may recall that I’ve complained about AMP. This latest announcement by the project lead of the AMP team at Google goes some way to solving the worst of the problems with the AMP project, but it still leaves a lot to be desired: for example, while Google still favours AMP pages in search results they’re building a walled garden and penalising people who don’t choose to be inside it, and it’s a walled garden with fewer features than the rest of the web and a lock-in effect once you’re there. We’ve seen this before with “app culture” and with Facebook, but Google have the power to do a huge amount more damage.

Enabling IPv6 Support in nginx

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

This is going to be a really short post, but it could save someone an hour of their life.

So, you’ve nothing to do and you’ve decided to play around with IPv6, or maybe you happen to be an administrator of a web service that needs to support IPv6 connectivity and you need to make your nginx server work nicely with this protocol.

First thing you need to do is to enable IPv6 in nginx by recompiling it with --with-ipv6 configure option and reinstalling it. If you use some pre-built package, check if your nginx already has this key enabled by running nginx -V.

The Web began dying in 2014, here’s how

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

Before the year 2014, there were many people using Google, Facebook, and Amazon. Today, there are still many people using services from those three tech giants (respectively, GOOG, FB, AMZN). Not much has changed, and quite literally the user interface and features on those sites have remained mostly untouched. However, the underlying dynamics of power on the Web have drastically changed, and those three companies are at the center of a fundamental transformation of the Web.

It looks like nothing changed since 2014, but GOOG and FB now have direct influence over 70%+ of internet traffic.

Internet activity itself hasn’t slowed down. It maintains a steady growth, both in amount of users and amount of websites…

Fnorders

I’ve not posted much recently: I’ve had a lot of Complicated Life Stuff going on, sorry.

But I did make a thing: fnorders.com. You’re welcome.

Fnord.

Sub-Pixel Problems in CSS

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

Something that jumped out at me, recently, was a rendering dilemma that browsers have to encounter, and gracefully handle, on a day-by-day basis with little to no standardization.

Take the following page for example. You have 4 floated divs, each with a width of 25%, contained within a parent div of width 50px. Here’s the question: How wide are each of the divs?

The problem lies in the fact that each div should be, approximately, 12.5px wide and since technology isn’t at a level where we can start rendering at the sub-pixel level we tend to have to round off the number. The problem then becomes: Which way do you round the number? Up, down, or a mixture of the two? I think the results will surprise you, as they did me…
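
As a back-of-the-envelope illustration (this isn’t any particular browser engine’s algorithm): each of the four divs “wants” 12.5px, and the obvious rounding strategies give noticeably different results:

  import math

  PARENT_WIDTH = 50      # px
  CHILD_FRACTION = 0.25  # each of four floated divs is 25% wide
  ideal = PARENT_WIDTH * CHILD_FRACTION  # 12.5px per child

  def round_down():
      return [math.floor(ideal)] * 4   # 12 * 4 = 48px: a 2px gap at the end

  def round_up():
      return [math.ceil(ideal)] * 4    # 13 * 4 = 52px: overflow, so the last div wraps

  def distribute_remainder():
      """Round down, then hand out the leftover whole pixels one at a time."""
      widths = [math.floor(ideal)] * 4
      leftover = PARENT_WIDTH - sum(widths)
      for i in range(leftover):
          widths[i] += 1
      return widths                    # [13, 13, 12, 12] = exactly 50px

  if __name__ == '__main__':
      for strategy in (round_down, round_up, distribute_remainder):
          widths = strategy()
          print(strategy.__name__, widths, '-> total', sum(widths), 'px')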