Visitor Tracking Without Cookies (or How To Abuse HTTP 301s)

Last week I was talking to Alexander Dutton about an idea that we had to implement cookie-like behaviour using browser caching. As I first mentioned last year, new laws are coming into force across Europe that will require websites to ask for your consent before they store cookies on your computer. Regardless of their necessity, these laws are badly-defined and ill-thought-out, and there’s been a significant lack of information to support web managers in understanding and implementing the required changes.

British Telecom’s implementation of the new cookie laws. Curiously, if you visit their site using the Opera web browser, it assumes that you’ve given consent, even if you click the button to not do so.

To illustrate one of the ambiguities in the law, I’ve implemented a tool which tracks site visitors almost as effectively as cookies (or similar technologies such as Flash Objects or Local Storage), but which must necessarily fall into one of the larger grey areas. My tool abuses the way that “permanent” (301) HTTP redirects are cached by web browsers.

[callout][button link="http://c301.scatmania.org/" align="right" size="medium" color="green"]See Demo Site[/button]You can try out my implementation for yourself. Click on the button to see the sample site, then close down all of your browser windows (or even restart your computer) and come back and try again: the site will recognise you and show you the same random number as it did the first time around, as well as identifying when your first visit was.[/callout]

Here’s how it works, in brief:

  1. A user visits the website.
  2. The website contains a <script> tag, pointing at a URL where the user’s browser will find some Javascript.
  3. The user’s browser requests the Javascript file.
  4. The server generates a random unique identifier for this user.
  5. The server uses an HTTP 301 response to tell the browser “this Javascript can be found at a different web address,” and provides an address that contains the new unique identifier.
  6. The user’s browser requests the new document (e.g. /javascripts/tracking/123456789.js, if the user’s unique ID was 123456789).
  7. The resulting Javascript is generated dynamically to automatically contain the ID in a variable, which can then be used for tracking purposes.
  8. Subsequent requests to the server, even after closing the browser, skip steps 3 through 5, because the user’s browser will cache the 301 and re-use the unique web address associated with that individual user.
How my "301-powered 'cookies'" work.
How my “301-powered ‘cookies'” work.
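
If you’d like to see that flow in code, here’s a minimal sketch of the server side, in the same Ruby/Sinatra stack as the downloadable demo linked below. The route names, the one-year cache lifetime, and the variable name are illustrative assumptions of mine, not necessarily what the demo itself uses:

```ruby
# Minimal sketch of the 301-powered "cookie" flow described above, using
# Ruby/Sinatra (the same stack as the downloadable demo). Route names,
# the one-year cache lifetime and the variable name are illustrative
# assumptions, not necessarily the demo's own.
require 'sinatra'
require 'securerandom'

# Steps 1-2: the page embeds a <script> tag pointing at a generic URL.
get '/' do
  '<html><body><script src="/javascripts/tracking.js"></script></body></html>'
end

# Steps 3-5: the first request for the generic URL is answered with a
# cacheable 301 pointing at a URL that embeds a fresh unique identifier.
get '/javascripts/tracking.js' do
  id = SecureRandom.hex(8)
  cache_control :public, max_age: 31_536_000  # encourage long-term caching
  redirect "/javascripts/tracking/#{id}.js", 301
end

# Steps 6-7: the per-user URL serves Javascript that exposes the ID. On a
# repeat visit the browser replays the cached 301 without contacting the
# server (step 8), so the same ID comes back even after a restart.
get '/javascripts/tracking/:id.js' do
  content_type 'application/javascript'
  "var trackingId = #{params[:id].inspect};"
end
```

The crucial trick is in the final route: because the browser never re-requests the generic URL, the per-user address – and therefore the identifier – persists for as long as the cached redirect does.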

Compared to conventional cookie-based tracking (e.g. Google Analytics), this approach:

  • Is more-fragile (clearing the cache is a more-common user operation than clearing cookies, and a “force refresh” may, in some browsers, result in a new tracking ID being issued).
  • Is less-blockable by contemporary privacy tools, including the W3C’s proposed one: it won’t be spotted by any cookie-cleaners or privacy filters that I’m aware of (though it won’t penetrate “incognito mode” or other browser privacy modes).

Moreover, this technique falls into a slight legal grey area. It would certainly be against the spirit of the law to use this technique for tracking purposes (although it would be trivial to implement even an advanced solution which “proxied” requests, using a database to associate conventional cookies with unique IDs, through to Google Analytics or a similar solution). However, it’s hard to legislate against the use of HTTP 301s, which are an even more-fundamental and required part of the web than cookies are. Also, and for the same reasons, it’s significantly harder to detect and block this technique than it is conventional tracking cookies. However, the technique is somewhat brittle and it would be necessary to put up with a reduced “cookie lifespan” if you used it for real.
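
As a hedged illustration of that “proxying” idea (and of how a demo like mine can remember when your first visit was), the final route in the sketch above could be extended along these lines. The in-memory table, the cookie name, and the two-year lifetime are my assumptions; a real implementation would persist to a database and relay each hit on to its analytics platform:

```ruby
# Hedged sketch of the "proxying" idea: tie the 301-derived ID back to a
# conventional cookie and some server-side state. The in-memory hash, the
# cookie name and the two-year lifetime are illustrative assumptions; a
# real implementation would use a database and forward hits to an
# analytics backend such as Google Analytics.
require 'sinatra'
require 'securerandom'

VISITORS = {}  # 301-derived ID => { cookie:, first_seen: }

get '/javascripts/tracking/:id.js' do
  record = VISITORS[params[:id]] ||= {
    cookie: SecureRandom.uuid,
    first_seen: Time.now
  }
  # Issue a conventional cookie alongside the cache-based identifier, so
  # that an off-the-shelf analytics stack could consume it unmodified.
  response.set_cookie 'proxied_id',
                      value: record[:cookie],
                      expires: Time.now + (2 * 365 * 24 * 60 * 60)
  content_type 'application/javascript'
  "var firstSeen = #{record[:first_seen].to_s.inspect};"
end
```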

[callout][button link="http://c301.scatmania.org/" align="right" size="medium" color="green"]See Demo Site[/button] [button link="https://gist.github.com/avapoet/5318224" align="right" size="medium" color="orange"]Download Code[/button] Please try out the demo, or download the source code (Ruby/Sinatra) and see for yourself how this technique works.[/callout]

Note that I am not a lawyer, so I can’t make a statement about the legality (or not) of this approach to tracking. I would suspect that if you were somehow caught doing it without the consent of your users, you’d be just as guilty as if you used a conventional approach. However, it’s certainly a technically-interesting approach that might have applications in areas of legitimate tracking, too.

Update: The demo site is down, but I’ve updated the download code link so that it still works.


Leading By Example

This week, I was reading the new EU legislation [PDF] which relates to, among other things, the way that websites are allowed to use HTTP cookies (and similar technologies) to track their users. The Information Commissioner’s Office has released a statement to ask website owners to review their processes in advance of the legislation coming into effect later this month, but for those of you who like the big-print edition with pictures, here’s the short of it:

From 26th May, a website must not give you a cookie unless it’s either (a) an essential (and implied) part of the functionality of the site, or (b) you have opted-in to it. This is a stark change from the previous “so long as you allow opt-outs, it’s okay” thinking of earlier legislation, and large organisations (you know, like the one I now work for) in particular are having to sit up and pay attention: after all, they’re the ones that people are going to try to sue.

The legislation is surprisingly woolly on some quite important questions. Like… who has liability for ensuring that a user has opted-in to third-party cookies (e.g. Google Analytics)? Is this up to the web site owner or to the third party? What about when a site represents companies both in and outside the EU? And so on.

Seeking guidance, I decided to browse the website of the Information Commissioner’s Office. And guess what I found…

Hey! I didn't opt-in to any of these cookies, Mr. Information Commissioner!

…not what I was looking for: just more circular and woolly thinking. But I did find that the ICO themselves do not comply with the guidance that they themselves give. Upon arriving at their site – and having never been asked for my consent – I quickly found myself issued with five different cookies (with lifespans of up to two years!). I checked their privacy policy, and found a mention of the Google Analytics cookie they use, but no indication about the others (presumably they’re not only “opt-out”, but also “secret”). What gives, guys?

Honestly: I’m tempted to assume that only this guy has the right approach. I’m all in favour of better cookie law, but can’t we wait until after the technological side (in web browsers) is implemented before we have to fix all of our websites? Personally, I thought that P3P policies (remember when those were all the rage?) had a lot of potential, properly-implemented, because they genuinely put the power into the hands of the users. The specification wasn’t perfect, but if it had been, we wouldn’t be in the mess we are now. Perhaps it’s time to dig it up, fix it, and then somehow explain it to the politicians.


The Right To Read

[this post was lost during a server failure on Sunday 11th July 2004; it was partially recovered on 21st March 2012]

If you haven’t already read it, take a look at The Right To Read, a very short story written in 1997 and updated in 2002 – it’ll only take you a few minutes to read; it’s not ‘techie’ (anybody would understand it!), and it is relevant. The kind of things expressed in the story – while sounding futuristic (and fascist) now – are being put into effect… slowly, quietly… by companies such as Sony, Philips, Apple, and Microsoft: not to mention the manufacturers of CDs and DVDs.

It’s been circulating the ’net for years, but recent events – such as InterTrust’s Universal Digital Rights Management System (report: The Register), which they claim will be ready within six months, and Microsoft’s ongoing work on the ‘Palladium’ project (report: BBC News) – mark the beginning of what could be the most important thing ever to happen in the history of copyright law, computing, and freedom of information.

So, go on – go read… [the remainder of this post, and three comments, have been lost]

Smart Alex

Alex, my incompetent co-worker, came up with the following gem in today’s meeting when talking about a product that would aid employers in securely tracking how long their employees actually spend working:

“It’s not going to have any of that… security… nonsense.”

I shall have to beat him to death later.

P.S. Told you that this thing was going to get big, quick. The Register reports “All your Web typos are belong to us”, and I quote: “Already a backlash is building, with Net admins being urged to block Verisign’s catch-all domain. This could get very messy.”