Today we reinstated youtube-dl, a popular project on GitHub, after we received additional information about the project that enabled us to reverse a Digital Millennium Copyright Act (DMCA) takedown.
This is a Big Deal. For two reasons:
Firstly, youtube-dl is a spectacularly useful project. I’ve used it for many years to help me archive my own content, to improve my access to content that’s freely available on the platform, and to help centralise (freely available) metadata to keep my subscriptions on video-sharing sites. Others have even more-important uses for the tool. I love youtube-dl, and I’d never considered the possibility that it could be used to circumvent digital restrictions (apparently it’s got some kind of geofence-evading features you can optionally enable, for people who don’t have a multi-endpoint VPN I guess?… I note that it definitely doesn’t break DRM…) until its GitHub repo got taken down the other week.
Which was a bleeding stupid thing to use a DMCA request on, because, y’know: Barbra Streisand Effect. Lampshading that a free, open-source tool could be used for people’s convenience is likely to increase awareness and adoption, not decrease it! Huge thanks to the EFF for stepping up and telling GitHub that they’d got it wrong (this letter is great reading, by the way).
But secondly, GitHub’s response is admirable and – assuming they honour their new stance – effective. They acknowledge their mistake, then go on to set out a new process by which they’ll review takedown requests. That new process includes technical and legal review, erring on the side of the developer rather than the claimant (i.e. “innocent until proven guilty”), multiparty negotiation, and limiting the scope of takedowns by allowing violators to export their non-infringing content after the fact.
I was concerned that the youtube-dl takedown might create a FOSS “chilling effect” on GitHub. It still might: in the light of it, I for one have started backing up my repositories and those of projects I care about to a different Git server! But with this response, I’d still be confident hosting the main copy of an open-source project on GitHub, even if that project was one which was at risk of being mistaken for copyright violation.
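If you fancy doing the same, here's a minimal sketch of one way such a backup could work. It isn't necessarily how anybody's real setup looks: the function names and URLs are hypothetical, and it assumes `git` is installed and that the backup server already has an empty repository waiting to be pushed to.

```python
import subprocess
from pathlib import Path


def mirror_commands(source_url: str, backup_url: str, workdir: Path) -> list[list[str]]:
    """Build the git commands that copy every ref from source_url to backup_url.

    Both URLs and the working directory are supplied by the caller; nothing
    here is specific to GitHub or to any particular backup server.
    """
    local = str(workdir / "mirror.git")
    return [
        # --mirror clones all branches, tags, and other refs into a bare repo.
        ["git", "clone", "--mirror", source_url, local],
        # Push everything in that bare repo to the backup remote.
        ["git", "-C", local, "push", "--mirror", backup_url],
    ]


def run_backup(source_url: str, backup_url: str, workdir: Path) -> None:
    """Run the commands in order, stopping at the first one that fails."""
    for command in mirror_commands(source_url, backup_url, workdir):
        subprocess.run(command, check=True)
```

Calling `run_backup()` with the original repository's URL, the backup's URL, and an empty scratch directory would do one full copy; re-running it on a schedule would need a fresh directory each time, or a `git remote update` step in place of the clone.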
Note that the original claim came not from Google/YouTube as you might have expected (if you’ve just tuned in) but from the RIAA, based on the fact that youtube-dl could be used to download copyrighted music videos for enjoyment offline. If you’re reminded of Sony v. Universal City Studios (1984) – the case behind the “Betamax standard” – you’re not alone.
When I first started working at the Bodleian Libraries in 2011, their websites were looking… a little dated. I’d soon spend some time working with a vendor (whose premises mysteriously caught fire while I was there, freeing me up to spend my birthday in a bar) to develop a fresh, modern interface for our websites that, while not the be-all and end-all, was a huge leap forwards and has served us well for the last five years or so.
Fast-forward a little: in about 2015 we noticed a few strange anomalies in our Google Analytics data. For some reason, web addresses were appearing that didn’t exist anywhere on our site! Most of these resulted from web visitors in Turkey, so we figured that some Turkish website had probably accidentally put our Google Analytics user ID number into their code rather than their own. We filtered out the erroneous data – there wasn’t much of it; the other website was clearly significantly less-popular than ours – and carried on. Sometimes we’d speculate about the identity of the other site, but mostly we didn’t even think about it.
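To illustrate the idea (and only the idea: this isn't the filter we actually configured), here's a rough sketch of hostname-based filtering. It assumes each hit has been exported as a dictionary with a `hostname` key, and the hostnames shown are placeholders.

```python
from typing import Iterable

# Hypothetical hostnames; substitute the ones your own site is served from.
OUR_HOSTNAMES = {"www.library.example", "library.example"}


def keep_own_hits(hits: Iterable[dict], allowed=OUR_HOSTNAMES) -> list[dict]:
    """Return only the hits recorded against one of our own hostnames.

    Hits with no hostname at all are treated as foreign and dropped.
    """
    return [hit for hit in hits if hit.get("hostname", "").lower() in allowed]
```

Anything recorded against somebody else's hostname falls straight out, which is handy when another site is reporting into your analytics property.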
Earlier this year, there was a spike in the volume of the traffic we were having to filter out, so I took the time to investigate more-thoroughly. I determined that the offending website belonged to the Library of Bilkent University, Turkey. I figured that some junior web developer there must have copy-pasted the Bodleian’s Google Analytics code and forgotten to change the user ID, so I went to the website to take a look… but I was in for an even bigger surprise.
Whoah! The web design of a British university was completely ripped-off by a Turkish university! Mouth agape at the audacity, I clicked my way through several of their pages to try to understand what had happened. It seemed inconceivable that it could be a coincidence, but perhaps it was supposed to be more of an homage than a copy-paste job? Or perhaps they were ripped-off by an unscrupulous web designer? Or maybe it was somebody on the “inside”, like our vendor, acting unethically by re-selling the same custom design? I didn’t believe it could be any of those things, but I had to be sure. So I started digging…
I was almost flattered as I played this spot-the-difference competition, until I saw the copyright notice: stealing our design was galling enough, but then relicensing it in such a way that they specifically encourage others to steal it too was another step entirely. Remember that we’re talking about an academic library, here: if anybody ought to have a handle on copyright law then it’s a library!
I took a dive into the source code to see if this really was, as it appeared to be, a copy-paste-and-change-the-name job (rather than “merely” a rip-off of the entire graphic design), and, sure enough…
It looks like they’d just mirrored the site and done a search-and-replace for “Bodleian”, replacing it with “Bilkent”. Even the code’s spelling errors, comments, and indentation were intact. The CSS was especially telling (as well as being chock-full of redundant code relating to things that appear on our website but not on theirs)…
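I did my comparison by eye, but if you wanted to put a number on this sort of thing, a sketch like the following might do it. It uses Python's standard `difflib`; the function name is made up, and it assumes you've already saved both stylesheets as strings.

```python
from difflib import SequenceMatcher


def similarity_after_rename(original: str, suspect: str, old: str, new: str) -> float:
    """Similarity (0.0 to 1.0) once the suspect's renamed term is mapped back.

    `old` is the term in the original (e.g. "Bodleian") and `new` is what it
    was allegedly replaced with (e.g. "Bilkent").
    """
    # Undo the suspected search-and-replace before comparing.
    restored = suspect.replace(new, old)
    # autojunk=False stops difflib from ignoring very common characters
    # in long inputs, which would otherwise skew the ratio.
    return SequenceMatcher(None, original, restored, autojunk=False).ratio()
```

A score very close to 1.0, spelling errors and comments included, is a lot harder to explain as a coincidence than a similar-looking layout.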
So I reached out to them with a tweet:
I didn’t get any response, although I did attract a handful of Turkish followers on Twitter. Later, they changed their Twitter handle and I thought I’d take advantage of the then-new capability for longer tweets to have another go at getting their attention:
Clearly this was what it took to make the difference. I received an email from the personal email account of somebody claiming to be Taner Korkmaz, Systems Librarian with Bilkent’s Technical Services team. He wrote (emphasis mine):
Dear Mr. Dan Q,
My name is Taner Korkmaz and I am the systems librarian at Bilkent. I am writing on behalf of Bilkent University Library, regarding your share about Bilkent on your Twitter account.
Firstly, I would like to explain that there is no any relation between your tweet and our library Twitter handle change. The librarian who is Twitter admin at Bilkent did not notice your first tweet. Another librarian took this job and decided to change the twitter handle because of the Turkish letters, abbreviations, English name requirement etc. The first name was @KutphaneBilkent (kutuphane means library in Turkish) which is not clear and not easy to understand. Now, it is @LibraryBilkent.
About 4 years ago, we decided to change our library website, (and therefore) we reviewed the appearance and utility of the web pages.
We appreciated the simplicity and clarity of the user interface of University of Oxford Bodlien Library & Radcliffe Camera, as an academic pioneer in many fields. As a not profit institution, we took advantage of your template by using CSS and HTML, and added our own original content.
We thought it would not create a problem the idea of using CSS codes since on the web page there isn’t any license notice or any restriction related to the content of the template, and since the licenses on the web pages are mainly more about content rather than templates.
The Library has its own Google Analytics and Search Console accounts and the related integrations for the web site statistical data tracking. We would like to point out that there is a misunderstanding regarding this issue.
In 2017, we started to work on creating a new web page and we will renew our current web page very soon.
Thank you in advance for your attention to this matter and apologies for possible inconveniences.
Or to put it another way: they decided that our copyright notice only applied to our content and not our design and took a copy of the latter.
Do you remember when I pointed out earlier that librarians should be expected to know their way around copyright law? Sigh.
They’ve now started removing evidence of their copy-pasting such as the duplicate Google Analytics code fragment and the references to LibraryData, but you can still find the unmodified code via archive.org, if you like.
That probably ends my part in this little adventure, but I’ve passed everything on to the University of Oxford’s legal team in case any of them have anything to say about it. And now I’ve got a new story to tell where web developers get together over a pint: the story of the time that I made a website for a university… and a different university stole it!