When One Library Steals From Another

When I first started working at the Bodleian Libraries in 2011, their websites were looking… a little dated. I’d soon spend some time working with a vendor (whose premises mysteriously caught fire while I was there, freeing me up to spend my birthday in a bar) to develop a fresh, modern interface for our websites that, while not the be-all and end-all, was a huge leap forwards and has served us well for the last five years or so.

The Bodleian Libraries website as it appeared in 2011.
The colour scheme, the layout, the fact that it didn’t remotely work on mobiles… there was a lot wrong with the old design of the Bodleian Libraries’ websites.

Fast-forward a little: in about 2015 we noticed a few strange anomalies in our Google Analytics data. For some reason, web addresses were appearing that didn’t exist anywhere on our site! Most of these resulted from web visitors in Turkey, so we figured that some Turkish website had probably accidentally put our Google Analytics user ID number into their code rather than their own. We filtered out the erroneous data – there wasn’t much of it; the other website was clearly significantly less-popular than ours – and carried on. Sometimes we’d speculate about the identity of the other site, but mostly we didn’t even think about it.

Bodleian Library & Radcliffe Camera website
How a Bodleian Libraries’ website might appear today. Pay attention, now: there’ll be a spot-the-difference competition in a moment.

Earlier this year, there was a spike in the volume of the traffic we were having to filter-out, so I took the time to investigate more-thoroughly. I determined that the offending website belonged to the Library of Bilkent University, Turkey. I figured that some junior web developer there must have copy-pasted the Bodleian’s Google Analytics code and forgotten to change the user ID, so I went to the website to take a look… but I was in for an even bigger surprise.

Bilkent University Library website, as it appears today.
Hey, that looks… basically identical!

Whoah! The web design of a British university was completely ripped-off by a Turkish university! Mouth agape at the audacity, I clicked my way through several of their pages to try to understand what had happened. It seemed inconceivable that it could be a coincidence, but perhaps it was supposed to be more of an homage than a copy-paste job? Or perhaps they were ripped-off by an unscrupulous web designer? Or maybe it was somebody on the “inside”, like our vendor, acting unethically by re-selling the same custom design? I didn’t believe it could be any of those things, but I had to be sure. So I started digging…

Bodleian and Bilkent search boxes, side-by-side.
Our user research did indicate that putting the site and catalogue search tools like this was smart. Maybe they did the same research?

Bodleian and Bilkent menus side-by-side.
Menus are pretty common on many websites. They probably just had a similar idea.

Bodleian and Bilkent opening hours, side-by-side.
Tabs are a great way to show opening hours. Everybody knows that. And this is obviously just a popular font.

Bodleian and Bilkent sliders, side-by-side.
Oh, you’ve got a slider too. With circles? And you’ve got an identical JavaScript bug? Okay… now that’s a bit of a coincidence…

Bodleian and Bilkent content boxes, side-by-side.
Okay, I’m getting a mite suspicious now. Surely we didn’t independently come up with this particular bit of design?

Bodleian and Bilkent footers, side-by-side.
Well, these are clearly different. Ours has a copyright notice, for example…

Copyright notice on Bilkent University Library's website.
Oh, you DO have a copyright notice. Hang on, wait: you’ve not only stolen our design but you’ve declared it to be open-source???

I was almost flattered as I played this spot-the-difference competition, until I saw the copyright notice: stealing our design was galling enough, but then relicensing it in such a way that they specifically encourage others to steal it too was another step entirely. Remember that we’re talking about an academic library, here: if anybody ought to have a handle on copyright law then it’s a library!

I took a dive into the source code to see if this really was, as it appeared to be, a copy-paste-and-change-the-name job (rather than “merely” a rip-off of the entire graphic design), and, sure enough…

HTML source code from Bilkent University Library.
In their HTML source code, you can see both the Bodleian’s Google Analytics code (which they failed to remove) and their own, as well as a data- attribute relating to a project I wrote, which means nothing to their site.
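Spotting a duplicated tracking snippet like this doesn’t take much. As a minimal sketch, assuming classic “UA-” style Universal Analytics property IDs (the function name find_ga_ids() and the sample IDs are hypothetical placeholders, not either library’s real IDs), a short PHP script could list every distinct tracking ID in a page’s HTML:

```php
<?php
/**
 * Return the distinct Google Analytics property IDs ("UA-" style)
 * found anywhere in a page's HTML source, in order of first appearance.
 */
function find_ga_ids(string $html): array {
    // Classic Universal Analytics IDs look like UA-<account>-<property>.
    preg_match_all('/\bUA-\d+-\d+\b/', $html, $matches);
    // Keep each ID once and re-index the resulting array.
    return array_values(array_unique($matches[0]));
}

// Hypothetical page carrying two different tracking snippets.
$html = <<<HTML
<script>ga('create', 'UA-1111111-1', 'auto');</script>
<script>ga('create', 'UA-2222222-3', 'auto');</script>
<script>ga('create', 'UA-1111111-1', 'auto');</script>
HTML;

$ids = find_ga_ids($html);
if (count($ids) > 1) {
    echo "Multiple Analytics IDs found: " . implode(', ', $ids) . "\n";
}
```

More than one ID on a single page is not proof of copying by itself, but an ID belonging to somebody else’s property is a strong hint.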

It looks like they’d just mirrored the site and done a search-and-replace for “Bodleian”, replacing it with “Bilkent”. Even the code’s spelling errors, comments, and indentation were intact. The CSS was especially telling (as well as being chock-full of redundant code relating to things that appear on our website but not on theirs)…

CSS code from Bilkent University.
The search-replace resulted in some icky grammar, like “the Bilkent” appearing in their code. And what’s this? That’s MY NAME in the middle of their source code!

So I reached out to them with a tweet:

Tweet: Hey @KutphaneBilkent (Bilkent University Library): couldn't help but notice your website looks suspiciously like those of @bodleianlibs...?
My first tweet to Bilkent University Library contained a “spot the difference” competition.

I didn’t get any response, although I did attract a handful of Turkish followers on Twitter. Later, they changed their Twitter handle and I thought I’d take advantage of the then-new capability for longer tweets to have another go at getting their attention:

Tweet: I see you've changed your Twitter handle, @librarybilkent! Your site still looks like you've #stolen the #webdesign from @bodleianlibs, though (and changed the license to a #CreativeCommons one, although the fact you forgot to change the #GoogleAnalytics ID is a giveaway...).
This time, I was a little less-sarcastic and a little more-aggressive. Turns out that’s all that was needed.

Clearly this was what it took to make the difference. I received an email from the personal email account of somebody claiming to be Taner Korkmaz, Systems Librarian with Bilkent’s Technical Services team. He wrote (emphasis mine):

Dear Mr. Dan Q,

My name is Taner Korkmaz and I am the systems librarian at Bilkent. I am writing on behalf of Bilkent University Library, regarding your share about Bilkent on your Twitter account.

Firstly, I would like to explain that there is no any relation between your tweet and our library Twitter handle change. The librarian who is Twitter admin at Bilkent did not notice your first tweet. Another librarian took this job and decided to change the twitter handle because of the Turkish letters, abbreviations, English name requirement etc. The first name was @KutphaneBilkent (kutuphane means library in Turkish) which is not clear and not easy to understand. Now, it is @LibraryBilkent.

About 4 years ago, we decided to change our library website, (and therefore) we reviewed the appearance and utility of the web pages.

We appreciated the simplicity and clarity of the user interface of University of Oxford Bodlien Library & Radcliffe Camera, as an academic pioneer in many fields. As a not profit institution, we took advantage of your template by using CSS and HTML, and added our own original content.

We thought it would not create a problem the idea of using CSS codes since on the web page there isn’t any license notice or any restriction related to the content of the template, and since the licenses on the web pages are mainly more about content rather than templates.

The Library has its own Google Analytics and Search Console accounts and the related integrations for the web site statistical data tracking. We would like to point out that there is a misunderstanding regarding this issue.

In 2017, we started to work on creating a new web page and we will renew our current web page very soon.

Thank you in advance for your attention to this matter and apologies for possible inconveniences.

Yours sincerely,

Or to put it another way: they decided that our copyright notice only applied to our content and not our design and took a copy of the latter.

Do you remember when I pointed out earlier that librarians should be expected to know their way around copyright law? Sigh.

They’ve now started removing evidence of their copy-pasting such as the duplicate Google Analytics code fragment and the references to LibraryData, but you can still find the unmodified code via archive.org, if you like.

That probably ends my part in this little adventure, but I’ve passed everything on to the University of Oxford’s legal team in case any of them have anything to say about it. And now I’ve got a new story to tell when web developers get together over a pint: the story of the time that I made a website for a university… and a different university stole it!

EEBO-TCP Hackathon

Last month I got the opportunity to attend the EEBO-TCP Hackfest, hosted in the (then still very-much under construction) Weston Library at my workplace. I’ve done a couple of hackathons and similar get-togethers before, but this one was somewhat different in that it was unmistakably geared towards a different kind of geek than the technology-minded folks that I usually see at these things. People like me, with a computer science background, were remarkably in the minority.

Dan Q in Blackwell Hall, at the Weston Library.
Me in the Weston Library (still under construction, as evidenced by the scaffolding in the background).

Instead, this particular hack event attracted a great number of folks from the humanities end of the spectrum. Which is understandable, given its theme: the Early English Books Online Text Creation Partnership (EEBO-TCP) is an effort to digitise and make available in marked-up, machine-readable text formats a huge corpus of English-language books printed between 1475 and 1700. So: a little over two centuries of work (spanning parts of three), including both household names (like Shakespeare, Galileo, Chaucer, Newton, Locke, and Hobbes) and an enormous number of others that you’ll never have heard of.

Dan Q talks to academic Stephen Gregg
After an introduction to the concept and the material, attendees engaged in a speed-networking event to share their thoughts prior to pitching their ideas.

The hackday event was scheduled to coincide with and celebrate the release of the first 25,000 texts into the public domain, and attendees were challenged to come up with ways to use the newly-available data in any way they liked. As is common with any kind of hackathon, many of the attendees had come with their own ideas half-baked already, but as for me: I had no idea what I’d end up doing! I’m not particularly familiar with the books of the 15th through 17th centuries and I’d never looked at the way in which the digitised texts had been encoded. In short: I knew nothing.

Dan Q and Liz McCarthy listen as other attendees pitch their ideas.
The ideas pitch session quickly showed some overlap between different project ideas, and teams were split and reformed a few times as people found the best places for themselves.

Instead, I’d thought: there’ll be people here who need a geek. A major part of a lot of the freelance work I end up doing (and a lesser part of my work at the Bodleian, from time to time) involves manipulating and mining data from disparate sources, and it seemed to me that these kinds of skills would be useful for a variety of different conceivable projects.

Dan Q explains what the spreadsheet he's produced 'means'.
XML may have been our interchange format, but everything fell into Excel in the end for speedy management even by less-technical team members.

I paired up with a chap called Stephen Gregg, a lecturer in 18th century literature from Bath Spa University. His idea was to use this newly-open data to explore the frequency (and the change in frequency over the centuries) of particular structural features in early printed fiction: features like chapters, illustrations, dedications, notes to the reader, encomia, and so on. This proved to be a perfect task for us to pair-up on, because he had the domain knowledge to ask meaningful questions, and I had the technical knowledge to write software that could extract the answers from the data. We shared our table with another pair, who had technically-similar goals – looking at the change in the use of features like lists and tables (spoiler: lists were going out of fashion, tables were coming in, during the 17th century) in alchemical textbooks – and ultimately I was able to pass on the software tools I’d written to them to adapt for their purposes, too.
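For a flavour of the kind of query involved, here’s a minimal sketch using PHP’s built-in DOM extension rather than the Ruby and Nokogiri I actually used. It assumes TEI P5 markup in which structural features are tagged as div elements with a type attribute such as “chapter” or “dedication”; those type values, the function name count_div_types(), and the tiny sample text are illustrative assumptions, not details of the real corpus or of my original code:

```php
<?php
/**
 * Count <div> elements of each requested type in a TEI document.
 * ASSUMPTION: TEI P5 markup in the http://www.tei-c.org/ns/1.0 namespace,
 * with structural features marked as <div type="...">.
 */
function count_div_types(string $xml, array $types): array {
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('tei', 'http://www.tei-c.org/ns/1.0');

    $counts = [];
    foreach ($types as $type) {
        // XPath count() yields a float, so cast it back to an int.
        $query = sprintf("count(//tei:div[@type='%s'])", $type);
        $counts[$type] = (int) $xpath->evaluate($query);
    }
    return $counts;
}

// A tiny hypothetical document standing in for one digitised book.
$xml = <<<XML
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <front><div type="dedication"><p>To the reader...</p></div></front>
    <body>
      <div type="chapter"><p>...</p></div>
      <div type="chapter"><p>...</p></div>
    </body>
  </text>
</TEI>
XML;

print_r(count_div_types($xml, ['chapter', 'dedication', 'encomium']));
```

Run over thousands of files and grouped by publication date, counts like these are what let you chart how a feature’s popularity changed over time.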

Dan Q with two academic-minded humanities folks, talking about their hackathon projects.
A quick meeting on the relative importance of ‘chapters’ as a concept in 16th century literature. Half of the words that the academics are saying go over my head, but I’m formulating XPath queries in my head while I wait.

And here’s where I made a discovery: the folks I was working with (and presumably academics of the humanities in general) have no idea quite how powerful data mining tools could be in giving them new opportunities for research and analysis. Within two hours we were getting real results from our queries and were making amendments and refinements to our questions and trying again. Within a further two hours we’d exhausted our original questions and, while the others were writing-up their findings in an attractive way, I was beginning to look at how the structural differences between fiction and non-fiction might be usable as a training data set for an artificial intelligence that could learn to differentiate between the two, providing yet more value from the dataset. And all the while, my teammates – who’d been used to looking at a single book at a time – were amazed by the possibilities we’d uncovered for training computers to do simple tasks while reading thousands at once.

Laptop showing a map of the area around Old St. Paul's Cathedral.
The area around Old St. Paul’s Cathedral was the place to be if you were a 16th century hipster looking for a new book.

Elsewhere at the hackathon, one group was trying to simulate the view of the shelves of booksellers around the old St. Paul’s Cathedral, another looked at the change in the popularity of colour and fashion-related words over the period (especially challenging towards the beginning of the timeline, where spelling of colours was less-standardised than towards the end), and a third came up with ways to make old playscripts accessible to modern performers.

A graph showing the frequency of colour-related words in English-language books printed over three centuries.
Aside from an increase in the relative frequency of the use of colour words to describe yellow things, there’s not much to say about this graph.

At the end of the session we presented our findings – by which I mean, Stephen explained what they meant – and talked about the technology and its potential future impact – by which I mean, I said what we’d like to allow others to do with it, if they’re so-inclined. And I explained how I’d come to learn over the course of the day what the word encomium meant.

Dan Q presents findings in Excel.
Presenting our findings in amazing technicolour Excel.

My personal favourite contribution from the event was by Sarah Cole, who adapted the text of a story about a witch trial into a piece of interactive fiction, powered by Twine/Twee, and then allowed us as an audience to collectively “play” her game. I love the idea of making old artefacts more-accessible to modern audiences through new media, and this was a fun and innovative way to achieve this. You can even play her game online!

(by the way: for those of you who enjoy my IF recommendations: have a look at Detritus; it’s a delightful little experimental/experiential game)

Output from the interactive fiction version of a story about a witch trial.
Things are about to go very badly for Joan Buts.

But while that was clearly my favourite, the judges were far more impressed by the work of my teammate and me, as well as the team who’d adapted my software and used it to investigate different features of the corpus, and decided to divide the cash prize between the four of us. Which was especially awesome, because I hadn’t even realised that there was a prize to be had, and I made the most of it at the Drinking About Museums event I attended later in the day.

Members of the other team, who adapted my software, were particularly excited to receive their award.
Cold hard cash! This’ll be useful at the bar, later!

If there’s a moral to take from all of this, it’s that you shouldn’t let your background limit your involvement in “hackathon”-like events. This event was geared towards literature, history, linguistics, and the study of the book… but clearly there was value in me – a computer geek, first and foremost – being there. Similarly, a hack event I attended last year, while clearly tech-focussed, wouldn’t have been as good as it was were it not for the diversity of the attendees, who included a good number of artists and entrepreneurs as well as the obligatory hackers.

Stephen and Dan give a congratulatory nod to one another.
“Nice work, Stephen.” “Nice work, Dan.”

But for me, I think the greatest lesson is that humanities researchers can benefit from thinking a little bit like computer scientists, once in a while. The code I wrote (which uses Ruby and Nokogiri) is freely available for use and adaptation, and while I’ve no idea whether or not it’ll ever be useful to anybody again, what it represents is the research benefit of inter-disciplinary collaboration. It pleases me to see things like “Library Carpentry” (software for research, with a library slant) seeming to take off.

And yeah, I love a good hackathon.

Update 2015-04-22 11:59: with thanks to Sarah for pointing me in the right direction, you can play the witch trial game in your browser.

Squiz CMS Easter Eggs (or: why do I keep seeing Greg’s name in my CAPTCHA?)

Anybody who has, like me, come into contact with the Squiz Matrix CMS for any length of time will have come across the reasonably easy-to-read but remarkably long CAPTCHA that it shows. These are especially-noticeable in its administrative interface, where it uses them as an exaggerated and somewhat painful “are you sure?” – restarting the CMS’s internal crontab manager, for example, requires that the administrator types a massive 25-letter CAPTCHA.

Four long CAPTCHA from the Squiz Matrix CMS.

But there’s another interesting phenomenon that one begins to notice after seeing enough of the back-end CAPTCHA that appear. Strange patterns of letters that appear in sequence more-often than would be expected by chance. If you’re a fan of wordsearches, take a look at the composite screenshot above: can you find a person’s name in each of the four lines?

Four long CAPTCHA from the Squiz Matrix CMS, with the names Greg, Dom, Blair and Marc highlighted.

There are four names – Greg, Dom, Blair and Marc – which routinely appear in these CAPTCHA. Blair, being the longest name, was the first that I noticed, and at first I thought that it might represent a fault in the pseudorandom number generator that was resulting in a higher-than-normal frequency of this combination of letters. Another idea I toyed with was that the CAPTCHA text might be generated entirely from a set of pronounceable syllables (which is a reasonable way to generate one-time passwords that resist entry errors resulting from reading difficulties: in fact, we do this at Three Rings), among which these four names also appear, but by now I’d have expected to notice this in other patterns, and I hadn’t.

Instead, then, I had to conclude that these names were some variety of Easter Egg.

In software (and other media), "Easter Eggs" are undocumented hidden features, often in the form of inside jokes.
Smiley decorated eggs. Picture courtesy Kate Ter Haar.

I was curious about where they were coming from, so I searched the source code. I found plenty of references to Greg Sherwood, Marc McIntyre, and Blair Robertson, but I couldn’t find Dom; I’ve since come to discover that he must be Dominic Wong. These four were, according to Greg’s blog, developers with Squiz in the early 2000s, and seemingly saw themselves as a dynamic foursome responsible for the majority of the CMS’s code (which, if the comment headers are to be believed, remains true).

Greg, Marc, Blair and Dom, as depicted in Greg's 2007 blog post.

That still didn’t answer for me why searching for their names in the source didn’t find the responsible code. I started digging through the CMS’s source code, where I eventually found fudge/general/general.inc (a lot of Squiz CMS code is buried in a folder called “fudge”, and web addresses used internally sometimes contain this word, too: I’d like to believe that it’s being used as a noun and that the developers were just fans of the buttery sweet, but I have a horrible feeling that it was used in its popular verb form). In that file, I found this function definition:

/**
 * Generates a string to be used for a security key
 *
 * @param int            $key_len                the length of the random string to display in the image
 * @param boolean        $include_uppercase      include uppercase characters in the generated password
 * @param boolean        $include_numbers        include numbers in the generated password
 *
 * @return string
 * @access public
 */
function generate_security_key($key_len, $include_uppercase = FALSE, $include_numbers = FALSE) {
  $k = random_password($key_len, $include_uppercase, $include_numbers);
  if ($key_len > 10) {
    $gl = Array('YmxhaXI=', 'Z3JlZw==', 'bWFyYw==', 'ZG9t');
    $g = base64_decode($gl[rand(0, (count($gl) - 1)) ]);
    $pos = rand(1, ($key_len - strlen($g)));
    $k = substr($k, 0, $pos) . $g . substr($k, ($pos + strlen($g)));
  }
  return $k;
} //end generate_security_key()

For the benefit of those of you who don’t speak PHP, especially PHP that’s been made deliberately hard to decipher, here’s what’s happening when “generate_security_key” is being called:

  • A random password is being generated.
  • If that password is longer than 10 characters, a random part of it is replaced with either “blair”, “greg”, “marc”, or “dom”. The reason that you can’t see these words in the code is that they’re trivially-encoded using a scheme called Base64 – YmxhaXI=, Z3JlZw==, bWFyYw==, and ZG9t are the Base64 representations of the four names.
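To see the mechanism in isolation, here’s a self-contained PHP sketch of the same trick. The random_password() helper called above isn’t shown, so it’s replaced here by a hypothetical lowercase-only generator, random_lowercase(); the rest mirrors the quoted logic, with random_int() standing in for rand():

```php
<?php
// Hypothetical stand-in for Squiz's random_password(): lowercase letters only.
function random_lowercase(int $len): string {
    $out = '';
    for ($i = 0; $i < $len; $i++) {
        $out .= chr(random_int(ord('a'), ord('z')));
    }
    return $out;
}

function generate_security_key_sketch(int $key_len): string {
    $k = random_lowercase($key_len);
    if ($key_len > 10) {
        // The same four Base64 strings as the original function.
        $gl = ['YmxhaXI=', 'Z3JlZw==', 'bWFyYw==', 'ZG9t'];
        $g = base64_decode($gl[random_int(0, count($gl) - 1)]);
        // Pick a start position (never 0) at which the name still fits...
        $pos = random_int(1, $key_len - strlen($g));
        // ...and overwrite that many characters, so the length is unchanged.
        $k = substr($k, 0, $pos) . $g . substr($k, $pos + strlen($g));
    }
    return $k;
}

// Decode the four strings: prints blair, greg, marc and dom in turn.
foreach (['YmxhaXI=', 'Z3JlZw==', 'bWFyYw==', 'ZG9t'] as $encoded) {
    echo $encoded, ' => ', base64_decode($encoded), "\n";
}
echo generate_security_key_sketch(25), "\n";
```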

This seems like a strange choice of Easter Egg: immortalising the names of your developers in CAPTCHA. It’s especially strange because it somewhat weakens the (already-weak) CAPTCHA: an attacking robot can quickly be configured to know that an 11+-letter codeword will always consist of letters and contain one of these four names. In fact, knowing that a CAPTCHA will always contain one of these four and that I can refresh until I get one that I like, I can quickly turn an 11-letter CAPTCHA into a 6-letter one by simply refreshing until I get one with the longest name – Blair – in it!
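To put rough numbers on that, assume (since the random_password() helper isn’t shown) that each random character is one of 26 equally likely lowercase letters. The quoted code starts the name at one of (key_len − name length) positions, so an 11-letter CAPTCHA known to contain “blair” leaves just six genuinely random letters. The function name code_space() is hypothetical:

```php
<?php
/**
 * Upper bound on the number of possible codes of length $key_len,
 * ASSUMING every random character is one of 26 equally likely letters.
 * If $name_len > 0, a known name of that length has been spliced in at one
 * of ($key_len - $name_len) start positions, as in rand(1, $key_len - strlen($g)).
 */
function code_space(int $key_len, int $name_len = 0): int {
    if ($name_len === 0) {
        return 26 ** $key_len;          // every character is random
    }
    $positions = $key_len - $name_len;  // possible start positions for the name
    $free = $key_len - $name_len;       // characters that remain random
    return $positions * (26 ** $free);
}

// An 11-letter CAPTCHA, without and with a known "blair" spliced in.
printf("%.1f bits vs %.1f bits\n", log(code_space(11), 2), log(code_space(11, 5), 2));
// prints "51.7 bits vs 30.8 bits"
```

Under those assumptions, knowing the name is present removes roughly 21 bits, which is the arithmetic behind “an 11-letter CAPTCHA into a 6-letter one”.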

A lot has been written about how Easter Eggs undermine software security (in exchange for a small boost to developer morale) – that’s a major part of why Microsoft has banned them from its operating systems (and, for the most part, Apple has too). Given that these particular CAPTCHA in Squiz CMS are often nothing more than awkward-looking “are you sure?” dialogs, I’m not concerned about the direct security implications, but it does make me worry a little about the developer culture that produced them.

I know that this Easter Egg might be harmless, but there’s no way for me to know (short of auditing the entire system) what other Easter Eggs might be hiding under the surface and what they do, especially if the developers have, as in this case, worked to cover their tracks! It’s certainly the kind of thing I’d worry about if I were, I don’t know, a major government who use Squiz software, especially their cloud-hosted variants which are harder to effectively audit. Just a thought.

Rave Reviews for Your Password Sucks

Last month, I volunteered myself to run a breakout session at the 2012 UAS Conference, an annual gathering of up to a thousand Oxford University staff. I’d run a 2-minute micropresentation at the July 2011 OxLibTeachMeet called “Your Password Sucks!”, and I thought I’d probably be able to expand that into a larger 25-minute breakout session.

Your password: How bad guys will steal your identity
My expanded presentation was called “Your password: How bad guys will steal your identity”, because I wasn’t sure that I’d get away with the title “Your Password Sucks” at a larger, more-formal event.

The essence of my presentation boiled down to demonstrating four points. The first was you are a target – dispelling the myth that the everyday person can consider themselves safe from the actions of malicious hackers. I described the growth of targeted phishing attacks, and relayed the sad story of Mat Honan’s victimisation by hackers.

The second point was that your password is weak: I described the characteristics of good passwords (e.g. sufficiently long, complex, random, and unique) and pointed out that even among folks who’d gotten a handle on most of these factors, uniqueness was still the one that tripped people up. A quarter of people use only a single password for most or all of their accounts, and over 50% use 5 or fewer passwords across dozens of accounts.

You are a target. Your password is weak. Attacks are on the rise. You can protect yourself.
The four points I wanted to make through my presentation. Starting by scaring everybody ensured that I had their attention right through ’til I told them what they could do about it, at the end.

Next up: attacks are on the rise. By a combination of statistics, anecdotes, audience participation and a theoretical demonstration of how a hacker might exploit shared-password vulnerabilities to gradually take over somebody’s identity (and then use it as a platform to attack others), I aimed to show that this is not just a hypothetical scenario. These attacks really happen, and people lose their money, reputation, or job over them.

Finally, the happy ending to the story: you can protect yourself. Having focussed on just one aspect of password security (uniqueness) and filled a 25-minute slot with it, I wanted to give people some real, practical suggestions for achieving password uniqueness. These came in the form of free suggestions that they could implement today. I suggested “cloud” options (like LastPass or 1Password), hashing options (like SuperGenPass), and “offline” technical options (like KeePass or a spreadsheet bundled into a TrueCrypt volume).

I even suggested a non-technical option involving a “master” password that is accompanied by one of several unique prefixes. The prefixes live on a Post-It Note in your wallet. Want a backup? Take a picture of them with your mobile: they’re worthless without the master password, which lives in your head. It’s not as good as a hash-based solution, because a crafty hacker who breaks into several systems might be able to determine your master password, but it’s “good enough” for most people and a huge improvement on using just 5 passwords everywhere! (another great “offline” mechanism is Steve Gibson’s Off The Grid system)
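The trade-off is easy to demonstrate. Below is a minimal PHP sketch with hypothetical function names, a made-up master password and made-up prefixes. The prefix scheme simply concatenates a per-site prefix with the master password, so anybody holding two leaked plaintext passwords can recover the shared master from their common ending. The hash-based alternative uses HMAC-SHA256 as a generic stand-in (it is not SuperGenPass’s actual algorithm) and produces unrelated-looking passwords for each site:

```php
<?php
// Prefix scheme: the per-site prefix lives on the Post-It, the master in your head.
function prefix_password(string $prefix, string $master): string {
    return $prefix . $master;
}

// What an attacker can do with two leaked plaintext passwords:
// their longest common suffix contains (at least) the master password.
function common_suffix(string $a, string $b): string {
    $n = 0;
    $max = min(strlen($a), strlen($b));
    while ($n < $max && $a[strlen($a) - 1 - $n] === $b[strlen($b) - 1 - $n]) {
        $n++;
    }
    return $n === 0 ? '' : substr($a, -$n);
}

// Hash-based scheme (generic HMAC stand-in, not any product's real algorithm):
// each site's password is derived from the master password and the site name.
function derived_password(string $master, string $site, int $len = 16): string {
    $raw = hash_hmac('sha256', $site, $master, true);
    // Base64 gives printable characters; trim to the requested length.
    return substr(base64_encode($raw), 0, $len);
}

$master = 'correct-horse';                   // hypothetical master password
$leakA = prefix_password('Xq7', $master);
$leakB = prefix_password('p2M', $master);
echo common_suffix($leakA, $leakB), "\n";    // prints "correct-horse"

echo derived_password($master, 'example.com'), "\n";
echo derived_password($master, 'example.org'), "\n";
```

With the hash-based version, leaked site passwords share no visible structure, so recovering the master would require attacking the hash itself rather than just lining two leaks up side by side.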

"Delivery" ratings for the UAS Conference "breakout" sessions
My presentation – marked on the above chart – left people “Very Satisfied” significantly more than any other of the 50 breakout sessions.

And it got fantastic reviews! That pleased me a lot. The room was packed, and eventually more chairs had to be brought in for the 70+ folks who decided that my session was “the place to be”. The resulting feedback forms made me happy, too: on both Delivery and Content, I got more “Very Satisfied” responses than any other of the 50 breakout sessions, as well as specific comments. My favourite was:

Best session I have attended in all UAS conferences. Dan Q gave a 5 star performance.

So yeah; hopefully they’ll have me back next year.

Quiet On Set

Before I started working for the Bodleian, I’d never worked somewhere where there was a significant risk of a film crew coming between me and my office. But since then, it seems to happen with a startling regularity.

This morning, I was almost late for work as I fought my way past a film crew shooting The Quiet Ones, some variety of supernatural thriller B-movie.

This guy. That bridge. Listen.

So, when you end up watching it: wait until you get to the scene where this guy walks under the Hertford Bridge, and listen carefully for the sound of somebody walking across gravel just off camera. That’s me, putting my bike away having finally squeezed my way past all of the cameras and equipment on the way to my office.

A Surprise Christmas Gift

A strange package appeared outside of the door to my office, some time this morning, wrapped as a gift and accompanied by a card.

A card, bottle of wine, and box of chocolates!

It turns out to have come from my colleagues at the Bodleian Shop: I’d been drafted in at the last minute to iron out a few technical hitches on their newly-relaunched e-commerce site in time for them to start making online sales before the Christmas rush. There were a few somewhat-stressful moments as technical folk from disparate providers worked together to link-up all of the parts of the site (warehouse and stock level systems, order and payment processing, content management, and of course the web front end), but it all came together in the end… and I think a lot of lessons were learned from the experience.

My bottle of wine, chilling amidst the anti-bird-wire on the window ledge of the building.

So that was a very sweet surprise. I knew that they’d appreciated my “hopping department” in order to firefight the various problems that came up during their deployment, but it was still really awesome to get an alcoholic, chocolatey thank-you and a cute card signed by their team, to boot.

QR Codes of the Bodleian

The Treasures of the Bodleian exhibition opened today, showcasing some of the Bodleian Libraries’ most awe-inspiring artefacts: fragments of original lyrics by Sappho, charred papyrus from Herculaneum predating the eruption of Mt. Vesuvius in 79 CE, and Conversation with Smaug, a watercolour by J. R. R. Tolkien illustrating The Hobbit, are three of my favourites. Over the last few weeks, I’ve been helping out with the launch of this exhibition and its website.

From an elevated position in the exhibition room, I run a few tests of the technical infrastructure whilst other staff set up, below.

In particular, I’ve been working on the QR codes. This experiment – very progressive for a sometimes old-fashioned establishment like the Bodleian – involves placing small two-dimensional barcodes alongside the exhibits. Each barcode encodes the web address of that exhibit’s page on the exhibition website. Visitors who scan them – using a tablet computer, smartphone, or whatever – are directed to a web page where they can learn more about the item in front of them, discuss it with other visitors, or “vote” on it: another exciting new feature in this exhibition is that we’re trying quite hard to engage academics and the public in debate about the nature of “treasures”: what is a treasure?

A QR code in place at the Treasures of the Bodleian exhibition.

In order to improve the perceived “connection” between the QR codes and the objects, and to encourage visitors to scan the codes despite perhaps having little or no instruction, we opted to embed in each QR code an image relating to the object it accompanies. By cranking up the error-correction level of a QR code, it’s possible to “damage” it quite significantly and still have it scan perfectly well.

One of my "damaged" QR codes. This one corresponds to The Laxton Map, a 17th-century map of common farming land near Newark-on-Trent.

We hope that the visual association between each artefact and its QR code will help to make it clear that the code is related to the item (and isn’t, for example, some kind of asset tag for the display case). We’re going to be monitoring usage of the codes, so hopefully we’ll get some meaningful results that could be valuable for future exhibitions, or for other libraries and museums.

Rolling Your Own

If you’re interested in making your own QR codes with artistic embellishment (and I’m sure a graphic designer could do a far better job than I did!), here’s my approach:

  1. I used Google Infographics (part of Chart Tools) to produce my QR codes. It’s fast, free, simple, and – crucially – allows control over the level of error correction used in the resulting code. Here’s a sample URL to generate the QR code above:

https://chart.googleapis.com/chart?chs=500x500&cht=qr&chld=H|0&chl=HTTP://TREASURES.BODLEIAN.OX.AC.UK/T7

  1. 500x500 is the size of the QR code, in pixels. I was ultimately producing 5cm codes because our experiments showed that this was about the right size for our exhibition cabinets, the distance from which people would be scanning them, etc. For laziness, then, I produced codes 500 pixels square at a resolution of 100 pixels per centimetre.
  2. H specifies the highest of the four error-correction levels (L, M, Q, and H), which allows roughly 30% of the code’s data to be recovered if it’s damaged. In theory, at least, this lets us do the maximum amount of “damage” to our QR code by manipulating it, and still have it work; you could try lower levels if you wanted, and possibly get less-complex-looking codes.
  3. 0 is the width of the border (the “quiet zone”) around the QR code. I didn’t want a border (as I was going to manipulate the code in Photoshop anyway), so I used a width of 0.
  4. The URL, HTTP://TREASURES.BODLEIAN.OX.AC.UK/T7, is presented entirely in capitals. This is because QR codes have a compact “alphanumeric” mode that covers only digits, capital letters, and a few symbols (including the colon, full stop, and slash that URLs need), packing two characters into 11 bits rather than spending 8 bits on each. “http” and domain names are case-insensitive anyway, and we selected our QR code path names to be in capitals. We also shortened the URL as far as possible: owing to some complicated technical and political limitations, we weren’t able to lean on URL-shortening services like bit.ly, so we had to roll our own. In hindsight, it’d have been nice to have set up the subdomain “t.bodleian.ox.ac.uk”, but this wasn’t possible within the time available. Remember: the shorter the web address, the simpler the code, and simpler codes are easier and faster to read.
  5. Our short URLs redirect to the actual web pages of each exhibit, along with an identifying token that gets picked up by Google Analytics to track how widely the QR codes are being used (and which ones are most-popular amongst visitors).
By now, you'll have a QR code that looks a little like this.
  2. Load that code up in Photoshop, along with the image you’d like to superimpose onto it. Many of the images I’ve had to work with are disturbingly “square”, so I’ve simply given them a white or black border (depending on whether they’re dark or light-coloured). With others, though, I’ve been able to cut around some of the more-attractive parts of the image in order to produce something with a nicer shape to it. In any case, put your image in as a layer on top of your QR code.
  3. Move the image around until you have something that’s aesthetically-appealing. With most of my square images, I’ve just plonked them in the middle and resized them to cover a whole number of “squares” of the QR code. With the unusually-shaped ones, I’ve positioned them so that they fit in with the pattern of the QR code, somewhat, then inserted another layer in-between the two and used it to “white out” the QR code’s squares that intersect with my image, giving a jagged, “cut out” feel.
  4. Test! Scan the QR code from your screen, and again later from paper, to make sure that it’s intact and functional. If it’s not, adjust your overlay so that it covers less of the QR code. Test on a variety of devices. In theory, it should be possible to calculate how much damage you can cause to a QR code before it stops working (and where it’s safe to cause the damage), but in practice it’s faster to use trial-and-error. After a while, you get a knack for it, and you almost feel as though you can see where you need to put the images so that they just-barely don’t break the codes. Good luck!
Another of my "damaged" QR codes. I'm reasonably pleased with this one.
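The URL-building part of step 1 can be sketched in Python, assuming the Chart Tools URL format shown in step 1; it only assembles URLs and doesn’t fetch anything. The function names and defaults are illustrative. The redirect table, its example.org placeholder target, and the `utm_*` campaign parameters (Google Analytics’ standard campaign-tagging parameters) are hypothetical stand-ins, since the identifying token our redirects actually add isn’t described here.

```python
from urllib.parse import urlencode

# Error-correction levels accepted in the chld parameter, mapped to the
# approximate percentage of codewords each can recover.
EC_LEVELS = {"L": 7, "M": 15, "Q": 25, "H": 30}

CHART_API = "https://chart.googleapis.com/chart"


def pixels_for(size_cm: float, px_per_cm: int = 100) -> int:
    """Convert a printed size to pixels: 5 cm at 100 px/cm gives 500 px."""
    return round(size_cm * px_per_cm)


def chart_qr_url(data: str, size_px: int = 500, ec_level: str = "H", border: int = 0) -> str:
    """Assemble a Chart Tools QR code URL in the format shown in step 1."""
    if ec_level not in EC_LEVELS:
        raise ValueError(f"unknown error-correction level: {ec_level!r}")
    params = {
        "chs": f"{size_px}x{size_px}",   # width x height, using a plain letter x
        "cht": "qr",                     # chart type: QR code
        "chld": f"{ec_level}|{border}",  # error-correction level | border width in modules
        "chl": data,                     # the text to encode
    }
    # Leave |, : and / unescaped so the result matches the hand-written URL.
    return CHART_API + "?" + urlencode(params, safe="|:/")


# Hypothetical short-code table: the real redirect targets aren't listed in the post.
SHORT_CODES = {"T7": "https://example.org/exhibits/t7"}


def redirect_target(code: str) -> str:
    """Where a short URL such as /T7 might redirect, tagged for analytics (hypothetical)."""
    tracking = {"utm_source": "qr", "utm_medium": "exhibition", "utm_campaign": code}
    return SHORT_CODES[code] + "?" + urlencode(tracking)


if __name__ == "__main__":
    print(chart_qr_url("HTTP://TREASURES.BODLEIAN.OX.AC.UK/T7", size_px=pixels_for(5)))
    print(redirect_target("T7"))
```

Letting `urlencode` do the escaping means the same function would also cope with data containing spaces or ampersands, which would otherwise break the query string.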

Give it a go! Make some QR codes that represent your content (web addresses, text, vCards, or whatever) and embed your own images into them to make them stand out with a style of their own.
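To put some numbers on the capital-letters trick and the “shorter is simpler” rule from step 1, here’s a rough Python sketch that works out the smallest QR version (the size of the grid) a URL needs at error-correction level H. It assumes the whole string is encoded as a single segment (a real encoder may be cleverer and mix modes), and it only covers versions 1 to 5, using the standard’s data-codeword counts for level H; the table would need extending for longer data.

```python
# QR "alphanumeric" mode covers only these 45 characters; anything else
# (including lower-case letters) forces the bulkier byte mode.
ALPHANUMERIC = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:")

# Data codewords (8-bit bytes) available at error-correction level H,
# for QR versions 1 to 5.
DATA_CODEWORDS_H = {1: 9, 2: 16, 3: 26, 4: 36, 5: 46}


def payload_bits(text: str) -> tuple[str, int]:
    """Bits needed to encode text as one segment (valid for versions 1 to 9)."""
    if all(ch in ALPHANUMERIC for ch in text):
        n = len(text)
        # 4-bit mode indicator + 9-bit length + 11 bits per pair of characters,
        # plus 6 bits for a leftover odd character.
        return "alphanumeric", 4 + 9 + 11 * (n // 2) + 6 * (n % 2)
    n = len(text.encode("utf-8"))
    # 4-bit mode indicator + 8-bit length + 8 bits per byte.
    return "byte", 4 + 8 + 8 * n


def smallest_version_h(text: str) -> tuple[str, int, int, int]:
    """Return (mode, bits, version, modules per side) at level H."""
    mode, bits = payload_bits(text)
    for version, codewords in DATA_CODEWORDS_H.items():
        if bits <= codewords * 8:
            # Version 1 is 21 modules square; each version adds 4 per side.
            return mode, bits, version, 17 + 4 * version
    raise ValueError("needs a version above 5; extend DATA_CODEWORDS_H")


if __name__ == "__main__":
    for url in ("HTTP://TREASURES.BODLEIAN.OX.AC.UK/T7",
                "http://treasures.bodleian.ox.ac.uk/T7",
                "HTTP://T.BODLEIAN.OX.AC.UK/T7"):
        print(url, smallest_version_h(url))
```

On these assumptions, the all-capitals address fits a version 4 code (33 by 33 modules), the same address in lower case needs byte mode and a version 5 code (37 by 37), and the hypothetical t.bodleian.ox.ac.uk address would have squeezed into version 3 (29 by 29).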

Time

My name is Dan, and I am a chronogoldfish.

Is this a chronogoldfish? I don't know. And neither do you. I just made them up.

You see: the thing that goldfish are famous for – apart from their allegedly very short memory, which is actually a myth – is that they grow to fill the available space. That is: if you keep a goldfish in a smaller tank, it’ll grow to a full size that is smaller than if you kept it in a larger tank or even a pond. I’m not certain that’s actually true either, and I’m sure that Kit will correct me pretty soon if it’s not, but it’s part of my analogy and I’m sticking with it.

A chronogoldfish, then, is somebody who grows to fill the available time. That is: the more free time you give them, the more they’ll work at filling it up. This is a mixed blessing, which is a euphemism for “usually pretty bad.” You’ll almost never catch me bored, for example – I’ve no idea how I’d find time to be bored! – but conversely it’s reasonably rare to find me with free time in which I don’t have something scheduled (or, at least: in which I don’t have something I ought to be doing).

Earlier this year, I started working for the Bodleian, and this – along with a couple of other changes going on in my life – suddenly thrust upon me several more hours each week than I’d had previously. It was like being transplanted from a tank… into a pond and, once I’d stopped checking for herons, I found myself sitting around, wondering what to do with my sudden surge of extra free time. But then, because I’m a chronogoldfish, I grew.

The activities that I already did became bigger – I took on more responsibilities in my voluntary work, took more opportunities to socialise with people I spend time with, and expanded my efforts on a variety of “side project” software. I’ve even lined myself up for a return to (part-time) education, later this year (more on that in another blog post, no doubt). And so, only a few months later, I’m a big, fat chronogoldfish, and I’ve once again got just about as little “free” time – unplanned time – as I had before.

But that’s not a bad thing. As Seth Godin says, wasting time (properly) is a good thing. And there’s little doubt that my growth into “new” timesinks is productive (education, voluntary work), experimental (side-projects, education), and joyful (socialising, everything else). I’d like to think I use time well, even if I do sometimes wonder: where did it all go?

I suppose the opposite of a chronogoldfish might be a chronomidget: somebody who doesn’t grow to consume any more time than they have to. The test would be to ask yourself: what would you do if there were an extra half-hour in the day? If your brain immediately rushes to fill that space with an answer (a genuine answer: something you’d actually do – there’s no point lying to yourself and saying you’d spend it at the gym if you wouldn’t!), you’re probably a goldfish. If not, you’re probably a midget.

I think I can name people among my friends who are goldfish, and people who are midgets. But I do wonder what type they would say that they are…

Work Calendar [NSFW?]

In my office at the Bodleian, we’ve got a calendar on which employees mark their annual leave. The theme of the calendar is supposed to be paintings inspired by flowers… but – and maybe it’s just my dirty mind – this month’s image seems just a little bit saucy:

Our calendar this month. That's supposed to be a flower, is it?

It can’t be just me that sees… it… right?

Instead Of Blogging…

Things I’ve been doing instead of blogging, this last month, include:

  • Code Week: hacking Three Rings code in a converted hay loft of a Derbyshire farm, as mentioned on the Three Rings blog.
  • Hoghton Tower: as is traditional at this time of year (see blog posts from 2010, 2009, 2005, 2003, for example), went to Preston for the Hoghton Tower concert and fireworks display, accompanied by Ruth, and celebrated my sister’s 22nd birthday. My other sister has more to say about it.
  • Family Picnic: Joining Ruth and JTA at Ruth’s annual family picnic, among her billions of second-cousins and third-aunts.
  • New Earthwarming: Having a mini housewarming on New Earth, where I live with Ruth, JTA, and Paul. A surprising number of people came from surprisingly far away, and it was fascinating to watch the networking between a mixture of local people (from our various different “circles” down here) and distant guests.
  • Bodleian Staff Summer Party: Yet another reason to love my new employer! The drinks and the hog roast (well, roast vegetable sandwiches and falafel wraps for me, but still delicious) would have won me over by themselves. The band was just a bonus. The ice cream van that turned up and started dispensing free 99s: that was all just icing on the already-fabulous cake.
  • TeachMeet: Giving a 2-minute nanopresentation at the first Oxford Libraries TeachMeet, entitled Your Password Sucks. A copy of my presentation (now with annotations to make up for the fact that you can’t hear me talking over it) has been uploaded to the website.
  • New Earth Games Night: Like Geek Night, but with folks local to us, here, some of whom might have been put off by being called “Geeks”, in that strange way that people sometimes do. Also, hanging out with the Oxford On Board folks, who do similar things on Monday nights in the pub nearest my office.
  • Meeting Oxford Nightline: Oxford University’s Nightline is just about the only Nightline in the British Isles to not be using Three Rings, and they’re right on my doorstep, so I’ve been meeting up with some of their folks in order to try to work out why. Maybe, some day, I’ll actually understand the answer to that question.
  • Alton Towers & Camping: Ruth and I decided to celebrate the 4th anniversary of us getting together with a trip to Alton Towers, where their new ride, Thirteen, is really quite good (but don’t read up on it: it’s best enjoyed spoiler-free!), and a camping trip in the Lake District, with an exhausting but fulfilling trek to the summit of Glaramara.
Setting up camp at Stonethwaite.

That’s quite a lot of stuff, even aside from the usual work/volunteering/etc. that goes on in my life, so it’s little wonder that I’ve neglected to blog about it all. Of course, there’s a guilt-inspired downside to this approach: one feels unable to blog about anything else until one has finished writing about the first neglected thing, and so the problem snowballs.

So this quick summary, above? That’s sort-of a declaration of blogger-bankruptcy on these topics, so I can finally stop thinking “Hmm, can’t blog about X until I’ve written about Code Week!”

First Class Film

Last week, I saw X-Men: First Class at the cinema with Ruth. The film was… pretty mediocre, I’m afraid… but another part of the cinemagoing experience was quite remarkable:

There’s a bit in the film where Xavier, then writing his thesis at Oxford University, and a CIA agent are talking. As they talk, they walk right through the middle of the Bodleian Library, right past my office. It’s not just Morse and Lewis and the Harry Potter films that make use of the Library (at great expense, I gather) for filming purposes! “That’s my office!” I squee’d, pointing excitedly at the screen.

Needless to say, the student-heavy audience cheered loudly at the presence of parts of Oxford that they recognised, too. It’s been a while since I was in a cinema where people actually cheered at what was going on. In fact, the last time will have been in the Commodore Cinema in Aberystwyth. But cinema-culture in Aberystwyth’s strange anyway.

Fonts of the Ancients

“Thanks to these changes,” I said, “the CMS behind the Bodleian Libraries’ websites can now support the use of Unicode characters. That means that the editors can now write web content in Arabic, Japanese, Russian… or even Ancient Egyptian!”

The well-known "man standing on two giraffes" hieroglyph.

It sounded like a good soundbite for the internal newsletter, although of course I meant that last suggestion as a joke. While I’m aware of libraries within the Bodleian who’d benefit from being able to provide some of their content in non-Latin characters – and Arabic, Japanese, and Russian were obvious candidate languages – I didn’t actually anticipate that mentioning Ancient Egyptian would attract much attention. Everybody knows that’s meant as a joke, right?

Streetlights of the 2nd century BC were powered by enormous slugs.

“Is that just Demotic symbols, then? Or can we use all hieroglyphics?” came back the reply. My heart stopped. Somebody actually wanted to use a four-thousand-plus-year-old writing system for their web pages?

It turns out that there’s only one font in existence that supports the part of the Unicode character set corresponding to Egyptian hieroglyphics: Aegyptus. So you need to ensure that your readers have that installed or they’ll just see lots of boxes. And you’ll need to be able to type the characters in the first place: if you don’t have an Ancient Egyptian keyboard (and who does, these days?), you’re going to spend a lot of time clicking on characters from a table or memorising five-digit hex codes.
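For anyone stuck memorising those hex codes, here’s a small Python sketch of turning one into a character, and into the HTML numeric character reference that can stand in for typing it directly into a web page. The helper name is illustrative, and the range check assumes the Egyptian Hieroglyphs block (U+13000 to U+1342F) added in Unicode 5.2. It also shows where the “78,180th” figure later in this post comes from: 0x13163 is 78,179 in decimal, and counting starts at U+0000.

```python
import html

# The Egyptian Hieroglyphs block in Unicode.
HIEROGLYPHS_START, HIEROGLYPHS_END = 0x13000, 0x1342F


def hieroglyph(hex_code: str) -> tuple[str, str]:
    """Turn a hex code such as "13163" into (character, HTML numeric reference)."""
    code_point = int(hex_code, 16)
    if not HIEROGLYPHS_START <= code_point <= HIEROGLYPHS_END:
        raise ValueError(f"U+{code_point:04X} is outside the Egyptian Hieroglyphs block")
    return chr(code_point), f"&#x{code_point:X};"


if __name__ == "__main__":
    glyph, reference = hieroglyph("13163")
    print(reference)                          # &#x13163;
    print(ord(glyph))                         # 78179, i.e. the 78,180th code point from U+0000
    print(html.unescape(reference) == glyph)  # True
```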

Papyrus was important, but the Egyptians' greatest achievement was the invention of crazy golf.

But yes, it’s doable. With a properly set-up web server, database, CMS, and templates, and sufficient motivation, it’s possible to type in Ancient Egyptian. And now, thanks to me, the Bodleian has all of those things.

Well: except perhaps the motivation. The chap who asked about Ancient Egyptian was, in fact, having a laugh. In the strange academic environment of Oxford University, it’s hard to be certain, sometimes.

Crocodiles can easily be caught using sleeping bags.

I do find myself wondering what scribes of the Old Kingdom would have made of this whole exercise. To a scribe, for example, it would have been clear that to express his meaning he needed to draw a flock of three herons facing left. He couldn’t draw just one heron, because… well, that just wouldn’t make any sense, would it? Millennia later, we treat “three herons facing left” as a glyph distinct from “one heron facing left”, much as we treat the Æ ligature as separate from the letters A and E from which it is derived. So this symbol – no: more importantly, its meaning – is encoded as U+13163, the 78,180th code point in an attempted “universal alphabet”.

Starting step in the creation of "vulture and asp soup".

To what purpose? So that we can continue to pass messages around in Ancient Egyptian in a form that will continue to be human and machine-readable for as long as is possible. But why? That’s what I imagine our scribe would say. We’re talking about a dead language here: one whose continued study is only justified by an attempt to understand ancient texts that we keep digging up. And he’d be right.

None of the existing texts written in Ancient Egyptian is encoded in Unicode. They’re penned on rotting papyrus and carved into decaying sandstone walls. Sure, we could transcribe them, but we’d get exactly the same amount of data by transliterating them or using an encoding format for that specific purpose (which I’m sure must exist), and even more data by photographing them. We don’t need to create more documents in this ancient language; we just need to preserve the existing ones for at least as long as it takes to translate and interpret them. So why the effort to make an encoding system – and an associated font! – to display them?

Two-headed snakes: the original skipping rope.

Don’t get me wrong: I approve. I think Unicode is awesome, and I think that UTF-8 and UTF-16 are fantastic (if slightly hacky) ways to make use of the breadth of Unicode without quadrupling the amount of memory consumed by current 8-bit documents (and, in UTF-8’s case, without even doubling it for plain English text). I just don’t know how to justify it. All of those bits, just to store information in a language in which we’re producing no new information.
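The memory trade-off is easy to see in Python: plain English text costs one byte per character in UTF-8, two in UTF-16, and four in UTF-32, while a hieroglyph from beyond the Basic Multilingual Plane costs four bytes in all three (in UTF-16 it becomes a “surrogate pair” of two 16-bit units, which is the slightly hacky part). This is a small sketch; the sample strings are arbitrary, and the little-endian codecs are used so that no byte-order mark gets counted.

```python
def encoded_sizes(text: str) -> dict[str, int]:
    """Bytes needed to store text in each Unicode encoding form (no BOM)."""
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


GLYPH = "\U00013163"  # U+13163, the hieroglyph discussed above

if __name__ == "__main__":
    print(encoded_sizes("Bodleian"))  # {'utf-8': 8, 'utf-16-le': 16, 'utf-32-le': 32}
    print(encoded_sizes(GLYPH))       # {'utf-8': 4, 'utf-16-le': 4, 'utf-32-le': 4}
    # The UTF-16 surrogate pair: two 16-bit code units, D80C and DD63.
    print(GLYPH.encode("utf-16-be").hex(" "))  # d8 0c dd 63
    print(GLYPH.encode("utf-8").hex(" "))      # f0 93 85 a3
```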

What I’m saying is: I think it’s wonderful that we can now put Egyptian hieroglyphics on the Bodleian Libraries websites. I just don’t know how I’d explain why it’s cool to a time-travelling Egyptian scribe. Y’know, in case I come across one.

My New Pet Hate, part II

A few years ago, I talked about a pet hate of mine that still seems to be prevalent: when people send me a screenshot, they’ll sometimes send it to me in a Word document, for no apparent reason. They could just send me the picture, but instead they send me a Word document containing the picture, thereby increasing the file size, requiring that I have a program capable of viewing Word documents, and making it more-complex for me to extract the picture if I need to use it somewhere. And on top of all of that, it takes longer for them to do it this way: everybody loses!

Today, I saw somebody take the abuse of screenshots to a whole new level. My first clue that something was amiss was when the email arrived in my Inbox with a 300K TIFF file in it. “Well, at least it’s not a Word document,” I thought. And I was right. It was something more convoluted than that.

My only explanation for the contents of the file is as follows:

  1. Print Screen. The user took the screenshot using their Print Screen key. So far, so good. They captured their whole screen, rather than just what they were trying to show me, but we’ll let that pass.
  2. Open Paint. The user opened Paint. At this point, they could have pasted, saved, and emailed the file to me, and still been doing perfectly well. But they didn’t.
  3. Resize canvas. The user expanded the canvas to an enormous size. Perhaps they didn’t know that this would be done automatically, if required. Or maybe they thought that I could do with a lot of white space in which to make notes on their screengrab.
  4. Paste and reposition. The user pasted the screenshot into the Paint document, and positioned it near the centre, making sure to leave as much whitespace as possible. Y’know, in case I was running out of it on my computer. They could still at this point have just saved the file and emailed it to me, and I wouldn’t have complained.
  5. Print Screen again. For some reason, the user pressed Print Screen again at this point, thereby taking a screenshot of themselves manipulating a screenshot that they’d already taken. Maybe the user has recently watched Inception, and decided that “a screenshot within a screenshot” was more likely to make an impact on me. We need to go deeper!
  6. Open Photoshop. Paint obviously wasn’t going to cut it: it was time for a bigger graphics program. The user opened up Photoshop (waiting for a few minutes while this beast of a program warmed up).
  7. Create a new document and paste again. Now the user had Photoshop open, containing a picture of Paint being used to display an (oversized) screenshot of what they wanted to show me.
  8. Crop. This was a good idea. If the user had cropped the image all the way back down to the screenshot, I might not even have worked out what they were doing. Sadly, they didn’t. They cropped off Paint’s title bar and half of its toolbar. Then they added another few layers of whitespace to the bottom and right, just to be really sure.
  9. Save as a TIFF. They could have saved as a PNG. Or a GIF. Even a JPEG. They could have saved as a PSD. But no, for some reason, an uncompressed TIFF was the way forwards.
I N C E P T I O N. A screenshot of a screenshot within a screenshot.

Back in 2009, I predicted that Windows Vista/7’s new “Snipping Tool”, which finally brought screen captures to the level of more-competent operating systems, would see the end of this kind of nonsense. Unfortunately, Windows XP remains the standard at my workplace, so I doubt that this’ll be the last time that I see “matryoshka screenshots”.

Idiocy Repeats Itself

Two years and one month ago to the day, I made an idiot of myself by injuring myself while chasing cake. Back then, of course, I was working on the top floor of the Technium in Aberystwyth, and I was racing down the stairs of the fire escape in an attempt to get to left-over cake supplies before they were picked clean by the other scavengers in the office building. I tripped and fell, and sprained my ankle quite badly (I ended up on crutches for a few days).

Last week, history almost repeated itself, and I’m not even talking about my recent head injury. Again, I’m on the top floor of a building, and again, there’s a meeting room on the bottom floor (technically in the basement, but that only means there’s further to go). When I got the email, I rushed out of the door and down the stairwell, skipping over the stairs in threes and fours. Most of the Bodleian’s stairwells are uncarpeted wood, and the worn-down soles of my shoes skidded across them.

The prize! Baskets of fresh sandwiches (fruit, but not cakes, are off-camera: around here, cakes go very quickly...)

You’d think I’d have learned by now, but apparently I’m a little slow. Slow, except at running down stairs. As I rounded the corner of the last stairwell, my body turned to follow the route but my feet kept going in the same direction. They took flight, and for a moment I was suspended in the air, like a cartoon character before they realise their predicament and gravity takes hold. With a thud, I hit the ground.

Perhaps I’d learned something, though, because at least this time around I rolled. Back on my feet, I was still able to get to the meeting room and scoff the best of the fruit and sandwiches before anybody else arrived.

Is this really worthy of a blog post? “Dan doesn’t have an accident” is hardly remarkable (although perhaps a little more noteworthy than I’d like to admit, based on recent experience). Well, I thought so. And I got a free lunch, and I didn’t have to hurt myself to do so. Which is probably for the best: based on the number of forms I had to fill out to get root access on the systems I administer, I don’t want to think how complicated the accident book must be…