A personal favourite of mine are the Herculaneum Papyri. These charred scrolls were part of a private library near Pompeii that was buried by the eruption
of Mount Vesuvius in 79 CE. Rediscovered from 1752, these ~1,800 scrolls were distributed to academic institutions around the world, with
the majority residing in Naples’ Biblioteca Nazionale Vittorio Emanuele III.
As you might expect of ancient scrolls that got buried, baked, and then left to rot, they’re pretty fragile. That didn’t stop Victorian era researchers trying a variety of techniques to
gently unroll them and read what was inside.
Like many others, what I love about the Herculaneum Papyri is the air of mystery. Each could be anything from a lost religious text to, I don’t know, somebody’s to-do list (“buy milk, arrange for annual service of chariot, don’t forget to renew
In recent years, we’ve tried “virtually unrolling” the scrolls using a variety of related technologies. And – slowly – we’re getting there.
So imagine my delight when this week, for the first time ever, a
complete word was extracted from one of the carbonised, still-rolled-up scrolls from Herculaneum. Something that would have seemed inconceivable to the historians who first
discovered and catalogued the scrolls is now possible, thanks to their careful conservation over the years along with the steady advance of technology.
Anyway, I thought that was exciting news so I wanted to share.
I’ve been working in Witney one day every week or two lately, but somehow I’ve never managed to sync up my work times with the hours that this building is accessible! Or, when I do, I’m
in a hurry and don’t have time to stop and hunt!
This morning, though, the stars aligned and I was able to get to the GZ. The cache was pretty much where I expected based on the
coordinates and the hint, but still took a minute out two to lay hands on. Soon, though, I was quietly sitting and reading past log entries.
You know that strange moment when you see your old coworkers on YouTube doing a cover of an Adam and the Ants song? No: just me?
Still good to see the Bodleian put a fun spin on promoting their lockdown-friendly reader services. For some reason they’ve marked this video “not embeddable” (?) in their YouTube
settings, so I’ve “fixed” the copy above for you.
My team and I do get up to some unusual stuff, it’s true. I took part in this photoshoot, too:
I’m absolutely not above selling out myself and my family for the benefit of some stock photos for the University, it seems. The sharp-eyed might even have spotted the kids in this video promoting the Ashmolean or a recent tweet by the Bodleian…
Have you noticed how the titles printed on the spines of your books are all, for the most part, oriented the same way? That’s not a coincidence.
ISO 6357 defines the standard positioning of titles on the spines of printed books (it’s also
codified as British Standard BS6738). If you assume that your book is stood “upright”, the question is one of which way you tilt your head to read the title printed on the spine. If you
tilt your head to the right, that’s a descending title (as you read left-to-right, your gaze moves down, towards the surface on which the book stands). If you tilt your head to
the left, that’s an ascending title. If you don’t need to tilt your head in either direction, that’s a transverse title.
The standard goes on to dictate that descending titles are to be favoured: this places the title in a readable orientation when the book lays flat on a surface with the cover
face-up. Grab the nearest book to you right now and you’ll probably find that it has a descending title.
But if the book is lying on a surface, I can usually read the cover of the book. Only if a book is in a stack am I unable to do so, and stacks are usually relatively short and
so it’s easy enough to lift one or more books from the top to see what lies beneath. What really matters when considering the orientation of a spine title is, in my mind, how it appears
when it’s shelved.
It feels to me like this standard’s got things backwards. If a shelf of anglophone books is organised into any kind of order (e.g. alphabetically) then it’ll usually be from left to
right. If I’m reading the titles from left to right, and the spines are printed descending, then – from the perspective of my eyes – I’m reading from bottom to top:
It’s possible that this is one of those things that I overthink.
The team responsible for digital archiving had plans to spend World Digital Preservation Day running a stand in Blackwell Hall for some
time before I got involved. They’d asked my department about using the Heritage Window – the Bodleian’s 15-screen video wall – to show a carousel of slides with relevant content over
the course of the day. Or, they added, half-jokingly, “perhaps we could have Pong up there as it’ll be its 46th birthday?”
But I didn’t take it as a joke. I took it as a challenge.
Emulating Pong is pretty easy. Emulating Pong perfectly is pretty hard. Indeed, a lot of the challenge in the preservation of (especially digital) archives in general is in
finding the best possible compromise in situations where perfect preservation is not possible. If these 8″ disks are degrading, is is acceptable to copy them onto a different medium? If this video file is unreadable in
modern devices, is it acceptable to re-encode it in a contemporary format? These are the kinds of questions that digital preservation specialists have to ask themselves all the damn
Emulating Pong in a way that would work on the Heritage Window but be true to the original raised all kinds of complications. (Original) Pong’s aspect ratio doesn’t fit nicely on a 16:9
widescreen, much less on a 27:80 ultrawide. Like most games of its era, the speed is tied to the clock rate of the processor. And of course, it should be controlled using a
By the time I realised that there was no way that I could thoroughly replicate the experience of the original game, I decided to take a different track. Instead, I opted to
reimplement Pong. A reimplementation could stay true to the idea of Pong but serve as a jumping-off point for discussion about how the experience of playing the game
may be superficially “like Pong” but that this still wasn’t an example of digital preservation.
Here’s the skinny:
A web page, displayed full-screen, contains both a <canvas> (for the game, sized appropriately for a 3 × 3 section of the video wall) and a
<div> full of “slides” of static content to carousel alongside (filling a 2 × 3 section).
Gamepad API (which is awesome, by the way). If there’s only one player, a (tough! – only three people managed to beat it over the course of the day!) AI plays the other paddle.
A pair of SNES controllers adapted for use as USB
controllers which I happened to own already.
I felt that the day, event, and game were a success. A few dozen people played Pong and explored the other technology on display. Some got nostalgic about punch tape, huge floppy disks,
and even mechanical calculators. Many more talked to the digital archives folks and I about the challenges and importance of digital archiving. And a good time was had by all.
I’ve open-sourced the entire thing with a super-permissive license so you can deploy it yourself (you know, on your ultrawide
video wall) or adapt it as you see fit. Or if you’d just like to see it for yourself on your own computer, you can (but unless
you’re using a 4K monitor you’ll probably need to use your browser’s mobile/responsive design simulator set to 3200 × 1080 to make it fit your screen). If you don’t have
controllers attached, use W/S to control player 1 and the cursor keys for player 2 in a 2-player game.
When I first started working at the Bodleian Libraries in 2011, their websites were looking… a little
dated. I’d soon spend some time working with a vendor (whose premises mysteriously caught fire while I was there, freeing me up to spend my
birthday in a bar) to develop a fresh, modern interface for our websites that, while not the be-all and end-all, was a huge leap forwards and has served us well for the last five years
Fast-forward a little: in about 2015 we noticed a few strange anomalies in our Google Analytics data. For some reason, web addresses were appearing that didn’t exist anywhere on our
site! Most of these resulted from web visitors in Turkey, so we figured that some Turkish website had probably accidentally put our Google Analytics user ID number into their
code rather than their own. We filtered out the erroneous data – there wasn’t much of it; the other website was clearly significantly less-popular than ours – and carried on. Sometimes
we’d speculate about the identity of the other site, but mostly we didn’t even think about it.
Earlier this year, there was a spike in the volume of the traffic we were having to filter-out, so I took the time to investigate more-thoroughly. I determined that the offending
website belonged to the Library of Bilkent University, Turkey. I figured that some junior web developer there must have copy-pasted the
Bodleian’s Google Analytics code and forgotten to change the user ID, so I went to the website to take a look… but I was in for an even bigger surprise.
Whoah! The web design of a British university was completely ripped-off by a Turkish university! Mouth agape at the audacity, I clicked my way through several of their pages to try to
understand what had happened. It seemed inconceivable that it could be a coincidence, but perhaps it was supposed to be more of an homage than a copy-paste job? Or perhaps they
were ripped-off by an unscrupulous web designer? Or maybe it was somebody on the “inside”, like our vendor, acting unethically by re-selling the same custom design? I didn’t believe it
could be any of those things, but I had to be sure. So I started digging…
I was almost flattered as I played this spot-the-difference competition, until I saw the copyright notice: stealing our design was galling enough, but then relicensing it in such a way
that they specifically encourage others to steal it too was another step entirely. Remember that we’re talking about an academic library, here: if anybody ought to
have a handle on copyright law then it’s a library!
I took a dive into the source code to see if this really was, as it appeared to be, a copy-paste-and-change-the-name job (rather than “merely” a rip-off of the entire graphic design),
and, sure enough…
It looks like they’d just mirrored the site and done a search-and-replace for “Bodleian”, replacing it with “Bilkent”. Even the code’s spelling errors, comments, and indentation were
intact. The CSS was especially telling (as well as being chock-full of redundant code relating to things that appear on our website but not on theirs)…
So I reached out to them with a tweet:
I didn’t get any response, although I did attract a handful of Turkish followers on Twitter. Later, they changed their Twitter handle and I thought I’d take advantage of the then-new
capability for longer tweets to have another go at getting their attention:
Clearly this was what it took to make the difference. I received an email from the personal email account of somebody claiming to be Taner
Korkmaz, Systems Librarian with Bilkent’s Technical Services team. He wrote (emphasis mine):
Dear Mr. Dan Q,
My name is Taner Korkmaz and I am the systems librarian at Bilkent. I am writing on behalf of Bilkent University Library, regarding your share about Bilkent on
your Twitter account.
Firstly, I would like to explain that there is no any relation between your tweet and our library Twitter handle change. The librarian who is Twitter admin at Bilkent did not notice
your first tweet. Another librarian took this job and decided to change the twitter handle because of the Turkish letters, abbreviations, English name requirement etc. The first name
was @KutphaneBilkent (kutuphane means library in Turkish) which is not clear and not easy to understand. Now, it is @LibraryBilkent.
About 4 years ago, we decided to change our library website, (and therefore) we reviewed the appearance and utility of the web pages.
We appreciated the simplicity and clarity of the user interface of University of Oxford Bodlien Library & Radcliffe Camera, as an academic pioneer in many fields. As a not profit institution, we took advantage of your template by using CSS and HTML, and added our own original content.
We thought it would not create a problem the idea of using CSS codes since on the web page there isn’t any license notice or any restriction related to
the content of the template, and since the licenses on the web pages are mainly more about content rather than templates.
The Library has its own Google Analytics and Search Console accounts and the related integrations for the web site statistical data tracking. We would like to point out that there is
a misunderstanding regarding this issue.
In 2017, we started to work on creating a new web page and we will renew our current web page very soon.
Thank you in advance for your attention to this matter and apologies for possible inconveniences.
Or to put it another way: they decided that our copyright notice only applied to our content and not our design and took a copy of the latter.
Do you remember when I pointed out earlier that librarians should be expected to know their way around copyright law? Sigh.
They’ve now started removing evidence of their copy-pasting such as the duplicate Google Analytics code fragment and the references to LibraryData, but you can still find the unmodified
code via archive.org, if you like.
That probably ends my part in this little adventure, but I’ve passed everything on to the University of Oxford’s legal team in case any of them have anything to say about it. And now
I’ve got a new story to tell where web developers get together over a pint: the story of the time that I made a website for a university… and a different university stole it!
Respect des fonds: A principle in archival theory that proposes to group collections of archival records according to their
fonds — that is to say, according to the administration, organization, individual, or entity by which they were created or from which they were received. (Ed Summers)
Realia: Objects and material from everyday life. (Deb Chachra)
Last month I got the opportunity to attend the EEBO-TCP Hackfest,
hosted in the (then still very-much under construction) Weston Library at my workplace. I’ve
done a couple of hackathons and similar get-togethers before, but this one was somewhat different in that it was unmistakably geared towards a different kind of geek than the
technology-minded folks that I usually see at these things. People like me, with a computer science background, were remarkably in the minority.
Instead, this particular hack event attracted a great number of folks from the humanities end of the spectrum. Which is understandable, given its theme: the Early English Books Online
Text Creation Partnership (EEBO-TCP) is an effort to digitise and make available in marked-up, machine-readable text formats a huge corpus of English-language books printed between 1475
and 1700. So: a little over three centuries of work including both household names (like Shakespeare, Galileo, Chaucer, Newton, Locke, and Hobbes) and an enormous number of others that
you’ll never have heard of.
The hackday event was scheduled to coincide with and celebrate the release of the first 25,000 texts into the public domain, and attendees were challenged to come up with ways to use
the newly-available data in any way they liked. As is common with any kind of hackathon, many of the attendees had come with their own ideas half-baked already, but as for me: I had no
idea what I’d end up doing! I’m not particularly familiar with the books of the 15th through 17th centuries and I’d never looked at the way in which the digitised texts had been
encoded. In short: I knew nothing.
Instead, I’d thought: there’ll be people here who need a geek. A major part of a lot of the freelance work I end up doing (and a lesser part of my work at the Bodleian, from
time to time) involves manipulating and mining data from disparate sources, and it seemed to me that these kinds of skills would be useful for a variety of different conceivable
I paired up with a chap called Stephen Gregg, a lecturer in 18th century literature from Bath Spa University. His idea
was to use this newly-open data to explore the frequency (and the change in frequency over the centuries) of particular structural features in early printed fiction: features like
chapters, illustrations, dedications, notes to the reader, encomia, and so on). This proved to be a perfect task for us to pair-up on, because he had the domain knowledge to ask
meaningful questions, and I had the the technical knowledge to write software that could extract the answers from the data. We shared our table with another pair, who had
technically-similar goals – looking at the change in the use of features like lists and tables (spoiler: lists were going out of fashion, tables were coming in, during the 17th century)
in alchemical textbooks – and ultimately I was able to pass on the software tools I’d written to them to adapt for their purposes, too.
And here’s where I made a discovery: the folks I was working with (and presumably academics of the humanities in general) have no idea quite how powerful data mining tools could be in
giving them new opportunities for research and analysis. Within two hours we were getting real results from our queries and were making amendments and refinements in our questions and
trying again. Within a further two hours we’d exhausted our original questions and, while the others were writing-up their findings in an attractive way, I was beginning to look at how
the structural differences between fiction and non-fiction might be usable as a training data set for an artificial intelligence that could learn to differentiate between the two,
providing yet more value from the dataset. And all the while, my teammates – who’d been used to looking at a single book at a time – were amazed by the possibilities we’d uncovered for
training computers to do simple tasks while reading thousands at once.
Elsewhere at the hackathon, one group was trying to simulate the view of the shelves of booksellers around the old St. Paul’s Cathedral, another looked at the change in the popularity
of colour and fashion-related words over the period (especially challenging towards the beginning of the timeline, where spelling of colours was less-standardised than towards the end),
and a third came up with ways to make old playscripts accessible to modern performers.
At the end of the session we presented our findings – by which I mean, Stephen explained what they meant – and talked about the technology and its potential future impact – by which I
mean, I said what we’d like to allow others to do with it, if they’re so-inclined. And I explained how I’d come to learn over the course of the day what the word encomium meant.
My personal favourite contribution from the event was by Sarah Cole, who adapted the text of a story about a witch
trial into a piece of interactive fiction, powered by Twine/Twee, and then
allowed us as an audience to collectively “play” her game. I love the idea of making old artefacts more-accessible to modern audiences through new media, and this was a fun and
innovative way to achieve this. You can even play her game
But while that was clearly my favourite, the judges were far more impressed by the work of my teammate and I, as well as the team who’d adapted my software and used it to investigate
different features of the corpus, and decided to divide the cash price between the four of us. Which was especially awesome, because I hadn’t even realised that there was a
prize to be had, and I made the most of it at the Drinking About Museums event
I attended later in the day.
If there’s a moral to take from all of this, it’s that you shouldn’t let your background limit your involvement in “hackathon”-like events. This event was geared towards literature,
history, linguistics, and the study of the book… but clearly there was value in me – a computer geek, first and foremost – being there. Similarly, a hack event I attended last year, while clearly tech-focussed, wouldn’t have
been as good as it was were it not for the diversity of the attendees, who included a good number of artists and entrepreneurs as well as the obligatory hackers.
But for me, I think the greatest lesson is that humanities researchers can benefit from thinking a little bit like computer scientists, once in a while. The code I wrote (which uses Ruby and Nokogiri) is freely available for use and adaptation, and while I’ve no idea whether or not it’ll ever be useful to anybody again, what
it represents is the research benefits of inter-disciplinary collaboration. It pleases me to see things like the “Library Carpentry” (software for research, with a
library slant) seeming to take off.
In particular, something I’ve been working on are the QR codes. This
experiment – very progressive for a sometimes old-fashioned establishment like the Bodleian – involves small two-dimensional barcodes being placed with the exhibits. The barcodes are
embedded with web addresses for each exhibit’s page on the exhibition website. Visitors who scan them – using a tablet computer, smartphone, or whatever – are directed to a web page
where they can learn more about the item in front of them and can there discuss it with other visitors or can “vote” on it: another exciting new feature in this exhibition is that we’re
trying quite hard to engage academics and the public in debate about the nature of “treasures”: what is a treasure?
We hope that the visual association between each artefact and its QR code will help to make it clear that the code is related to the item (and isn’t, for example, some kind of asset tag
for the display case or something). We’re going to be monitoring usage of the codes, so hopefully we’ll get some meaningful results that could be valuable for future exhibitions: or for
other libraries and museums.
Rolling Your Own
If you’re interested in making your own QR codes with artistic embellishment (and I’m sure a graphic designer could do a far better job than I did!), here’s my approach:
I used Google Infographics (part of Chart Tools) to produce my QR codes. It’s fast, free,
simple, and – crucially – allows control over the level of error correction used in the resulting code. Here’s a sample URL to generate the QR code above:
500×500 is the size of the QR code. I was ultimately producing 5cm codes because our experiments showed that this was about the right size
for our exhibition cabinets, the distance from which people would be scanning them, etc. For laziness, then, I produced codes 500 pixels square at a resolution of 100 pixels per
H specifies that we want to have an error-correction level of 30%, the maximum possible. In theory, at least, this allows us to do the maximum
amount of “damage” to our QR code, by manipulating it, and still have it work; you could try lower levels if you wanted, and possibly get less-complex-looking codes.
0 is the width of the border around the QR code. I didn’t want a border (as I was going to manipulate the code in Photoshop anyway), so I use
a width of 0.
The URL – HTTP://TREASURES.BODLEIAN.OX.AC.UK/T7 – is presented entirely in capitals. This is because capital letters use fewer bits
when encoded as QR codes. “http” and domain names are case-insensitive anyway, and we selected our QR code path names to be in capitals. We also shortened the URL as far as possible:
owing to some complicated technical and political limitations, we weren’t able to lean on URL-shortening services like bit.ly, so we had to roll our own. In
hindsight, it’d have been nice to have set up the subdomain “t.bodleian.ox.ac.uk”, but this wasn’t possible within the time available. Remember: the shorter the web address, the simpler
the code, and simpler codes are easier and faster to read.
Our short URLs redirect to the actual web pages of each exhibit, along with an identifying token that gets picked up by Google Analytics to track how widely the QR codes are being used (and which ones are most-popular amongst visitors).
Load that code up in Photoshop, along with the image you’d like to superimpose into it. Many of the images I’ve had to work with are disturbingly “square”, so I’ve simply taken
them, given them a white or black border (depending on whether they’re dark or light-coloured). With others, though, I’ve been able to cut around some of the more-attractive parts of
the image in order to produce something with a nicer shape to it. In any case, put your image in as a layer on top of your QR code.
Move the image around until you have something that’s aesthetically-appealing. With most of my square images, I’ve just plonked them in the middle and resized them to cover a whole
number of “squares” of the QR code. With the unusually-shaped ones, I’ve positioned them such that they fit in with the pattern of the QR code, somewhat, then I’ve inserted another
layer in-between the two and used it to “white out” the QR codes squares that intersect with my image, giving a jagged, “cut out” feel.
Test! Scan the QR code from your screen, and again later from paper, to make sure that it’s intact and functional. If it’s not, adjust your overlay so that it covers less of the QR
code. Test in a variety of devices. In theory, it should be possible to calculate how much damage you can cause to a QR code before it stops working (and where it’s safe to cause the
damage), but in practice it’s faster to use trial-and-error. After a while, you get a knack for it, and you almost feel as though you can see where you need to put the images so that
they just-barely don’t break the codes. Good luck!
Give it a go! Make some QR codes that represent your content (web addresses, text, vCards, or whatever) and embed your own images into them to make them stand out with a style of their