Happy Earth Day, /r/MegaEarth!
This self-post was originally posted to /r/megaearth. See more things from Dan's Reddit account.
Last month I got the opportunity to attend the EEBO-TCP Hackfest, hosted in the (then still very much under construction) Weston Library at my workplace. I’ve done a couple of hackathons and similar get-togethers before, but this one was somewhat different in that it was unmistakably geared towards a different kind of geek than the technology-minded folks that I usually see at these things. People like me, with a computer science background, were remarkably in the minority.
Instead, this particular hack event attracted a great number of folks from the humanities end of the spectrum. Which is understandable, given its theme: the Early English Books Online Text Creation Partnership (EEBO-TCP) is an effort to digitise and make available in marked-up, machine-readable text formats a huge corpus of English-language books printed between 1475 and 1700. So: a little over three centuries of work including both household names (like Shakespeare, Galileo, Chaucer, Newton, Locke, and Hobbes) and an enormous number of others that you’ll never have heard of.
The hackday event was scheduled to coincide with and celebrate the release of the first 25,000 texts into the public domain, and attendees were challenged to come up with ways to use the newly-available data in any way they liked. As is common with any kind of hackathon, many of the attendees had come with their own ideas half-baked already, but as for me: I had no idea what I’d end up doing! I’m not particularly familiar with the books of the 15th through 17th centuries and I’d never looked at the way in which the digitised texts had been encoded. In short: I knew nothing.
Instead, I’d thought: there’ll be people here who need a geek. A major part of a lot of the freelance work I end up doing (and a lesser part of my work at the Bodleian, from time to time) involves manipulating and mining data from disparate sources, and it seemed to me that these kinds of skills would be useful for a variety of different conceivable projects.
I paired up with a chap called Stephen Gregg, a lecturer in 18th-century literature from Bath Spa University. His idea was to use this newly-open data to explore the frequency (and the change in frequency over the centuries) of particular structural features in early printed fiction: features like chapters, illustrations, dedications, notes to the reader, encomia, and so on. This proved to be a perfect task for us to pair up on, because he had the domain knowledge to ask meaningful questions, and I had the technical knowledge to write software that could extract the answers from the data. We shared our table with another pair, who had technically-similar goals – looking at the change in the use of features like lists and tables (spoiler: lists were going out of fashion, tables were coming in, during the 17th century) in alchemical textbooks – and ultimately I was able to pass on the software tools I’d written to them to adapt for their purposes, too.
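The original tooling was written in Ruby with Nokogiri and isn’t reproduced here. As a rough illustration of the kind of query involved, here’s a minimal Python sketch using only the standard library’s `xml.etree.ElementTree`. It assumes TEI-style XML in which structural features appear as `div` elements with a `type` attribute (the values in `DIV_FEATURES` are guesses, not a checked list of the EEBO-TCP vocabulary) and illustrations appear as `figure` elements. `publication_year` is a hypothetical, deliberately naive helper that takes the first four-digit number it finds in a `date` element, which in real files may not be the original imprint date. For each decade, the sketch reports how many books were seen and how many of them contain each feature.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Assumed TEI div "type" values to look for; the real EEBO-TCP vocabulary may differ.
DIV_FEATURES = {"chapter", "dedication", "to_the_reader", "encomium"}


def local_name(tag):
    """Drop any namespace: '{http://www.tei-c.org/ns/1.0}div' -> 'div'."""
    return tag.rsplit("}", 1)[-1]


def count_features(root):
    """Count occurrences of each structural feature in one parsed document."""
    counts = Counter()
    for el in root.iter():
        name = local_name(el.tag)
        if name == "div" and el.get("type") in DIV_FEATURES:
            counts[el.get("type")] += 1
        elif name == "figure":
            counts["illustration"] += 1
    return counts


def publication_year(root):
    """Hypothetical, naive helper: first four-digit number in the first <date> that has one."""
    for el in root.iter():
        if local_name(el.tag) == "date" and el.text:
            match = re.search(r"\d{4}", el.text)
            if match:
                return int(match.group())
    return None


def tally_by_decade(xml_documents):
    """Return {decade: (books seen, Counter of books containing each feature)}."""
    books = Counter()
    books_with = defaultdict(Counter)
    for xml_text in xml_documents:
        root = ET.fromstring(xml_text)
        year = publication_year(root)
        if year is None:
            continue  # skip undatable texts rather than guess
        decade = year // 10 * 10
        books[decade] += 1
        for feature in count_features(root):  # each feature counted once per book
            books_with[decade][feature] += 1
    return {decade: (books[decade], books_with[decade]) for decade in sorted(books)}


# A tiny made-up document in TEI-like markup, for demonstration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><publicationStmt><date>1660.</date></publicationStmt></fileDesc></teiHeader>
  <text>
    <front><div type="dedication"><p>To my patron</p></div></front>
    <body>
      <div type="chapter"><p>First chapter</p><figure/></div>
      <div type="chapter"><p>Second chapter</p></div>
    </body>
  </text>
</TEI>"""

if __name__ == "__main__":
    for decade, (n_books, features) in tally_by_decade([SAMPLE]).items():
        print(decade, n_books, dict(features))
```

Dividing each feature’s count by the number of books in its decade gives the proportion of books using that feature, which is the kind of change-over-time figure a question like Stephen’s is after.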
And here’s where I made a discovery: the folks I was working with (and presumably humanities academics in general) have no idea quite how powerful data mining tools could be in giving them new opportunities for research and analysis. Within two hours we were getting real results from our queries, refining our questions, and trying again. Within a further two hours we’d exhausted our original questions and, while the others were writing up their findings in an attractive way, I was beginning to look at how the structural differences between fiction and non-fiction might be usable as a training data set for an artificial intelligence that could learn to differentiate between the two, providing yet more value from the dataset. And all the while, my teammates – who’d been used to looking at a single book at a time – were amazed by the possibilities we’d uncovered for training computers to do simple tasks while reading thousands at once.
Elsewhere at the hackathon, one group was trying to simulate the view of the shelves of booksellers around the old St. Paul’s Cathedral, another looked at the change in the popularity of colour and fashion-related words over the period (especially challenging towards the beginning of the timeline, where the spelling of colours was less standardised than towards the end), and a third came up with ways to make old playscripts accessible to modern performers.
At the end of the session we presented our findings – by which I mean, Stephen explained what they meant – and talked about the technology and its potential future impact – by which I mean, I said what we’d like to allow others to do with it, if they’re so inclined. And I explained how I’d come to learn over the course of the day what the word encomium meant.
My personal favourite contribution from the event was by Sarah Cole, who adapted the text of a story about a witch trial into a piece of interactive fiction, powered by Twine/Twee, and then allowed us as an audience to collectively “play” her game. I love the idea of making old artefacts more accessible to modern audiences through new media, and this was a fun and innovative way to achieve this. You can even play her game online!
(By the way, for those of you who enjoy my IF recommendations: have a look at Detritus; it’s a delightful little experimental/experiential game.)
But while that was clearly my favourite, the judges were far more impressed by the work of my teammate and me, as well as the team who’d adapted my software and used it to investigate different features of the corpus, and decided to divide the cash prize between the four of us. Which was especially awesome, because I hadn’t even realised that there was a prize to be had, and I made the most of it at the Drinking About Museums event I attended later in the day.
If there’s a moral to take from all of this, it’s that you shouldn’t let your background limit your involvement in “hackathon”-like events. This event was geared towards literature, history, linguistics, and the study of the book… but clearly there was value in me – a computer geek, first and foremost – being there. Similarly, a hack event I attended last year, while clearly tech-focussed, wouldn’t have been as good as it was were it not for the diversity of the attendees, who included a good number of artists and entrepreneurs as well as the obligatory hackers.
But for me, I think the greatest lesson is that humanities researchers can benefit from thinking a little bit like computer scientists, once in a while. The code I wrote (which uses Ruby and Nokogiri) is freely available for use and adaptation, and while I’ve no idea whether or not it’ll ever be useful to anybody again, what it represents is the research benefit of interdisciplinary collaboration. It pleases me to see initiatives like Library Carpentry (software for research, with a library slant) seeming to take off.
And yeah, I love a good hackathon.
Update 2015-04-22 11:59: with thanks to Sarah for pointing me in the right direction, you can play the witch trial game in your browser.
This link was originally posted to /r/todayilearned. See more things from Dan's Reddit account.
The original link was: http://www.nationalmotormuseum.org.uk/motoring_firsts
Walter Arnold of East Peckham, Kent, had the dubious honour of being the first person in Great Britain to be successfully charged with speeding on 28 January 1896. Travelling at approximately 8mph/12.87kph, he had exceeded the 2mph/3.22kph speed limit for towns. Fined one shilling and costs, Arnold had been caught by a policeman who had given chase on a bicycle.
This link was originally posted to /r/oxford. See more things from Dan's Reddit account.
The original link was: http://digitaloxford.com/event/rabbit-hole-alice-wonderland-themed-digital-hack-event-hosted-story-museum/
This link was originally posted to /r/todayilearned. See more things from Dan's Reddit account.
The original link was: https://web.archive.org/web/20120420030827/http://www.articleblast.com/School_and_Education/General/The_History_of_the_Tea_Cosy/
This link was originally posted to /r/StarWars. See more things from Dan's Reddit account.
The original link was: http://www.theguardian.com/careers/careers-blog/2015/apr/16/was-yoda-medieval-monk-museum-curator
This link was originally posted to /r/yoda. See more things from Dan's Reddit account.
The original link was: http://i.guim.co.uk/static/w-620/h--/q-95/sys-images/Guardian/Pix/pictures/2015/4/15/1429106856863/5cd4ecfc-1bad-4bbf-89a9-7137a3bb7d97-bestSizeAvailable.jpeg
This checkin to GC54KVD Oxford Medical History #2: Grey matter reflects a geocaching.com log entry. See more of Dan's cache logs.
Cache is missing! I helped the CO hide this cache, and, following the recent spate of DNFs, decided to come check on it… It’s definitely been muggled and needs replacement!
This checkin to GLH77F8M H.P. A reflects a geocaching.com log entry. See more of Dan's cache logs.
Completing today’s tour of Holland Park and the end of my final day’s lunch break in Shepherd’s Bush, I got the opportunity to find this one last cache. The container’s smell gives away its former life! TFTC!
This checkin to GLH77BJ5 H.P. C reflects a geocaching.com log entry. See more of Dan's cache logs.
Found between waypoints while collecting clues for GCVPR3. Nice location and a good container. Had to choose my moment to grab and sign while the muggles were distracted by a peacock jumping the fence!
This checkin to GLH77EPM ????, ????, What have we here? reflects a geocaching.com log entry. See more of Dan's cache logs.
A wonderful tour of Holland Park finishing with a fabulous cache (and a well-earned Favourite Point). So nice to see a well-designed cache in a beautiful urban park. I’m only in London for a few days, staying in Shepherd’s Bush for a work-related training course, but I’ve been trying to get out on my lunch break to make the most of the spring weather. This is likely to be my penultimate cache of the trip, but the one I’ve enjoyed the most. Thanks!
This is a repost promoting content originally published elsewhere. See more things Dan's reposted.
This checkin to GLH778VY Queen at the Bay reflects a geocaching.com log entry. See more of Dan's cache logs.
GPSr was about 10m off, which it turns out was all it took to start my search in entirely the wrong place. After a fruitless search, I finally thought to try the other obvious place, and lo and behold that’s where it was! Felt very exposed, surrounded by muggle tourists as I returned it! TFTC.
This checkin to GLH6FVRR H.P. C reflects a geocaching.com log entry. See more of Dan's cache logs.
After my GPSr started playing ball, a nice easy find – third of my lunch break! TFTC!
This checkin to GLH6J1M9 Notting Hill Gate. reflects a geocaching.com log entry. See more of Dan's cache logs.
Nice container! An easy find, but there’s some girl who’s looking at me funny as I reassemble the cache! TFTC! FP awarded.