Together with a friend I recently built Dropshare Cloud. We offer online storage for the file and screenshot sharing app Dropshare for macOS/iOS. After getting started with Django (we both had some experience using
Django), I decided to rewrite the codebase in Rails. My past experience developing in Rails made the process quick, and boring…
Those who know me well know that I’m a bit of a data nerd. Even when I don’t yet know what I’m going to do with some data, it feels sensible to start collecting it in a
nice machine-readable format from the word go. Because you never know, right? That’s how I’m able to tell you how much gas and electricity our house used on average on any day in the
last two and a half years (and how much of that was offset by our solar panels).
The red lumps are winters, when the central heating comes on and starts burning a stack of gas.
So it should perhaps come as no huge surprise that for the last six months I’ve been recording the identity of every piece of music played by my favourite local radio station,
Jack FM (don’t worry: I didn’t do this by hand – I wrote a
program to do it). At the time, I wasn’t sure whether there was any point to the exercise… in fact, I’m still not sure. But hey: I’ve got a log of the last 45,000 songs
that the radio station played: I might as well do something with it. The Discogs API proved invaluable in automating the discovery of
metadata relating to each song, such as the year of its release (I wasn’t going to do that by hand either!), and that gave me enough data to, for example, do this:
Decade frequency by hour: you’ve got a good chance of 80s music at any time, but lunchtime’s your best bet (or perhaps just after midnight). Note that times are in UTC+2 in this
graph.
I almost expected a bigger variance by hour-of-day, but I guess that Jack isn’t in the habit of pandering to its demographics too heavily. I spotted the post-midnight point at which you
get almost a plurality of music from 1990 or later, though: perhaps that’s when the young ‘uns who can still stay up that late are mostly listening to the radio? What about by
day-of-week, then:
Even less in it by day of week… although 70s music fans should consider tuning in on Fridays, apparently, and 80s fans will be happiest on Sundays.
The chunks of “bonus 80s” shouldn’t be surprising, I suppose, given that the radio station advertises that that’s
exactly what it does at those times. But still: it’s reassuring to know that when a radio station claims to play 80s music, you don’t just have to take their word for it
(so long as their listeners include somebody as geeky as me).
It feels to me like every time I tune in they’re playing an INXS song. That can’t be a coincidence, right? Let’s find out:
One in every ten songs is by just ten artists (including INXS). One in every four is by just 34 artists.
Yup, there’s a heavy bias towards Guns ‘n’ Roses, Michael Jackson, Prince, Oasis, Bryan Adams, Madonna, INXS, Bon Jovi, Queen, and U2 (who collectively are responsible for over a tenth
of all music played on Jack FM), and – to a lesser extent – towards Robert Palmer, Meatloaf, Blondie, Green Day, Texas, Whitesnake, the Pet Shop Boys, Billy Idol, Madness, Rainbow,
Elton John, Bruce Springsteen, Aerosmith, Fleetwood Mac, Phil Collins, ZZ Top, AC/DC, Duran Duran, the Police, Simple Minds, Blur, David Bowie, Def Leppard, and REM: taken together, one
in every four songs played on Jack FM is by one of these 34 artists.
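The "one in ten" and "one in four" figures are a cumulative-share calculation. You rank artists by number of plays, then count how many of them it takes to reach a given fraction of all plays. The sketch below is a minimal version of that. It assumes the log has already been reduced to an array of artist names, one entry per song played. `artists_covering` is a hypothetical helper, not the program that produced the numbers above.

```ruby
# Smallest number of most-played artists whose plays together make up at least
# `share` (0.0 to 1.0) of all plays. `plays` holds one artist name per song played.
def artists_covering(plays, share)
  counts = plays.each_with_object(Hash.new(0)) { |artist, tally| tally[artist] += 1 }
  target = share * plays.length
  covered = 0
  # Walk artists from most- to least-played, accumulating their plays.
  counts.values.sort.reverse.each_with_index do |n, i|
    covered += n
    return i + 1 if covered >= target
  end
  counts.size
end
```

For example, with ten plays split 5/3/2 between three artists, `artists_covering(plays, 0.5)` returns 1, because the top artist alone accounts for half of all plays.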
Amazingly, the most-played song on Jack FM (Alice Cooper’s “Poison”) is not by one of the most-played 34 artists.
I was interested to see that the “top 20 songs” played on Jack FM these last six months include several songs by artists who otherwise aren’t represented at all on the station. The
most-played song is Alice Cooper’s Poison, but I’ve never recorded them playing any other Alice Cooper songs (boo!). The fifth-most-played song is Fight For Your
Right, by the Beastie Boys, but that’s the only Beastie Boys song I’ve caught them playing. And the seventh-most-played – Roachford’s Cuddly Toy – is similarly the only
Roachford song they ever put on.
Next I tried a Markov chain analysis. A Markov chain models a sequence in which each item depends only on the one before it. Analysing one means examining a sequence (in this case, a sequence
of songs) and building a map of “chains” of sequential songs, recording how often each song follows another – here’s a great
explanation and playground. The same technique is used by “predictive text” features on your smartphone: it knows what word to suggest you type next based on the patterns of words
you most-often type in sequence. And running some Markov chain analysis helped me find some really… interesting patterns in the playlists. For example, look at the similarities between
what was played early in the afternoon of Wednesday 19 October and what was played 12 hours later, early in the morning of Thursday 20 October:
| Track | 19 October 2016 | 20 October 2016 |
|---|---|---|
| Kool & The Gang – Fresh | 12:06:33 | 00:13:56 |
| Bruce Springsteen – Dancing In The Dark | 12:10:35 | 00:17:57 |
| Maxi Priest – Close To You | 12:14:36 | 00:21:59 |
| Van Halen – Why Can’t This Be Love | 12:22:38 | 00:25:00 |
| Beats International / Lindy – Dub Be Good To Me | 12:25:39 | 00:29:01 |
| Kasabian – Fire | 12:29:40 | 00:33:02 |
| Talk Talk – It’s My Life | 12:33:42 | 00:38:04 |
| Lenny Kravitz – Are You Gonna Go My Way | 12:41:44 | 00:42:05 |
| Shalamar – I Can Make You Feel Good | 12:45:45 | 00:45:06 |
| 4 Non Blondes – What’s Up | 12:49:47 | 00:50:07 |
| Madness – Baggy Trousers | 12:55:49 | 00:54:09 |
| Eagle Eye Cherry – Save Tonight | | 00:56:09 |
| Feeling – Love It When You Call | | 01:04:12 |
| Fine Young Cannibals – Good Thing | 13:02:51 | 01:10:14 |
| Blur – There’s No Other Way | 13:06:54 | 01:14:15 |
| Pet Shop Boys – It’s A Sin | 13:09:55 | 01:17:16 |
| Zutons – Valerie | 13:14:56 | 01:22:18 |
| Cure – The Love Cats | 13:22:59 | 01:26:19 |
| Bryan Adams / Mel C – When You’re Gone | 13:27:01 | 01:30:20 |
| Depeche Mode – Personal Jesus | 13:30:02 | 01:33:21 |
| Queen – Another One Bites The Dust | 13:34:03 | 01:38:22 |
| Shania Twain – That Don’t Impress Me Much | 13:42:06 | 01:42:23 |
| ZZ Top – Gimme All Your Lovin’ | 13:45:07 | 01:46:25 |
| Abba – Mamma Mia | 13:49:09 | 01:50:26 |
| Survivor – Eye Of The Tiger | 13:53:10 | 01:53:27 |
| Scouting For Girls – Elvis Aint Dead | | 01:57:28 |
| Verve – Lucky Man | | 02:00:29 |
| Fleetwood Mac – Say You Love Me | | 02:05:30 |
| Kiss – Crazy Crazy Nights | 14:03:13 | 02:10:31 |
| Lightning Seeds – Sense | 14:07:15 | 02:14:33 |
| Pretenders – Brass In Pocket | 14:11:16 | 02:18:34 |
| Elvis Presley / JXL – A Little Less Conversation | 14:14:17 | 02:21:35 |
| U2 – Angel Of Harlem | 14:22:19 | 02:24:36 |
| Trammps – Disco Inferno | 14:25:20 | 02:28:37 |
| Cast – Guiding Star | 14:29:22 | 02:31:38 |
| New Order – Blue Monday | 14:33:23 | 02:36:39 |
| Def Leppard – Let’s Get Rocked | 14:41:26 | 02:40:41 |
| Phil Collins – Sussudio | 14:46:28 | 02:45:42 |
| Shawn Mullins – Lullaby | 14:50:30 | 02:49:43 |
| Stars On 45 – Stars On 45 | 14:55:31 | 02:53:45 |
| Dead Or Alive – You Spin Me Round Like A Record | 16:06:35 | 03:00:47 |
| Dire Straits – Walk Of Life | 16:09:36 | 03:03:48 |
| Keane – Everybody’s Changing | 16:13:37 | 03:07:49 |
| Billy Idol – Rebel Yell | 16:17:39 | 03:10:50 |
| Stealers Wheel – Stuck In The Middle | 16:25:41 | 03:14:51 |
| Green Day – American Idiot | 16:28:42 | 03:18:52 |
| A-Ha – Take On Me | 16:33:44 | 03:21:53 |
| Cranberries – Dreams | 16:36:45 | 03:26:54 |
| Elton John – Philadelphia Freedom | | 03:30:56 |
| Inxs – Disappear | | 03:36:57 |
| Kim Wilde – You Keep Me Hanging On | | 03:40:59 |
| Living In A Box – Living In A Box | 16:44:47 | |
| Status Quo – Rockin’ All Over The World | 16:47:48 | 03:45:00 |
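Repeats like the ones in the two playlists above can be surfaced roughly as follows. First, build a first-order Markov transition table from one day's log: how often each song was followed by each other song. Dividing a row by its total would turn those counts into probabilities. Then walk another day's log looking for the longest run whose every step already appears in that table.

This is a sketch under stated assumptions, not the original analysis code. It assumes each play is an "Artist – Title" string and that logs are in play order; the method names are hypothetical.

```ruby
# First-order Markov transition table: counts[a][b] is the number of times song
# b was played immediately after song a.
def transition_counts(plays)
  counts = Hash.new { |hash, song| hash[song] = Hash.new(0) }
  plays.each_cons(2) { |from, to| counts[from][to] += 1 }
  counts
end

# Longest run of consecutive songs in `plays` whose every step (song -> next
# song) also occurs as a transition somewhere in `reference`.
def longest_shared_chain(plays, reference)
  seen = transition_counts(reference)
  best = []
  run = []
  plays.each_cons(2) do |from, to|
    # key? avoids the default block adding empty rows while we only look things up
    if seen.key?(from) && seen[from][to] > 0
      run = [from] if run.empty?
      run << to
      best = run.dup if run.length > best.length
    else
      run = []
    end
  end
  best
end
```

A run found this way only shows that each adjacent pair recurred. Longer runs are what make "coincidence" an unconvincing explanation.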
The similarities between those playlists (which include a 20-songs-in-a-row streak!) surely can’t be coincidence… but they do go some way to explaining why listening to Jack FM
sometimes gives me a feeling of déjà vu (along with, perhaps, the no-talk, all-jukebox format). Looking
elsewhere in the data I found dozens of other similar occurrences, though none that were both such long chains and in such close proximity to one another. What does it mean?
There are several possible explanations, including:
- The exotic, e.g. they’re using Markov chains to control an auto-DJ, and so just sometimes it randomly chooses to follow a long chain that it “learned” from a real DJ.
- The silly, e.g. Jack FM somehow knew that I was monitoring them in this way and are trying to troll me.
- My favourite: these two are actually the same playlist, but with breaks interspersed differently. During the daytime, the breaks in the list are more-frequent and longer, which suggests: ad breaks! Advertisers are far more-likely to pay for spots during the mid-afternoon than they are in the middle of the night (the gap in the overnight playlist could well be a short ad or a jingle), which would explain why the two are different from one another!
But the question remains: why reuse playlists in close proximity at all? Even when the station operates autonomously, as it clearly does most of the time, it’d surely be easy enough to
set up an auto-DJ using “smart random” (because truly
random shuffles don’t sound random to humans) to get the same or a better effect.
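The text doesn't say what a "smart random" auto-DJ would actually do. One common interpretation, sketched here purely as an assumption, is to shuffle while refusing to repeat an artist within the last few picks. `Track`, `smart_shuffle`, and the `gap` parameter are all hypothetical names.

```ruby
Track = Struct.new(:artist, :title)

# "Smart random" sketch: draw tracks at random, but skip any artist heard in the
# last `gap` picks. If every remaining track breaks that rule, fall back to a
# plain random pick so the shuffle always completes.
def smart_shuffle(tracks, gap: 3, rng: Random.new)
  remaining = tracks.dup
  playlist = []
  until remaining.empty?
    recent = playlist.last(gap).map(&:artist)
    candidates = remaining.reject { |track| recent.include?(track.artist) }
    candidates = remaining if candidates.empty?
    pick = candidates.sample(random: rng)
    remaining.delete_at(remaining.index(pick))
    playlist << pick
  end
  playlist
end
```

Passing an explicit `rng` makes runs reproducible, which is handy for comparing playlists the way the table above does.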
One of the things I love about Jack FM is how little they take seriously. Like their style guide.
Which leads to another interesting observation: Jack FM’s sister stations in Surrey and Hampshire also maintain a similar playlist most of the time… which means that they’re either
synchronising their ad breaks (including their duration – I suspect this is the case) or else using filler jingles to line up content with the beginnings and ends of songs. It’s a
clever operation, but it’s not beyond black-box comprehension. More research is clearly needed. (And yes, I’m sure I could just call up and ask – they call me “Newcastle Dan”
on the breakfast show – but that wouldn’t be even half as fun as the data mining is…)
I’ve spent the last couple of weeks digging into some of the newer/fancier/shinier technologies that have been in the limelight of the development world lately – specifically Elixir,
Phoenix and Elm – and while I’ve thoroughly enjoyed them all (and instantly had a bunch of fun ideas for things to build with them), I also realized once more how much I like Ruby,
and what kind of project it’s still a great choice for…
It’s common for a Ruby developer to describe themselves as a Rails developer. It’s also common for someone’s entire Ruby experience to be through Rails. David Heinemeier Hansson began
Rails in 2003. It quickly became the most popular Ruby web framework and served as an introduction to Ruby for many developers.
Rails is great. I use it every day for my job, it was my introduction to Ruby, and it’s the most fun I’ve ever had as a developer. That being said, there are a number of great
framework options out there that aren’t Rails.
This article highlights Cuba, Sinatra, Padrino, and Lotus, and how each of them differs from Rails. Let’s have a look at some non-Rails frameworks
in Ruby.
What’s the hardest word to guess, when playing hangman? I’ll come back to that.
Whatever could the missing letter be?
Last year, Nick Berry wrote a fantastic blog post about the optimal strategy for Hangman. He showed that the best guesses
to make to get your first “hit” in a game of hangman are not the most-commonly occurring letters in written English, because these aren’t the most commonly-occurring
letters in individual words. He also showed that the first guesses should be adjusted based on the length of the word (the most common letter in 5-letter words is ‘S’, but the most
common letter in 6-letter words is ‘E’). In short: hangman’s a more-complex game than you probably thought it was! I’d like to take his work a step further, and work out which word is
the hardest: that is, assuming you’re playing an optimal strategy, which word takes the most guesses?
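The distinction above is between letters that are common in running text and letters that are present in many distinct words. The second is easy to compute: for each word length, count how many dictionary words contain each letter at least once.

Here is a minimal sketch, assuming the dictionary is an array of words. The method names are hypothetical, and this is not Nick Berry's code.

```ruby
# For each word length, count how many words contain each letter at least once.
# Repeats inside a word count once: "eel" adds 1 to "e", not 2.
def letter_presence_by_length(words)
  table = Hash.new { |hash, length| hash[length] = Hash.new(0) }
  words.each do |word|
    lower = word.downcase
    lower.scan(/[a-z]/).uniq.each { |letter| table[lower.length][letter] += 1 }
  end
  table
end

# Best opening guess for a given length: the letter present in the most words,
# with ties broken alphabetically. Returns nil if no words have that length.
def best_first_guess(words, length)
  counts = letter_presence_by_length(words)[length]
  best = counts.max_by { |letter, count| [count, -letter.ord] }
  best && best.first
end
```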
The rules of hangman used to be a lot more brutal. Nowadays, very few people die as a result of the game.
First, though, we need to understand how hangman is perfectly played. Based on the assumption that the “executioner” player is choosing words randomly, and that no clue is given as to
the nature of the word, we can determine the best possible move for all possible states of the game by using a data structure known as a tree. Suppose our opponent has chosen a
three-letter word, and has drawn three dashes to indicate this. We know from Nick’s article that the best letter to guess is A. And then, if our guess is wrong, the next
best letter to guess is E. But what if our first guess is right? Well, then we’ve got an “A” in one or more positions on the board, and we need to work out the next best
move: it’s unlikely to be “E” – very few three-letter words have both an “A” and an “E” – and of course what letter we should guess next depends entirely on what positions
the letters are in.
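The tree described above can be sketched as a function that, for one game state, splits the remaining candidate words into branches by the pattern a guess would reveal. For example, guessing "a" against three-letter words yields a "_a_" branch, a "___" branch, and so on. This is a minimal illustration. It assumes words are lowercase strings and that guesses so far are tracked as an array of letters; `pattern` and `branches` are hypothetical names.

```ruby
# The board a word would show after the letters in `guessed`, e.g. "_a_".
def pattern(word, guessed)
  word.chars.map { |ch| guessed.include?(ch) ? ch : "_" }.join
end

# One level of the game tree: guessing `letter` splits the candidates into
# branches keyed by the board the executioner would then reveal.
def branches(candidates, guessed, letter)
  candidates.group_by { |word| pattern(word, guessed + [letter]) }
end
```

Applying `branches` again to each branch, with the next guess, walks one level deeper into the tree.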
There are billions of possible states of game play, but you can narrow them down quickly with strategic guessing.
What we’re actually doing here is a filtering exercise: of all of the possible letters we could choose, we’re considering what possible results that could have. Then for
each of those results, we’re considering what guesses we could make next, and so on. At each stage, we compare all of the possible moves to a dictionary of all possible
words, and filter out all of the words it can’t be: after our first guess in the diagram above, if we guess “A” and the board now shows “_ A _”, then we know that of the
600+ three-letter words in the English language, we’re dealing with one of only about 134. We further refine our guess by playing the odds: of those words, more of them have a “C” in
than any other letter, so that’s our second guess. If it has a C in, that limits the options further, and we can plan the next guess accordingly. If it doesn’t have a C
in, that still provides us with valuable information: we’re now looking for a three-letter word with an A in the second position and no letter C: that cuts it
down to 124 words (and our next guess should be ‘T’). This tree-based mechanism for working out the best moves is comparable to that used by other game-playing computers. Hangman is
simple enough that it can be “solved” by contemporary computers (like draughts –
solved in 2007 – but unlike chess: while modern chess-playing
computers can beat humans, it’s still theoretically possible to build future computers that will beat today’s computers).
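The filtering step described above ("an A in the second position and no letter C") and the "playing the odds" choice of next letter might be sketched like this. The board is assumed to be a string with "_" for hidden letters. `consistent?` and `next_guess` are illustrative names, and breaking ties alphabetically is an arbitrary choice.

```ruby
# Could `word` be the answer, given the board and the letters already guessed wrong?
# Revealed positions must match; hidden positions must not hold a revealed letter
# (it would have been shown) or a wrongly-guessed letter.
def consistent?(word, board, wrong)
  return false unless word.length == board.length
  revealed = board.delete("_").chars
  word.chars.zip(board.chars).all? do |w, b|
    b == "_" ? !revealed.include?(w) && !wrong.include?(w) : w == b
  end
end

# Greedy "play the odds" pick: the unguessed letter found in the most candidates.
def next_guess(candidates, guessed)
  counts = Hash.new(0)
  candidates.each { |word| (word.chars.uniq - guessed).each { |ch| counts[ch] += 1 } }
  best = counts.max_by { |ch, n| [n, -ch.ord] }
  best && best.first
end
```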
Zen Hangman asks the really important questions. If a man has one guess left and refuses to pick a letter, does he live forever, or not at all?
Now that we can simulate the way that a perfect player would play against a truly-random executioner, we can use this to simulate games of hangman for every possible word
(I’m using version 0.7 of this British-English dictionary).
In other words, we set up two computer players: the first chooses a word from the dictionary, the second plays “perfectly” to try to guess the word, and we record how many guesses it
took. So that’s what I did. Here’s the Ruby code I used. It’s heavily-commented and
probably pretty understandable/good learning material, if you’re into that kind of thing. Or if you fancy optimising it, there’s plenty of scope for that too (I knocked it out on a
lunch break; don’t expect too much!). Or you could use it as the basis to make a playable hangman game. Go wild.
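Here is an independent, much shorter sketch of the same two-player experiment; it is not the code linked above. It uses the greedy "play the odds" rule described earlier. It assumes the secret word is in the dictionary and counts every guess, right or wrong. `guesses_needed` is a hypothetical name.

```ruby
# Simulate one game: the executioner holds `secret`; the guesser always picks the
# unguessed letter found in the most remaining candidates, then keeps only the
# words that would have produced the board it now sees.
# Returns the total number of guesses.
def guesses_needed(secret, dictionary)
  raise ArgumentError, "secret must be in the dictionary" unless dictionary.include?(secret)

  reveal = ->(word, guessed) { word.chars.map { |ch| guessed.include?(ch) ? ch : "_" }.join }
  candidates = dictionary.select { |word| word.length == secret.length }
  guessed = []
  loop do
    counts = Hash.new(0)
    candidates.each { |word| (word.chars.uniq - guessed).each { |ch| counts[ch] += 1 } }
    guessed << counts.max_by { |ch, n| [n, -ch.ord] }.first
    board = reveal.call(secret, guessed)
    return guessed.length unless board.include?("_")

    candidates = candidates.select { |word| reveal.call(word, guessed) == board }
  end
end
```

The number of wrong guesses is the total minus the word's distinct letters. For example, `guesses_needed("dog", %w[cat bat dog])` is 4, of which one (the "a") is wrong.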
The hardest three-letter hangman words. “Sly” is particularly… well, sly.
Running the program, we can see that the hardest three-letter word is “xxv”, which would take 22 guesses (20 of them wrong!) to get. But other than as the Roman numeral for 25, I don’t
think that “xxv” is actually a word. Perhaps my dictionary’s not very good. “Oak”, though, is definitely a word, and at 20 guesses (17 wrong), it’s easily enough to hang your opponent
no matter how many strokes it takes to complete the gallows.
Interestingly, “oaks” is an easier word than “oak” (although it’s still very difficult): adding a letter to a word doesn’t necessarily make it harder, especially when that letter
is common (like the “s” here).
There are plenty more tough words in the four-letter set, like the devious “quiz”, “jazz”, “zinc”, and “faux”. Pick one of those and your opponent – unless they’ve seen this blog post! – is
incredibly unlikely to guess it before they’re swinging from a rope.
“Hazing foxes, fucking cockily” is not only the title of a highly-inappropriate animated film, but also a series of very challenging Hangman words.
As we get into the 5, 6, and 7-letter words you’ll begin to notice a pattern: that the hardest words with any given number of letters get easier the longer
they are. That’s kind of what you’d expect, I suppose: if there were a hypothetical word that contained every letter in the alphabet, then nobody would ever fail to (eventually) get it.
Some of the longer words are wonderful, like: dysprosium, semivowel, harrumph, and googolplex.
When we make a graph of each word length, showing which proportion of the words require a given number of “wrong” guesses (by an optimised player), we discover a “sweet spot” window in
which we’ll find all of the words that an optimised player will always fail to guess (assuming that we permit up to 10 incorrect guesses before they’re disqualified). The
window seems small for the number of times I remember seeing people actually lose at hangman, which implies to me that human players consistently play sub-optimally, and do not
adequately counteract that failing by applying an equal level of “smart”, intuitive play (knowing one’s opponent and their vocabulary, looking for hints in the way the game is
presented, etc.).
The “sweet spot” in the bottom right is the set of words which you would expect a perfect player to fail to guess, assuming that they’re given a limit of 10 “wrong” guesses.
In case you’re interested, then, here are the theoretically-hardest words to throw at your hangman opponent. While many of the words there feel like they would quite-rightly be
difficult, others feel like they’d be easier than their ranking would imply: this is probably because they contain unusual numbers of vowels or vowels in unusual-but-telling positions,
which humans (with their habit, inefficient under normal circumstances, of guessing an extended series of vowels to begin with) might be faster to guess than a
computer.
I rediscovered quite how readable Ruby is when I genuinely ended up writing the following method last week:
# On saving, updates the #Shift counters if the #ExperienceLevel of this
# #Volunteer has been changed
def update_counters_if_experience_level_changed
update_counters if experience_level_changed?
end
For the benefit of those of you who aren’t programmers, I’ll point out what is obvious to those of us who are: the body of the method (that’s the line that’s indented) is almost
identical to the method name (the line that starts with “def”).
This is the equivalent of going to WikiHow and looking up the article on, say, How to Make a Tie Dyed Cake, only to discover that the text of the article simply
says, “Choose what colours you want, and then make a cake in those colours”… and you understand perfectly and go and make the cake, because you’ve got that
good an understanding. In this metaphor, you’re the Ruby interpreter, by the way. And the cake is delicious.
Okay, I cheated a little: the
experience_level_changed? method was provided for me by the Rails framework. And I had to write the
update_counters method myself (although it, too, contains only one line of code in its body). But the point is still the same: writing Ruby, and thinking in a Rubyish way,
produces beautifully readable, logical code.
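The idea behind a provided method like experience_level_changed? can be sketched in plain Ruby: remember the value as it was last saved, and compare against it. This is a simplified illustration, not how Rails actually implements dirty tracking. The Volunteer class here is a stand-in, and update_counters just increments a counter in place of the real #Shift updates.

```ruby
# Plain-Ruby sketch of attribute change tracking (not Rails' implementation).
class Volunteer
  attr_accessor :experience_level
  attr_reader :counter_updates

  def initialize(experience_level)
    @experience_level = experience_level
    @saved_experience_level = experience_level # value as of the last save
    @counter_updates = 0
  end

  def experience_level_changed?
    experience_level != @saved_experience_level
  end

  def save
    update_counters_if_experience_level_changed
    @saved_experience_level = experience_level
    true
  end

  private

  # On saving, updates the #Shift counters if the #ExperienceLevel of this
  # #Volunteer has been changed
  def update_counters_if_experience_level_changed
    update_counters if experience_level_changed?
  end

  # Hypothetical stand-in for the real one-line counter update.
  def update_counters
    @counter_updates += 1
  end
end
```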
I’ve written a program called PicInHTML, which makes web pages with concealed images which are shown when text
on the page is selected. What’s clever about these pages is how they work: each is a single file, with no dependence on images or JavaScript, and they work by leveraging the
little-used ::selection CSS pseudo-element. Each individual letter on the page is given a CSS class to associate it with the colour of a corresponding pixel in the source image, and
selecting the text changes each letter’s background colour to that pixel colour.
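The mechanism described above can be sketched in a few lines of Ruby. This isn't PicInHTML's actual source. It assumes the image has already been decoded into rows of hex colour strings (such as "ff0000"). It simply cycles through the supplied text so that each character stands in for one pixel; `pic_in_html` and `escape_html` are hypothetical names.

```ruby
# One CSS class per distinct colour, one character per pixel, and a ::selection
# rule per class that paints the selected character's background in that colour.
def pic_in_html(pixels, text)
  colours = pixels.flatten.uniq
  class_for = colours.each_with_index.map { |colour, i| [colour, "c#{i}"] }.to_h
  css = colours.map { |colour| ".#{class_for[colour]}::selection { background: ##{colour}; }" }
  letters = text.chars.cycle # reuse the text if there are more pixels than characters
  rows = pixels.map do |row|
    row.map { |colour| "<span class=\"#{class_for[colour]}\">#{escape_html(letters.next)}</span>" }.join
  end
  ["<!DOCTYPE html>", "<html><head><style>", *css, "</style></head>",
   "<body><pre>", *rows, "</pre></body></html>"].join("\n")
end

# Minimal HTML escaping for the characters used as "pixels".
def escape_html(text)
  text.gsub(/[&<>"]/, "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;")
end
```

Writing the returned string to an .html file and selecting its text should reveal the picture, in browsers that support ::selection.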
That’s a wordy way of putting it. Let’s try an example:
An example of a special page - selecting the text in this page reveals the Reddit alien. Click on the image to see the discussion about this example on Reddit.
Give it a go on any of the following pages. You’ll need a browser other than Microsoft Internet Explorer, I’m afraid, as it doesn’t support the ::selection CSS
pseudo-element. All you have to do is select the text on the page to reveal the secret image!
If you’re interested in the mechanics of how it works, or you’d like to get a copy of the source code and have a play yourself, see my project page on PicInHTML. You could also try looking at the source code of any of the pages, above: they’re not too-hard to read, especially for
machine-generated code.
I’ve been playing with using client-side SSL certificates (installed into your web browser) as a means to authenticate against a Ruby on Rails-powered application. This subject is geeky and of limited interest even to the people who read this blog (with the possible exception of
Ruth, who may find herself doing exactly this as part of her Masters dissertation), so rather than write
about it all here, I’ve written a howto/article: SSL Client Certificate Authentication In Ruby On Rails. If you’re at all interested in the topic, you’re welcome to have a
read and give me any feedback.
Some time ago, I wrote a web-based calendar application in PHP, one of my favourite programming
languages. This tool would produce a HTML tabular calendar for a four week period, Monday to Sunday, in which the current date (or a
user-specified date) fell in the second week (so you’re looking at this week, last week, and two weeks in the future). The user-specified date, for various reasons, would be provided as
the number of seconds since the epoch (1970). In addition, the user had to be able to flick forwards and backwards through the calendar, “shifting” by one or four weeks each time.
Part of this algorithm, of course, was responsible for finding the timestamp (seconds since the epoch) of the beginning of “a week last Monday”, GMT. It went something like this (pseudocode):
1. Get a handle on the beginning of "today" by subtracting ([specified time] modulus [number of seconds in day]) from [specified time]
2. Go back in time a week by deducting [number of seconds in day] multiplied by [number of days in week] (you can see I'm a real programmer, because I set "number of days in week"
as a constant, in case it ever gets changed)
3. Find the previous Monday by determining what day of the week this date is on (clever functions in PHP do this for me), then take
[number of seconds in day] multiplied by [number of days after Monday we are] from this to get "a week last Monday"
4. Jump forwards or backwards a number of weeks specified by the user, if necessary. Easy.
5. Of course, this isn't perfect, because this "shift backwards a week and a few days" might have put us into "last month", in which case the calendar needs to know to deduct one month
and add [number of days in last month]
6. And if we just went "back in time" beyond January, we also need to deduct a year and add 11 months. Joy.
So: not the nicest bit of code in the world.
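If you work purely in seconds since the epoch, steps 5 and 6 turn out to be unnecessary for computing the timestamp itself. Unix time treats every day as exactly 86,400 seconds, so subtracting whole days never needs month or year fix-ups; only displaying the date needs calendar logic. Below is a minimal sketch of steps 1 to 4, written in Ruby rather than the original PHP and assuming GMT throughout; `week_last_monday` is a hypothetical name.

```ruby
SECONDS_PER_DAY = 86_400
DAYS_PER_WEEK = 7

# Start (00:00 GMT) of the Monday a week before the current week, shifted by
# `weeks`. `timestamp` is seconds since the epoch.
def week_last_monday(timestamp, weeks = 0)
  today = timestamp - (timestamp % SECONDS_PER_DAY) # step 1: midnight GMT today
  days_after_monday = (Time.at(today).utc.wday - 1) % DAYS_PER_WEEK # Sunday => 6
  # steps 2 and 3: back a week, then back to that week's Monday; step 4: user shift
  today - (DAYS_PER_WEEK + days_after_monday) * SECONDS_PER_DAY +
    weeks * DAYS_PER_WEEK * SECONDS_PER_DAY
end
```

For Wednesday 19 October 2016, this gives midnight GMT on Monday 10 October.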
I’ve recently been learning to program in Ruby On Rails. Ruby is a comparatively young language which
has become quite popular in Japan but has only had reasonable amounts of Westernised documentation for the last four years or so. I started looking into it early this year after reading
an article that compared it to Python. Rails is a web application development framework that sits on top of Ruby and promises to be “quick and
structured”, becoming the “best of both worlds” between web engineering in PHP (quick and sloppy) and in Java (slow and structured). Ruby is a properly object-oriented language – even your literals are objects – and Rails takes full advantage of
this.
For example, here’s my interpretation in Rails of the same bit of code as above:
@week_last_monday = 7.days.ago.gmtime.monday + params[:weeks].to_i.weeks
@week_last_monday is just a variable in which I’m keeping the result of my operation.
7.days might fool you. Yes, what I’m doing there is taking an Integer (7, actually a Fixnum, but who cares) and calling the “days” method on it, which returns
a duration representing 7 days of time (not a Time, which would represent a point in time).
Calling the ago method on that duration returns a Time object equal to Time.now (the time right now) minus the duration I already had (7 days). Basically, I now have a handle on
“7 days ago”.
The only thing PHP had up on me here is that its gmdate() function had ensured I already had my date/time in
GMT; here, I have to explicitly call gmtime to do the same thing.
And then I simply call monday on my resulting Time object to get a handle on the beginning of the previous Monday. That simple. 24 characters of fun.
+ params[:weeks].to_i.weeks simply increments (or decrements) the Time I have by a number of weeks specified by the user. params[:weeks] gets the number of weeks
specified, to_i converts it to an integer, and weeks, like days, turns that into a duration. In Ruby, classes can even define operators
like +, -, <, >, etc., as if they were methods (because they are), and so the author of the Time class made it simple to perform arithmetic upon times and dates.
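To show what "operators are methods" means in practice, here is a toy class. It is hypothetical and much simpler than Rails' real duration class. It defines `+` directly, and defines `<=>` so that the Comparable mixin supplies `<`, `>`, and `==`.

```ruby
# A toy span of time, measured in seconds (not ActiveSupport's implementation).
class Span
  include Comparable
  attr_reader :seconds

  def self.days(n)
    new(n * 86_400)
  end

  def self.weeks(n)
    days(n * 7)
  end

  def initialize(seconds)
    @seconds = seconds
  end

  # `a + b` is just a call to this method: a.+(b)
  def +(other)
    Span.new(seconds + other.seconds)
  end

  # Comparable builds <, <=, ==, >, >= and between? from this one method;
  # returning nil for non-Spans makes == simply answer false.
  def <=>(other)
    other.is_a?(Span) ? seconds <=> other.seconds : nil
  end

  # The Time this span before `from`.
  def ago(from = Time.now)
    from - seconds
  end
end
```

With it, `(Span.days(7) + Span.days(7)) == Span.weeks(2)` is true, and `Span.days(7).ago(some_time)` reads much like the Rails one-liner.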
This was the very point at which I fell in love with Ruby on Rails.