Time-Efficient Shuffling

Some years ago, a friend of mine told me about an interview they’d had for a junior programming position. Their interviewer was one of that particular breed who was attached to programming-test questions: if you’re in the field of computer science, you already know that these questions exist. In any case: my friend was asked to write pseudocode to shuffle a deck of cards: a classic programming problem that pretty much any first-year computer science undergraduate is likely to have considered, if not done.

Interview with paper visible.
Let’s play at writing software. Rather than a computer, we’ll use paper. But to make it sound techy, we’ll call it “pseudocode”.

There are lots of wrong ways to programmatically shuffle a deck of cards, such as the classic “swap the card in each position with the card in a randomly-selected position”, which results in biased results. In fact, the more that you think in terms of how humans shuffle cards, the less-likely you are to come up with a good answer!

Chart showing the probability bias for an incorrectly-implemented Fisher-Yates shuffle, for a 6-card deck.
If we shuffled a deck of six cards with this ‘broken’ algorithm, for example, we’d be more-likely to find the card that was originally in second place at the top of the deck than in any other position. This kind of thing REALLY matters if, for example, you’re running an online casino.

The simplest valid solution is to take a deck of cards and move each card, choosing each at random, into a fresh deck (you can do this as a human, if you like, but it takes a while)… and that’s exactly what my friend suggested.

The interviewer was ready for this answer, though, and asked my friend if they could think of a “more-efficient” way to do the shuffle. And this is where my friend had a brain fart and couldn’t think of one. That’s not a big problem in the real world: so long as you can conceive that there exists a more-efficient shuffle, know what to search for, and can comprehend the explanation you get, then you can still be a perfectly awesome programmer. Demanding that people already know the answer to problems in an interview setting doesn’t actually tell you anything about their qualities as a programmer, only how well they can memorise answers to stock interview questions (this interviewer should have stopped this line of inquiry one question sooner).

Riffle shuffling
Writing a program to shuffle a deck takes longer than just shuffling it, but that’s hardly the point, is it?

The interviewer was probably looking for an explanation of the modern form of the Fisher-Yates shuffle algorithm, which does the same thing as my friend suggested but without needing to start a “separate” deck: here’s a video demonstrating it. When they asked for greater efficiency, the interviewer was probably looking for a more memory-efficient solution. But that’s not what they said, and it’s certainly not the only way to measure efficiency.

When people ask ineffective interview questions, it annoys me a little. When people ask ineffective interview questions and phrase them ambiguously to boot, that’s just makes me want to contrive a deliberately-awkward answer.

So: another way to answer the shuffling efficiency question would be to optimise for time-efficiency. If, like my friend, you get a question about improving the efficiency of a shuffling algorithm and they don’t specify what kind of efficiency (and you’re feeling sarcastic), you’re likely to borrow either of the following algorithms. You won’t find them any computer science textbook!

Complexity/time-efficiency optimised shuffling

  1. Precompute and store an array of all 52! permutations of a deck of cards. I think you can store a permutation in no more than 226 bits, so I calculate that 2.3 quattuordecillion yottabytes would be plenty sufficient to store such an array. That’s about 25 sexdecillion times more data than is believed to exist on the Web, so you’re going to need to upgrade your hard drive.
  2. To shuffle a deck, simply select a random number x such that 0 <= x < 52! and retrieve the deck stored at that location.

This converts the O(n) problem that is Fisher-Yates to an O(1) problem, an entire complexity class of improvement. Sure, you need storage space valued at a few hundred orders of magnitude greater than the world GDP, but if you didn’t specify cost-efficiency, then that’s not what you get.

Stack of shuffled cards
If you’ve got a thousand galaxies worth of free space you can just fill them with actual decks of cards – one for each permutation – and physically pick one at random. That sounds convenient, right?

You’re also going to need a really, really good PRNG to ensure that the 226-bit binary number you generate has sufficient entropy. You could always use a real physical deck of cards to seed it, Solitaire/Pontifex-style, and go full meta, but I worry that doing so might cause this particular simulation of the Universe to implode, sooo… do it at your own risk?

Perhaps we can do one better, if we’re willing to be a little sillier…

(Everett interpretation) Quantum optimised shuffling

Quantum ungulations
If you live in a universe in which quantum optimised shuffling isn’t possible, the technique below can be adapted to create a universe in which it is.

Assuming the many-worlds interpretation of quantum mechanics is applicable to reality, there’s a yet-more-efficient way to shuffle a deck of cards, inspired by the excellent (and hilarious) quantum bogosort algorithm:

  1. Create a superposition of all possible states of a deck of cards. This divides the universe into 52! universes; however, the division has no cost, as it happens constantly anyway.
  2. Collapse the waveform by observing your shuffled deck of cards.

The unneeded universes can be destroyed or retained as you see fit.

Let me know if you manage to implement either of these.

× × × × ×

Supply-Chain Security and Trust

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

Can we solve [the problem of supply-chain attacks] by building trustworthy systems out of untrustworthy parts?

It sounds ridiculous on its face, but the Internet itself was a solution to a similar problem: a reliable network built out of unreliable parts. This was the result of decades of research. That research continues today, and it’s how we can have highly resilient distributed systems like Google’s network even though none of the individual components are particularly good. It’s also the philosophy behind much of the cybersecurity industry today: systems watching one another, looking for vulnerabilities and signs of attack.

Security is a lot harder than reliability. We don’t even really know how to build secure systems out of secure parts, let alone out of parts and processes that we can’t trust and that are almost certainly being subverted by governments and criminals around the world. Current security technologies are nowhere near good enough, though, to defend against these increasingly sophisticated attacks. So while this is an important part of the solution, and something we need to focus research on, it’s not going to solve our near-term problems.

Schneier provides a great summary of the state of play with nation-state supply-chain attacks, using the Huawei 5G controversy as a jumping-off point but with reference to the fact that China are far from the only country that weaken the security and privacy of the world’s citizens in order to gain an international spying advantage. He goes on to explain what he sees as the two broad schools of thought are in providing technical solutions to this class of problems, and demonstrates that both are for the time being beyond our reach. The excerpt above comes from his examination of the second school of thought, and it’s a pretty-compelling illustration of why this is a different class of problem that the ones we’ve used to build a reliable Internet.

(Many of the comments are very good, too.)

Dan Q temporarily disabled GC1G4E3 Kinkering Congs Their Titles Take

This checkin to GC1G4E3 Kinkering Congs Their Titles Take reflects a geocaching.com log entry. See more of Dan's cache logs.

Looks like this cache has been muggled, and its hiding place is no longer usable. I’ll look to see if it can be moved somewhere else in the vicinity and the puzzle updated accordingly.

Consume less, create more

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

Perhaps three people will read this essay, including my parents. Despite that, I feel an immense sense of accomplishment. I’ve been sitting on buses for years, but I have more to show for my last month of bus rides than the rest of that time combined.

Smartphones, I’ve decided, are not evil. This entire essay was composed on an iPhone. What’s evil is passive consumption, in all its forms.

This amazing essay really hammers home a major part of why I blog at all. Creating things on the Web is good. Creating things at all is good.

A side-effect of social media culture (repost, reshare, subscribe, like) is that it’s found perhaps the minimum-effort activity that humans can do that still fulfils our need to feel like we’ve participated in our society. With one tap we can pass on a meme or a funny photo or an outrageous news story. Or we can give a virtual thumbs-up or a heart on a friend’s holiday snaps, representing the entirety of our social interactions with them. We’re encouraged to create the smallest, lightest content possible: forty words into a Tweet, a picture on Instagram that we took seconds ago and might never look at again, on Facebook… whatever Facebook’s for these days. The “new ‘netiquette” is complicated.

I, for one, think it’d be a better world if it saw a greater diversity of online content. Instead of many millions of followers of each of a million content creators, wouldn’t it be nice to see mere thousands of each of billions? I don’t propose to erode the fame of those who’ve achieved Internet celebrity; but I’d love to migrate towards a culture in which we can all better support one another’s drive to create original content online. And do so ourselves.

The best time to write on your blog is… well, let’s be honest, it was a decade ago. But the second best time is right now. Or if you’d rather draw, or sing, or dance, or make puzzles or games or films… do that. The barrier to being a content creator has never been lower: publishing is basically free and virtually any digital medium is accessible from even the simplest of devices. Go make something, and share it with the world.

(with thanks to Jeremy for the reshare)

Subscribe by Email

For the last few months, I’ve been running an alpha test of an email-based subscription to DanQ.me with a handful of handpicked testers. Now, I’d like to open it up to a slightly larger beta test group. If you’d like to get the latest from this site directly in your inbox, just provide your email address below:

Subscribe by email!

Who’s this for?

Some people prefer to use their email inbox to subscribe to things. If that’s you: great!

What will I receive?

You’ll get a “daily digest”, no more than once per day, summarising everything I’ve published within the last 24 hours. It usually works: occasionally but not often it misses things. You can unsubscribe with one click at any time.

How else can I subscribe?

You can still subscribe in a variety of other ways. Personally, I recommend using a feed reader which lets you choose exactly which kinds of content you’re interested in, but there are plenty of options including Facebook and Twitter (for those of such an inclination).

Didn’t you do this before?

Yes, I ran a “subscribe by email” system back in 2007 but didn’t maintain it. Things might be better this time around. Maybe.

Perpetual Flip Calendar

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

I’m not sure which is the most-hypnotic in this video: the graceful click-clack motion of the finished product or the careful and methodical production steps that precede it. Either way, this perpetual calendar is brilliant, but if I owned it I’d absolutely spend the entire time playing with it rather than using it for its intended purpose.

Retro-PESOSing my Reviews

When I arrived at this weekend’s IndieWebCamp I still wasn’t sure what it was that I would be working on. I’d worked recently to better understand the ecosystem surrounding DanQ.me and had a number of half-formed ideas about tightening it up. But instead, I ended up expanding the reach of my “personal web” considerably by adding reviews as a post type to my site and building tools to retroactively-reintegrate reviews I’d written on other silos.

Amazon customer review
The oldest surviving review I found was my grumbling about Windows XP Home edition being just a crippled version of Pro edition. And now it’s immortalised here.

Over the years, I’ve written reviews of products using Amazon and Steam and of places using Google Maps and TripAdvisor. These are silos and my content there is out of my control and could, for example, be deleted at a moment’s notice. This risk was particularly fresh in my mind as my friend Jen‘s Twitter account was suspended this weekend for allegedly violating the platform’s rules (though Twitter have so far proven unwilling to tell her which rules she’s broken or even when she did so, and she’s been left completely in the dark).

My mission for the weekend was to:

  1. Come up with a mechanism for the (microformat-friendly) display of reviews on this site, and
  2. Reintegrate my reviews from Amazon, Steam, Google Maps and TripAdvisor
Steam review for Hacknet
Steam reviews use a “thumbs up/thumbs down” rating system rather than a “5-star” style, but h-review is capable of expressing both and more.

I opted not to set up an ongoing POSSE nor PESOS process at this point; I’ll do this manually in the short term (I don’t write reviews on third-party sites often). Also out of scope were some other sites on which I’ve found that I’ve posted reviews, for example BoardGameGeek. These can both be tasks for a future date.

DanQ.me ecosystem map showing Dan Q writing reviews and these being re-imported on demand back into DanQ.me.
The lovely diagram I drew earlier this year? Here it is with the new loop drawn on.

I used Google Takeout to export my Google Maps reviews, which comprised the largest number of reviews of the sites I targetted and which is the least screen-scraper friendly. I wrote a bookmarklet-based screen-scraper to get the contents of my reviews on each of the other sites. Meanwhile, I edited by WordPress theme’s functions.php to extended the Post Kinds plugin with an extra type of post, Review, and designed a content template which wrapped reviews in appropriate microformat markup, using metadata attached to each review post to show e.g. a rating, embed a h-product (for products) or h-card (for places). I also leveraged my existing work from last summer’s effort to reintegrate my geo*ing logs to automatically add a map when I review a “place”. Finally, I threw together a quick WordPress plugin to import the data and create a stack of draft posts for proofing and publication.

 

My review of The Rusty Bicycle as it now appears on this site.
I was moderately unimpressed by Oxford pub The Rusty Bicycle. I originally said so on Google Maps, and now I can say so here, too!

So now you can read all of the reviews I’ve ever posted to any of those four sites, right here, alongside any other reviews I subsequently reintegrate and any I write directly to my blog in the future. The battle to own all of my own content after 25 years of scattering it throughout the Internet isn’t always easy, but it remains worthwhile.

(I haven’t open-sourced my work this time because it’s probably useful only to me and my very-specific set-up, but if anybody wants a copy they can get in touch.)

×

HTML Movies

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

1. a (1998)

Documentary about the Buddhist sect responsible for the 1995 Tokyo subway Sarin gas attack.

2. body (2003)

Also known as Jism. No, really. A tale of passion and murder featuring an alcoholic lawyer and the wife of a travelling millionaire.

3. canvas (1992)

Gary Busey stars as an artist who takes part in a heist to save his brother from murderous art thieves.

We could have had so many HTML-themed Troma Nights, if we’d wanted…

Dan Q performed maintenance for GC7QG1Z Oxford’s Wild Wolf Three

This checkin to GC7QG1Z Oxford’s Wild Wolf Three reflects a geocaching.com log entry. See more of Dan's cache logs.

Performed routine maintenance at the cache site; everything seems well.

A couple of ‘cachers have reported that the GZ is inaccessible owing to the path being overgrown. The “obvious” path to the cache really is pretty heavily overgrown and I’ll be increasing the terrain rating from 3 to 3.5 accordingly, but the “obvious” path isn’t the only path! If you need a hint as to the direction from which the alternative path (which is quite a bit longer, but much more-usable) comes, see my GZ video below:

(video also available on QTube)

So You Want To Start An Unpopular Blog

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

Very occasionally I get asked how to start blogging by people who would like to create exciting and engaging articles that will build a following by delighting an audience hungry for more. Perhaps they envision spreading their views far across the face of the web.

To which I always reply, “Have you read my blog? I don’t know about any of those things!”

What I do have are 10 years of logs and some vague observations about beginning a blog.

As I’m sat here anyway, helping people get started on the Indieweb, here’s a great (tongue in cheek) look at how you can expect your new blog Indieweb presence to take off and become the Most Popular Thing Ever. Or rather, not.

But as I and others have said before, my blog is first and foremost for me. If you get something out of it too, that’s great, but that’s a secondary goal!

Note #15464

Arrived early for @indiewebcamp #oxford at @oxonlibraries. @garrettc still filling up on coffee.

Makerspace at Oxford Library

×

Cheating Hangman

A long while ago, inspired by Nick Berry‘s analysis of optimal Hangman strategy, I worked it backwards to find the hardest words to guess when playing Hangman. This week, I showed these to my colleague Grace – who turns out to be a fan of word puzzles – and our conversation inspired me to go a little deeper. Is it possible, I thought, for me to make a Hangman game that cheats by changing the word it’s thinking of based on the guesses you make in order to make it as difficult as possible for you to win?

Play “Cheating Hangman”

The principle is this: every time the player picks a letter, but before declaring whether or not it’s found in the word –

  1. Make a list of all possible words that would fit into the boxes from the current game state.
  2. If there are lots of them, still, that’s fine: let the player’s guess go ahead.
  3. But if the player’s managing to narrow down the possibilities, attempt to change the word that they’re trying to guess! The new word must be:
    • Legitimate: it must still be the same length, have correctly-guessed letters in the same places, and contain no letters that have been declared to be incorrect guesses.
    • Harder: after resolving the player’s current guess, the number of possible words must be larger than the number of possible words that would have resulted otherwise.
Gallows on a hill.
Yeah, you’re screwed now.

You might think that this strategy would just involve changing the target word so that you can say “nope” to the player’s current guess. That happens a lot, but it’s not always the case: sometimes, it’ll mean changing to a different word in which the guessed letter also appears. Occasionally, it can even involve changing from a word in which the guessed letter didn’t appear to one in which it does: that is, giving the player a “freebie”. This may seem counterintuitive as a strategy, but it sometimes makes sense: if saying “yeah, there’s an E at the end” increases the number of possible words that it might be compared to saying “no, there are no Es” then this is the right move for a cheating hangman.

Playing against a cheating hangman also lends itself to devising new strategies as a player, too, although I haven’t yet looked deeply into this. But logically, it seems that the optimal strategy against a cheating hangman might involve making guesses that force the hangman to bisect the search space: knowing that they’re always going to adapt towards the largest set of candidate words, a perfect player might be able to make guesses to narrow down the possibilities as fast as possible, early on, only making guesses that they actually expect to be in the word later (before their guess limit runs out!).

Cheating Hangman
The game is brutally-difficult, but surprisingly fun, and you can have it tell you when and how it cheats so you can begin to understand its strategy.

I also find myself wondering how easily I could adapt this into a “helpful hangman”: a game which would always change the word that you’re trying to guess in order to try to make you win. This raises the possibility of a whole new game, “suicide hangman”, in which the player is trying to get themselves killed and so is trying to pick letters that can’t possibly be in the word and the hangman is trying to pick words in which those letters can be found, except where doing so makes it obvious which letters the player must avoid next. Maybe another day.

In the meantime, you’re welcome to go play the game (and let me know what you think, below!) and, if you’re of such an inclination, read the source code. I’ve used some seriously ugly techniques to make this work, including regular expression metaprogramming (using regular expressions to write regular expressions), but the code should broadly make sense if you want to adapt it. Have fun!

Play “Cheating Hangman”

Update 26 September 2019, 16:23: I’ve now added “helpful mode”, where the computer tries to cheat on your behalf rather than against you, but it’s not as helpful as you’d think because it assumes you’re playing optimally and have already memorised the dictionary!

Update 1 October 2019, 06:40: Now featured on MetaFilter; hi, MeFites!

×

This is your phone on feminism

This is a repost promoting content originally published elsewhere. See more things Dan's reposted.

Let’s face the truth. We are in an abusive relationship with our phones.

Ask yourself the first three questions that UK non-profit Women’s Aid suggests to determine if you’re in an abusive relationship:

  • Has your partner tried to keep you from seeing your friends or family?
  • Has your partner prevented you or made it hard for you to continue or start studying, or from going to work?
  • Does your partner constantly check up on you or follow you?

If you substitute ‘phone’ for ‘partner’, you could answer yes to each question. And then you’ll probably blame yourself.

A fresh take by an excellent article. Bringing a feminist viewpoint to our connection to our smartphones helps to expose the fact that our relationship with the devices would easily be classified as abusive were they human. The article goes on to attempt to diffuse the inevitable self-blame that comes from this realisation and move forward to propose a more-utopian future in which our devices might work for us, rather than for the companies that provide the services for which we use them.

Speaking from both (a) experience of abusive relationships and (b) an interest in privacy and security and how that’s undermined by our devices, this piece seems pretty-much spot-on.