Digest for May 2018

Summary

This month I completed my research into Oxford’s (former) zoo and I hid a series of geocaches to commemorate it and the wolves that escaped from it once, in the 1930s. I also toured a series of geocaches near Eynsham (listed below) while collecting ‘caching supplies from the military surplus store there.

I also shared (among other things) the story of the creation of a GIF file containing text showing the hex digest of its own MD5 hash, something which might initially appear to be impossible.

All posts

Posts marked by an asterisk (*) are referenced by the summary above.

Articles

Checkins

Reposts

Dan Q couldn’t find GC5X8C6 Offey’s Adventure

This checkin to GC5X8C6 Offey's Adventure reflects a geocaching.com log entry. See more of Dan's cache logs.

Solving the mystery to determine the cache location was the easy (and fun) bit: geocheck confirmed first time! But after half an hour of digging through damp vegetation (and the occasional nettle) this morning, I had to admit defeat. :-(

GIF MD5 hashquine

This article is a repost promoting content originally published elsewhere. See more things Dan's reposted.

GIF MD5 hashquine – Rogdham (rogdham.net)

TL;DR: Quick access to GIF MD5 hashquine resources:

Introduction

A few days ago, Ange Albertini retweeted a tweet from 2013 asking for a document that shows its own MD5 or SHA1 hash.

Later, he named such a document a hashquine, which seems appropriate: in computing, a quine is a program that prints its own source code when run.

Creating a program that prints its own hash is not that difficult, as there are several ways for it to retrieve its own source code before computing the hash (the second method does not work for compiled programs):

  • Reading its source or compiled code (e.g. from disk);
  • Using the same technique as in a quine to get the source code.
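As a minimal sketch of the second method (assuming Python 3), the string `PROGRAM` below is a one-line quine variant: instead of printing its own source, it rebuilds that source with the usual `%r` trick and prints its MD5 hex digest. The `run` helper is a hypothetical name that just executes the program and captures what it prints, so the claim can be checked.

```python
import contextlib
import hashlib
import io

# One-line quine variant: %r re-inserts the template into itself (and %%
# becomes %), so s % s rebuilds the exact program text, which is then hashed.
PROGRAM = (
    "import hashlib;s='import hashlib;s=%r;print(hashlib.md5((s%%s).encode()).hexdigest())';"
    "print(hashlib.md5((s%s).encode()).hexdigest())"
)


def run(program: str) -> str:
    """Execute `program` in a fresh namespace and return what it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(program, {})
    return buffer.getvalue().strip()


printed = run(PROGRAM)
actual = hashlib.md5(PROGRAM.encode()).hexdigest()
print(printed == actual)  # True: the program printed the MD5 of its own text
```

Saved on its own, without a trailing newline, the text of `PROGRAM` is a complete script that prints the MD5 of its own file; a trailing newline would change the hash.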

However, documents such as images are usually not Turing-complete, so they cannot compute their own hash directly. Instead, we can leverage hash collisions to perform the trick.

This is the method that I used to create the following GIF MD5 hashquine:

hashquine and md5sum

Once I had managed to create it, I found out that it was not the first GIF MD5 hashquine ever made: spq beat me to it.

I will take that opportunity to look at how that one was done, and highlight the differences.

Finally, my code is on GitHub, so if you want to create your own GIF MD5 hashquine, you can easily start from there!

Creating a GIF MD5 hashquine

To create the hashquine, I relied heavily on the following two resources:

A note about MD5 collisions

MD5 is considered obsolete because one of the properties of a cryptographic hash function is that it should be infeasible to find two messages with the same hash, yet for MD5 this can now be done in practice.

Today, two practical attacks can be performed on MD5:

  1. Given a prefix P, find two messages M1 and M2 such that md5(P || M1) and md5(P || M2) are equal (|| denotes concatenation);
  2. Given two prefixes P1 and P2, find two messages M1 and M2 such that md5(P1 || M1) and md5(P2 || M2) are equal.

To the best of my knowledge, attack 1 takes a few seconds on a regular computer, whereas attack 2 requires far more resources (especially time). We will use attack 1 in what follows.
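Attack 1 is enough here because MD5 processes its input in 64-byte blocks, carrying an internal state (often called the intermediate hash value, IHV) from one block to the next. Assuming, as with fastcoll output, that the collision is an internal one between equal-length messages and that P is block-aligned, any common suffix S keeps the two messages colliding, since MD5's final padding depends only on the total length:

$$
\mathrm{IHV}(P \,\|\, M_1) = \mathrm{IHV}(P \,\|\, M_2) \ \text{and}\ |M_1| = |M_2| \;\Longrightarrow\; \mathrm{md5}(P \,\|\, M_1 \,\|\, S) = \mathrm{md5}(P \,\|\, M_2 \,\|\, S) \quad \text{for every } S
$$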

Please also note that we are not (yet) able, given an MD5 hash H, to find a message M such that md5(M) is H. So creating a GIF that displays a fixed MD5 hash, then brute-forcing some bytes to append at the end until its MD5 matches the one displayed, is not possible.

Overview

The GIF file format does not allow arbitrary computations, so we cannot ask the software displaying the image to compute the MD5. Instead, we will rely on MD5 collisions.

First, we will create an animated GIF. The first frame is not interesting, since it’s only displaying the background. The second frame will display a 0 at the position of the first character of the hash. The third frame will display a 1 at that same position. And so on and so forth.

In other words, we will have a GIF file that displays all 16 possible characters at each position of the MD5 “output”.

If we allow the GIF to loop, it would look like this:

GIF showing all possible MD5 characters

Now, the idea is, for each position, to comment out every frame except the one corresponding to the target hash. Then, if we don’t allow the GIF to loop, it will end up displaying the target MD5 hash, which is what we want.

To do so, for each possible character at each position of the MD5 hash, we will generate an MD5 collision somewhere in the GIF. That’s 16×32=512 collisions to generate, but at an average of 3.5 seconds per collision on our computer, it should run in under 30 minutes (512 × 3.5 s = 1,792 s).

Once this is done, we will have a valid GIF file. We can compute its hash: it will not change from that point.

Now that we have the hash, for each possible character of the MD5 hash, we will choose one or the other of the two collision “blocks” previously computed: with one, the character is displayed; with the other, it is commented out. Because each choice replaces part of the GIF file with a collision “block” computed at that very same place, the MD5 hash of the GIF file does not change.
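The whole overview can be tried out end to end on a toy scale. The sketch below (Python 3.9+) swaps in hypothetical stand-ins: a 16-bit Merkle–Damgård hash built from truncated MD5 instead of MD5 itself, a birthday search instead of fastcoll, and a `render` function instead of a GIF viewer, which draws a slot’s character only when the chosen block’s last byte is odd. Only the structure (one collision per position and character, hash computed once, blocks chosen afterwards) mirrors the text.

```python
import hashlib
import itertools

BLOCK = 8            # toy block size in bytes
IV = b"\x00\x00"     # toy initial chaining value
HEX = "0123456789abcdef"
POSITIONS = 4        # the toy digest is 2 bytes = 4 hex characters


def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: MD5(state || block) truncated to 2 bytes."""
    return hashlib.md5(state + block).digest()[:2]


def toy_hash(message: bytes) -> str:
    """Merkle-Damgard iteration over BLOCK-sized blocks (no padding, for brevity)."""
    state = IV
    for i in range(0, len(message), BLOCK):
        state = compress(state, message[i:i + BLOCK])
    return state.hex()


def find_collision(state: bytes) -> tuple[bytes, bytes]:
    """Birthday search for two blocks giving the same output from `state`.

    Stand-in for fastcoll: the 'show' block has an odd last byte and the
    'hide' block an even one, which is all the toy viewer looks at.
    """
    seen: dict[tuple[bytes, int], bytes] = {}
    for n in itertools.count():
        block = n.to_bytes(BLOCK, "big")
        out, parity = compress(state, block), block[-1] % 2
        twin = seen.get((out, 1 - parity))
        if twin is not None:
            return (block, twin) if parity else (twin, block)
        seen.setdefault((out, parity), block)


def render(document: bytes) -> str:
    """Toy viewer: block k may draw character HEX[k % 16] at position k // 16."""
    shown = ["?"] * POSITIONS
    for k in range(len(document) // BLOCK):
        if document[(k + 1) * BLOCK - 1] % 2:  # odd last byte: 'show' block
            shown[k // 16] = HEX[k % 16]
    return "".join(shown)


def build_hashquine() -> bytes:
    # 1. One collision per (position, character): 16 x 4 here, 16 x 32 in the GIF.
    state, pairs = IV, []
    for _ in range(16 * POSITIONS):
        show, hide = find_collision(state)
        pairs.append((show, hide))
        state = compress(state, show)  # same state whichever block is used
    # 2. The final hash no longer depends on the choices...
    target = state.hex()
    # 3. ...so pick, slot by slot, the block that makes the viewer display it.
    return b"".join(
        show if HEX[k % 16] == target[k // 16] else hide
        for k, (show, hide) in enumerate(pairs)
    )


quine = build_hashquine()
print(render(quine), toy_hash(quine))  # the same 4 hex characters, twice
```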

All that is left to do is to figure out how to insert the collision “blocks” (which look mostly random) in the GIF file, so that:

  • It is a valid GIF file;
  • Using one “block” displays the corresponding character at the right position, but using the other “block” will not display it.

I will detail the process for one character.

Example for one character

Let’s look at the part of the generated GIF file responsible for displaying (or not) the character 7 at the first position of the MD5 hash.

The figure below shows the relevant hexdump, with the two possible choices for the collision block side by side:

hexdump of two versions of a character

The collision “block” is displayed in bold (from 0x1b00 to 0x1b80), with the differing bytes written in red.

In the GIF file format, comments are defined as follows:

  • They start with the two bytes 21fe (written in white over dark green background);
  • Then, an arbitrary number of sub-blocks are present;
  • The first byte of each sub-block (in black over a dark green background) gives the length of the sub-block data;
  • Then the sub-block data (in black over a light green background);
  • When a sub-block of size 0 is reached, it is the end of the comment.

The other colours in the image above represent other GIF blocks:

  • In purple, the graphic control extension, which starts a frame and specifies its duration;
  • In light blue, the image descriptor, specifying the size and position of the frame;
  • In various shades of red, the image data (just as for comments, it can be composed of sub-blocks).
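Following the comment layout described above, here is a minimal sketch (Python 3.9+, hypothetical function names) that writes arbitrary data as a GIF comment extension and reads it back:

```python
def encode_comment(data: bytes) -> bytes:
    """Build a GIF comment extension: 21 FE, sub-blocks of at most 255 bytes, 00."""
    out = bytearray(b"\x21\xFE")
    for i in range(0, len(data), 255):
        chunk = data[i:i + 255]
        out.append(len(chunk))  # sub-block size byte
        out += chunk            # sub-block data
    out.append(0)               # a zero-size sub-block ends the comment
    return bytes(out)


def decode_comment(buf: bytes, pos: int = 0) -> tuple[bytes, int]:
    """Read the comment at `pos`; return its data and the offset just after it."""
    if buf[pos:pos + 2] != b"\x21\xFE":
        raise ValueError(f"no comment extension at offset {pos}")
    pos += 2
    data = bytearray()
    while buf[pos] != 0:
        size = buf[pos]
        data += buf[pos + 1:pos + 1 + size]
        pos += 1 + size
    return bytes(data), pos + 1


comment = encode_comment(b"x" * 300)
print(comment[:3].hex(), len(comment))  # 21feff 305
```

The 300 bytes are split into a 255-byte and a 45-byte sub-block, so the comment costs 5 bytes of overhead (introducer, label, two size bytes and the terminator).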

To create this part of the GIF, I considered the following:

  • The collision “block” should start at a multiple of 64 bytes from the beginning of the file (MD5 processes its input in 64-byte blocks), so I use comments to pad accordingly.
  • fastcoll, the software used to generate each MD5 collision, seems to always produce two outputs whose bytes at position 123 differ. As a result, I end the comment sub-block just before that position, so that this byte gives the size of the next comment sub-block.
  • For one chosen collision “block” (on the left), the byte in position 123 starts a new comment sub-block that skips over the GIF frame of the character, up to the start of a new comment sub-block which is used as padding to align the next collision “block”.
  • For the other chosen collision “block” (on the right), the byte in position 123 creates a new comment sub-block which is shorter in that case. Following it, I end the comment, add the frame displaying the character of the MD5 hash at the right position, and finally start a new comment up to the comment sub-block used as padding for the next collision “block”.

All in all, it is not that difficult, but many things must be kept in mind at the same time, which makes it hard to explain. I hope the colour-coded image above helps.
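The trick at position 123 can be illustrated with a simplified sketch (Python 3.9+). It assumes a few things the real file handles differently: there is no GIF header, logical screen descriptor or 64-byte alignment, and the two competing length-byte values are picked by hand, whereas in the hashquine they come from fastcoll and the surrounding layout is arranged around them. `walk`, `slot`, `SHOW` and `HIDE` are hypothetical names; `walk` only lists the blocks a decoder would encounter.

```python
import struct


def skip_sub_blocks(buf: bytes, pos: int) -> int:
    """Skip sub-blocks starting at `pos`; return the offset after the 00 terminator."""
    while buf[pos] != 0:
        pos += 1 + buf[pos]
    return pos + 1


def walk(buf: bytes) -> list[str]:
    """List the blocks a GIF decoder would see, up to the trailer (3B)."""
    kinds, pos = [], 0
    while buf[pos] != 0x3B:
        if buf[pos] == 0x21:  # extension introducer, then label
            kinds.append({0xFE: "comment", 0xF9: "graphic control"}.get(buf[pos + 1], "extension"))
            pos = skip_sub_blocks(buf, pos + 2)
        elif buf[pos] == 0x2C:  # image descriptor (assumes no local colour table)
            kinds.append("image")
            # skip the 10-byte descriptor and the LZW minimum code size byte
            pos = skip_sub_blocks(buf, pos + 11)
        else:
            raise ValueError(f"unexpected byte {buf[pos]:#04x} at offset {pos}")
    return kinds


# One frame: graphic control extension (delay 2 x 10 ms), then a 1x1 image at (0, 0).
FRAME = (
    b"\x21\xF9\x04\x00" + struct.pack("<H", 2) + b"\x00\x00"  # GCE + terminator
    + b"\x2C" + struct.pack("<4H", 0, 0, 1, 1) + b"\x00"      # image descriptor
    + b"\x02\x02\x44\x01\x00"  # LZW data for a single pixel of colour index 0
)


def slot(length_byte: int) -> bytes:
    """A comment whose first sub-block size plays the role of collision byte 123."""
    return (
        b"\x21\xFE" + bytes([length_byte])
        + b"abc"          # data of the short sub-block
        + b"\x00"         # short case: the comment ends here...
        + FRAME           # ...so the frame is drawn...
        + b"\x21\xFE"     # ...and a new comment starts
        + b"\x03pad\x00"  # long case lands on this padding sub-block
    )


SHOW = 3                           # covers only b"abc"
HIDE = 3 + 1 + len(FRAME) + 2      # also swallows 00, FRAME and 21 FE
print(walk(slot(SHOW) + b"\x3B"))  # ['comment', 'graphic control', 'image', 'comment']
print(walk(slot(HIDE) + b"\x3B"))  # ['comment']
```

Both variants have the same length and differ only in that one byte, which is what lets a collision pair toggle a frame without shifting anything that follows.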

Final thoughts

Once all this has been done, we have a proper GIF displaying its own MD5 hash! It is composed of one frame for the background, plus one frame for each of the 32 characters of the MD5 hash.

To speed up the display of the hash, we can add a little brute-forcing to the process so that some characters of the hash are the ones we want.

I fixed 6 characters, which does not add much computation to the creation of the GIF. Feel free to fix more if needed.

Of course, the initial image (the background) should have those fixed characters in it. I chose the characters d5 and dead as shown in the image below, so that this speed-up is not obvious!

Background and hash compared

That makes a total of 28 frames. At 20ms per frame, displaying the hash takes a little over half a second.
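The brute-forcing step can be sketched as follows (Python 3.9+). Where the brute-forced bytes live is an assumption here: they are taken to be a suffix appended after everything else (for example inside a final comment), so the collision blocks can still be chosen once the final hash is known. `find_suffix` is a hypothetical name, and only two fixed characters are used to keep the run short; each extra fixed hex character multiplies the expected work by 16, so 6 characters mean about 16^6 ≈ 16.8 million tries on average.

```python
import hashlib
import itertools


def find_suffix(data: bytes, fixed: dict[int, str]) -> bytes:
    """Brute-force an 8-byte suffix so that md5(data + suffix) has the given
    hex characters at the given positions (position -> character)."""
    base = hashlib.md5(data)  # hash the common prefix only once
    for n in itertools.count():
        suffix = n.to_bytes(8, "big")
        h = base.copy()       # resume from the prefix state
        h.update(suffix)
        digest = h.hexdigest()
        if all(digest[i] == c for i, c in fixed.items()):
            return suffix


data = b"GIF bytes before the suffix"          # placeholder content
suffix = find_suffix(data, {0: "d", 1: "5"})    # about 16**2 = 256 tries on average
print(hashlib.md5(data + suffix).hexdigest()[:2])  # d5
```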

Analysis of a GIF MD5 hashquine

Since I only found out that another GIF MD5 hashquine had been created before mine after I had finished creating one, I thought it would be interesting to compare the two independent creations.

Here is spq’s hashquine:

spq's hashquine

The first noticeable thing is that seven-segment displays have been used. This is an interesting trade-off:

  • On the plus side, only 7×32=224 MD5 collisions are needed (instead of 16×32=512), which should make generating the GIF more than twice as fast, and the image smaller (84 KB versus 152 KB, although I also chose to feature my avatar and some text).
  • However, there are 68 GIF frames in total instead of 28, so the GIF takes longer to finish displaying the hash: 1.34 seconds versus 0.54 seconds.

Now, as you can see when loading the GIF file, a hash made of 32 “8” characters is displayed first, then each segment that needs to be turned off is hidden by drawing a black square on top of it. Indeed, if we paint the background white, the final image looks like this:

Using a white background reveals black squares

My guess is that it was easier to do so, because there was no need to handle all 16 possible characters. Instead, only a black square was needed.

Also, the black square is smaller in bytes (42 bytes) than my characters (58 to 84 bytes), so it is more likely to fit. Indeed, my code had to handle the case where there is not enough space and another collision must be generated.

Other than that, the method is almost identical: the only difference I noticed is that spq used two comment sub-blocks for collision alignment and for skipping over the collision bytes, whereas I used only one.

For reference, here is an example of a black square skipped over:

hexdump of a commented square

And here is another black square that is displayed in the GIF:

hexdump of a used square

Conclusion

Hashquines are fun! Many thanks to Ange Albertini for the challenge: you made me dive into the GIF file format, which I probably wouldn’t have done otherwise.

And of course, well done to spq for creating the first known GIF MD5 hashquine!


In Defense of Arrested Development Season 4

This article is a repost promoting content originally published elsewhere. See more things Dan's reposted.

They Didn’t Make A Huge Mistake: In Defense of Arrested Development Season 4 (Freshly Popped Culture)

It was overstuffed, scattershot, and occasionally quite tedious — but also kinda brilliant? It’s Arrested Development Season 4.


Dan Q couldn’t find GC6M48N A Fine Pair # 598 ~ Eynsham

This checkin to GC6M48N A Fine Pair # 598 ~ Eynsham reflects a geocaching.com log entry. See more of Dan's cache logs.

I’m having the unluckiest streak today! Got the coordinates no problem, but when I reached the GZ, despite a thorough search (and no stone left unturned), I couldn’t get lucky. Perhaps I’ll come back and bring the geokid: it’s the kind of hunt she excels at.

Dan Q reported GC5VKVN Walk by the Firehouse #4 needs maintenance

This checkin to GC5VKVN Walk by the Firehouse #4 reflects a geocaching.com log entry. See more of Dan's cache logs.

Strongly suspect this cache is missing. CO: message me if you want me to confirm where I looked and what I found that makes me think it’s definitely a goner (if that’s easier than visiting the GZ). Sorry!

Dan Q couldn’t find GC5VKVN Walk by the Firehouse #4

This checkin to GC5VKVN Walk by the Firehouse #4 reflects a geocaching.com log entry. See more of Dan's cache logs.

The coordinates seemed “out” from what I was expecting, and I think my hunch was right, especially after reading the old logs (I think I’ve seen a cache like this before). I’m pretty sure this cache is gone: I even found what I believe to be a piece of it; photo attached. :-(

Nano cache magnet, detached, in Dan's hand.


Dan Q couldn’t find GC53422 Walk by the Firehouse #2

This checkin to GC53422 Walk by the Firehouse #2 reflects a geocaching.com log entry. See more of Dan's cache logs.

Feels like I’m kicking off an unlucky streak today with my second DNF in a row. Had to give up after a moderately thorough search in all the obvious places (and a few less-obvious ones). Maybe another time.

Dan Q couldn’t find GC531M9 Walk by the Firehouse #1

This checkin to GC531M9 Walk by the Firehouse #1 reflects a geocaching.com log entry. See more of Dan's cache logs.

Coming here during the daytime may have been a mistake: I wasn’t able to search effectively while people from the industrial estate kept coming down here for their cigarette breaks, and a forklift driver was watching me suspiciously. Gave up after about 10 unproductive minutes.

Dan Q found GLVZ4C8V Walk by the Firehouse #3

This checkin to GLVZ4C8V Walk by the Firehouse #3 reflects a geocaching.com log entry. See more of Dan's cache logs.

Found without difficulty after a short search. Nice to see something a bit different than the usual placement. Container possibly missing a part: may need attention before the next severe weather but probably okay for now. TFTC.

Dan Q found GLVZ47BH Circular Walk

This checkin to GLVZ47BH Circular Walk reflects a geocaching.com log entry. See more of Dan's cache logs.

While running an errand in Eynsham I thought I’d take the opportunity to explore some of the local geocaches. This one was the first, and I was glad of the hint (and also glad that I was dressed appropriately for stomping down nettles – quite the crop of them this spring!) or else I’d have been stuffed! TFTC.