Solving an Image Puzzle from MI5

6 February 2020

The UK security and intelligence services are known for publishing code-breaking challenges as recruiting tools. I recently came across one of these puzzles at the bottom of a page on the MI5 website and thought I’d have a go at solving it.

The Puzzle

Here’s how the puzzle appears on the MI5 website:

CAN YOU SOLVE THIS PUZZLE?
There’s a clue in the image file, if you can find it.

If you want to have a go at solving this yourself, stop here, and good luck! Otherwise, read on…

The Solution

At first I thought maybe the pattern of squares could just be a red herring, and there might be some non-image data appended to the end of the JPEG file — but no such luck.
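For anyone who wants to repeat that first check, here is a minimal sketch. It assumes appended data would sit after the last JPEG End Of Image marker (the bytes FF D9). The helper name `bytes_after_jpeg_end` is made up for illustration, and the check would miss appended data that itself contains those two bytes.

```python
def bytes_after_jpeg_end(data: bytes) -> bytes:
    """Return whatever follows the last End Of Image marker (FF D9)."""
    end = data.rfind(b"\xff\xd9")
    if end == -1:
        raise ValueError("no JPEG End Of Image marker found")
    return data[end + 2:]


# Synthetic stand-ins for a real file: a start marker, filler, and an end marker.
clean = b"\xff\xd8" + b"\x00" * 8 + b"\xff\xd9"
padded = clean + b"hidden"

print(bytes_after_jpeg_end(clean))   # nothing appended
print(bytes_after_jpeg_end(padded))  # the appended bytes
```

On the real file, the same function could be fed the result of `open('SOS.jpg', 'rb').read()`. An empty result matches the "no such luck" outcome described above.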

So I tried decoding the pattern in a Jupyter notebook. I used a few dependencies for dealing with the image data, which are easily installed with conda:

conda install numpy imageio matplotlib

The obvious first step is to read in the image data. imageio is a really nice simple tool for this. It started out as a component of the larger scikit-image project, and essentially acts as a wrapper around Pillow for getting the data into a numpy array. Once we have the data, we can use Matplotlib to display it:

import imageio
from matplotlib import pyplot as plt
img = imageio.imread('SOS.jpg')
plt.imshow(img)

The image appears to comprise a 31 by 31 grid of squares that are 20 by 20 pixels each. Let’s extract individual squares and show them for a 10 by 5 subset from the top left:

rows = 5
cols = 10
f, axes = plt.subplots(rows, cols)
for i in range(0, rows):
    for j in range(0, cols):
        ax = axes[i, j]
        y_px = i * 20
        x_px = j * 20
        ax.imshow(img[y_px:y_px + 20, x_px:x_px + 20])
        ax.axis('off')

50 squares from the top left of the image.

Each square appears to be one of four different colours. Due to JPEG compression artifacts, they aren’t perfectly uniform in colour. But let’s get representative RGB values for each one by taking the centre pixel of a randomly chosen square of each colour:

import numpy as np
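# img is indexed as img[y, x]; the comments give each square as (column, row)
indigo = img[10, 10]  # Square (column 0, row 0)
violet = img[30, 10]  # Square (column 0, row 1)
grey = img[10, 50]    # Square (column 2, row 0)
white = img[50, 130]  # Square (column 6, row 2)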
colours = np.array([indigo, violet, grey, white])

This gives us [42 32 101] for indigo, [84 54 92] for violet, [225 225 227] for grey, and [250 255 252] for white. We can number the colours 0–3 based on their position in the colours array.

Let’s translate the image into a 2D array of those numbers. We do this by iterating over each square, getting the average RGB value of its pixels, then choosing the number of whichever of the four predefined colours is closest.

rows = 31
cols = 31
arr = np.zeros((rows, cols), dtype=int)
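for i in range(rows):      # i is the row index (y direction)
    for j in range(cols):  # j is the column index (x direction)
        y_px = i * 20
        x_px = j * 20
        sq_px = img[y_px:y_px + 20, x_px:x_px + 20]
        # Average over the square's pixels to smooth out JPEG artifacts
        sq_colour = np.mean(sq_px, axis=(0, 1))
        # Pick the reference colour with the smallest squared distance
        closest = np.argmin(np.sum((colours - sq_colour)**2, axis=1))
        arr[i, j] = closest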
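The same nearest-colour classification can also be written without explicit loops. This is only a sketch of an alternative, not what I ran at the time. It assumes an RGB image, and `classify_squares` is a hypothetical helper name. Any partial squares at the edges are cropped.

```python
import numpy as np


def classify_squares(img, colours, square=20):
    """Map each square of `img` to the index of the nearest reference colour."""
    rows, cols = img.shape[0] // square, img.shape[1] // square
    # Crop partial squares, then split each axis into (block, pixel-within-block).
    blocks = img[:rows * square, :cols * square].reshape(rows, square, cols, square, -1)
    means = blocks.mean(axis=(1, 3))  # shape: (rows, cols, channels)
    # Squared distance from every square's mean to every reference colour.
    diffs = means[:, :, None, :] - colours.astype(float)[None, None, :, :]
    return (diffs ** 2).sum(axis=-1).argmin(axis=-1)


# Synthetic 2 by 2 grid built from the four representative colours above.
palette = np.array([[42, 32, 101], [84, 54, 92], [225, 225, 227], [250, 255, 252]],
                   dtype=np.uint8)
expected = np.array([[0, 1], [2, 3]])
test_img = palette[expected].repeat(20, axis=0).repeat(20, axis=1)
print(classify_squares(test_img, palette))
```

The reshape trick only works because every square is the same size. The loop version is easier to adapt if the grid were irregular.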

As a quick sanity check, let’s look at that same 10 by 5 subset from the top left with arr[:5, :10]:

array([[0, 0, 2, 1, 2, 1, 1, 1, 2, 1],
       [1, 0, 1, 2, 0, 2, 1, 1, 1, 2],
       [1, 0, 1, 0, 1, 0, 3, 0, 1, 0],
       [2, 1, 1, 0, 2, 1, 0, 1, 1, 2],
       [1, 1, 0, 0, 3, 0, 1, 0, 0, 2]])

We can also use Matplotlib to display the data as a heatmap with plt.imshow(arr):

Heatmap of the translated data using the default “viridis” colour map.

Good, the pattern matches the original image (even if the colours don’t)!
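If you would rather compare like with like, the numbers can be mapped back to the representative colours by indexing the palette with the array. A minimal sketch, using the four RGB values quoted earlier and a small sample taken from the top-left corner; `render` is a hypothetical helper name.

```python
import numpy as np

# Representative colours quoted earlier: indigo, violet, grey, white.
palette = np.array([[42, 32, 101], [84, 54, 92], [225, 225, 227], [250, 255, 252]],
                   dtype=np.uint8)
# First four columns of the first two rows of the translated array.
sample = np.array([[0, 0, 2, 1],
                   [1, 0, 1, 2]])


def render(indices, palette, square=20):
    """Turn an array of colour numbers back into an RGB image."""
    rgb = palette[indices]  # fancy indexing: shape (rows, cols, 3)
    # Blow each cell up to a square x square block of pixels.
    return rgb.repeat(square, axis=0).repeat(square, axis=1)


preview = render(sample, palette)
print(preview.shape)
```

Passing `preview` to `plt.imshow` should then show the squares in something close to their original colours, minus the JPEG noise.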

Now, there are lots of ways this array of data could encode information. Different encoding schemes (e.g. ASCII), different orders through the array (e.g. transposed, reversed, zig-zag), treating the colours individually or in combinations…
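To give a feel for how quickly those options multiply, here is a rough sketch that flattens a grid in a few of the orders mentioned. The function name `read_orders` and the particular set of orders are my own choices for illustration, and are far from exhaustive.

```python
import numpy as np


def read_orders(grid):
    """Flatten a 2D grid in several candidate reading orders."""
    grid = np.asarray(grid)
    # Zig-zag: left to right on even rows, right to left on odd rows.
    zigzag = grid.copy()
    zigzag[1::2] = grid[1::2, ::-1]
    return {
        "row-major": grid.ravel(),
        "column-major": grid.T.ravel(),
        "reversed": grid.ravel()[::-1],
        "zig-zag": zigzag.ravel(),
    }


tiny = np.array([[0, 1, 2],
                 [3, 4, 5]])
for name, seq in read_orders(tiny).items():
    print(name, seq.tolist())
```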

Before just randomly trying things, it’s worth thinking about what other clues we have been given. There’s really only one other piece of information available: the filename is SOS.jpg. SOS is the best-known Morse code distress signal, so of course, this must be Morse code!

But how do these squares of four different colours encode Morse code?

Morse code can be transmitted through any medium with just two states (on and off), so it would technically be possible to represent it with just two colours. However, the essence of Morse code is how a signal changes between those two states over time. In practice, there are two kinds of “on” (short dots and long dashes) and also two kinds of “off” (short gaps between letters and long gaps between words). This means it can be quite naturally (and more efficiently) encoded with four different states, giving us the four different colours in the image.
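To make the four-state idea concrete, here is a small sketch of an encoder that turns text into a sequence of four symbols. The numbering of the states is arbitrary and chosen purely for illustration. The table covers letters only, and `encode` is a hypothetical helper.

```python
import string

# International Morse code for A to Z, in alphabetical order.
CODES = (".- -... -.-. -.. . ..-. --. .... .. .--- -.- .-.. -- "
         "-. --- .--. --.- .-. ... - ..- ...- .-- -..- -.-- --..").split()
TO_MORSE = dict(zip(string.ascii_uppercase, CODES))

# Four states: two kinds of "on" and two kinds of "off" (arbitrary numbering).
DASH, DOT, LETTER_GAP, WORD_GAP = range(4)


def encode(text):
    """Encode letters and spaces as a list of four-state symbols."""
    out = []
    words = text.upper().split()
    for w, word in enumerate(words):
        for l, letter in enumerate(word):
            out.extend(DOT if mark == "." else DASH for mark in TO_MORSE[letter])
            if l < len(word) - 1:
                out.append(LETTER_GAP)  # short gap between letters
        if w < len(words) - 1:
            out.append(WORD_GAP)        # long gap between words
    return out


print(encode("SOS"))
```

With this scheme each dot, dash, or gap costs exactly one symbol, whereas a two-state on/off encoding would need several time units for every dash and gap.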

Scanning horizontally, white and grey squares generally seem to appear in isolation, not as part of longer runs. The main exception is the run of white squares in the bottom right, but that looks like blank padding to fill out the final row, and it actually makes me more confident that we are supposed to read horizontally.

White squares are the rarest, so they could be the separator between words. Grey squares are more common, so are likely the separator between letters. That leaves indigo and violet as short runs of dashes and dots between the separators.
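These observations are easy to check numerically. The sketch below counts each colour and looks for horizontally adjacent separators. It uses only the 5 by 10 subset printed earlier, so it is indicative rather than conclusive; on the full grid you would pass `arr` instead of `sample`.

```python
import numpy as np

# The top-left 5 by 10 subset of the translated array shown earlier.
sample = np.array([[0, 0, 2, 1, 2, 1, 1, 1, 2, 1],
                   [1, 0, 1, 2, 0, 2, 1, 1, 1, 2],
                   [1, 0, 1, 0, 1, 0, 3, 0, 1, 0],
                   [2, 1, 1, 0, 2, 1, 0, 1, 1, 2],
                   [1, 1, 0, 0, 3, 0, 1, 0, 0, 2]])

# How often does each colour number (0 to 3) occur?
counts = np.bincount(sample.ravel(), minlength=4)
print(counts.tolist())  # [16, 22, 10, 2]

# Do grey (2) or white (3) squares ever sit next to each other within a row?
is_sep = sample >= 2
adjacent_separators = int((is_sep[:, :-1] & is_sep[:, 1:]).sum())
print(adjacent_separators)  # 0
```

Even in this small corner, white is by far the rarest value and no two separators touch, which fits the letter-gap and word-gap reading.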

I’ve cobbled together a Morse code table using Wikipedia:

morse = {
    '.-': 'A', '-...': 'B', '-.-.': 'C', '-..': 'D', '.': 'E', 
    '..-.': 'F', '--.': 'G', '....': 'H', '..': 'I', '.---': 'J', 
    '-.-': 'K', '.-..': 'L', '--': 'M', '-.': 'N', '---': 'O', 
    '.--.': 'P', '--.-': 'Q', '.-.': 'R', '...': 'S', '-': 'T', 
    '..-': 'U', '...-': 'V', '.--': 'W', '-..-': 'X', '-.--': 'Y', 
    '--..': 'Z', '-----': '0', '.----': '1', '..---': '2', 
    '...--': '3', '....-': '4', '.....': '5', '-....': '6', 
    '--...': '7', '---..': '8', '----.': '9', '--..--': ',', 
    '.-.-.-': '.', '..--..': '?', '-.-.-.': ';', '---...': ':', 
    '.----.': "'", '-....-': '-', '-..-.': '/', '-.--.': '(',
    '-.--.-': ')', '..--.-': '_', '-.-.--': '!'
}

This maps each series of dots and dashes to the corresponding letter or punctuation.

All that’s left to do is to iterate through the squares, collecting dots and dashes, converting them to the letter each time we hit a grey or white square, and inserting a space each time we hit a white square:

signal = []
letters = []
for row in arr:
    for val in row:
        if val == 0:  # indigo
            signal.append('-')
        elif val == 1:  # violet
            signal.append('.')
        elif val in {2, 3}:  # grey or white
            if signal:
                letters.append(morse[''.join(signal)])
                signal = []
            if val == 3:  # white
                letters.append(' ')
message = ''.join(letters)

By trial and error, it’s easy to determine that indigo is a dash and violet is a dot — you quickly find invalid letters and a nonsense message if you try it the other way around.
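That trial and error can be made slightly more systematic by counting how many codes fail to look up under each assignment. This is a minimal sketch with a letters-only table and hypothetical names (`count_invalid`, `symbols`); the real message also contains digits and punctuation, so it would need the fuller table above.

```python
import string

# International Morse code for A to Z, in alphabetical order.
CODES = (".- -... -.-. -.. . ..-. --. .... .. .--- -.- .-.. -- "
         "-. --- .--. --.- .-. ... - ..- ...- .-- -..- -.-- --..").split()
MORSE = dict(zip(CODES, string.ascii_uppercase))


def count_invalid(symbols, dash, dot):
    """Count codes that are not in the table for a given dash/dot assignment."""
    invalid = 0
    signal = []
    for val in symbols:
        if val == dash:
            signal.append("-")
        elif val == dot:
            signal.append(".")
        else:  # any other value is treated as a separator
            if signal and "".join(signal) not in MORSE:
                invalid += 1
            signal = []
    # Check a trailing code that was not followed by a separator.
    if signal and "".join(signal) not in MORSE:
        invalid += 1
    return invalid


# "CODE" written with 0 = dash, 1 = dot, 2 = letter gap.
symbols = [0, 1, 0, 1, 2, 0, 0, 0, 2, 0, 1, 1, 2, 1, 2]
print(count_invalid(symbols, dash=0, dot=1))  # 0
print(count_invalid(symbols, dash=1, dot=0))  # 1
```

On a short sample the difference is small, because most short codes are valid either way. Over a whole message the wrong assignment should accumulate noticeably more failures.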

Finally, here’s the hidden message:

MESSAGE STARTS… CONGRATULATIONS, YOU CRACKED THE CODE! WE ARE LOOKING FOR PEOPLE LIKE YOU TO JOIN OUR ENGINEERING SECTION. QUOTE THIS IN YOUR APPLICATION: HASHTAGMISSIONACCOMPLISHED18. SEARCH MI5 JOBS FOR THE LATEST SOFTWARE ENGINEERING VACANCIES. IT’S TIME TO OWN THE UNKNOWN. …MESSAGE ENDS

The message makes it sound like “It’s time to own the unknown” is some kind of catchphrase for MI5. At the time of writing, putting it into Google returns just a single result: A page where you can apply for software engineering roles at MI5. But at least one more search result will hopefully show up now that I’ve published this blog post!