12 July 2020

About four years ago I wrote a post about the Devil's Chessboard problem and then a followup post with some more thoughts. The problem came to my attention again recently because of a video by 3Blue1Brown where he goes through the result I had mentioned before that a solution can only exist for board sizes that are a power of two.

At the end of that video, he mentioned a connection between the chessboard puzzle and error correcting codes. I had been vaguely aware of such a connection before, but hadn't given it a whole lot of thought. With some rekindled interest, I decided to dive deeper into the connection and found some more insight, including a definitive answer to my earlier question of whether there are other solutions (but of course the answer just raises a new question).

Since it's been a while, here is the problem statement again.

The devil has captured two people and is playing a game with them for their freedom. Person A will be presented with a chessboard with a penny in each square (64 total), with each penny either heads up or tails up randomly. The devil will choose a particular square and point it out to Person A. Person A then chooses a single square, and flips the penny in that square. Afterward, Person A is sent away and Person B is brought forward. Based on the new state of the board, Person B must point out the same square that the devil did in order to win. The two people can devise a strategy beforehand, but cannot communicate once the game starts. How can they win?

The Chessboard and Single Error Correcting Codes

Imagine a situation where you want to transmit a sequence of bits to someone else, say 0111001. However, in the process, one of the bits you transmit might get flipped from a 0 to a 1 or vice versa. "Ok," you say, "I'll send each bit twice!". So you send 00111111000011.

Sadly, that isn't quite good enough, as when your friend receives the string 10111111000011, they don't know whether the first bit in your message was 1 and you sent a message starting with 11, or if the first bit was 0 and you sent a message starting with 00. Duplicating the bits is enough to detect a single bit error, but not enough to correct it.

"Ah, but what about three times?" So you instead send 000111111111000000111. Now your friend can look at each group of 3 and if only a single bit flip happens, then the majority will always point to the correct source. Triplicating each of the bits is a valid, but woefully inefficient, single error correcting code.

Before we go deeper, there is another viewpoint of these codes that will be important. Given an encoding scheme, consider the set of valid messages, which we'll also call codewords. That is, consider for each message you want to convey, what the sequence of bits you'll transmit is. For the case of the simple code above, that's any sequence of bits where each group of 3 bits is the same.

Then say that the distance between two bitstrings is the number of bit flips that need to occur to turn one into another, i.e. the number of positions where they differ. A code is single error correcting exactly any two valid messages have a distance of at least 3. The reason for that is fairly simple: Your transmission starts at a valid message and one bit flip only takes it a distance 1 away. There can't be another valid message distance 1 from the new transmission, since then that message and your original would be only 2 apart.

Conversely, if there are two messages that are only distance 2, then the code can't possibly be single error correcting, since there is a transmission that is distance 1 from each of them (flip one of the two bits where they differ), and there is no way to know which message the transmission came from.

With that in mind, let's look at the connection with the chessboard problem. Any solution to the chessboard problem consists of coloring the vertices of an \( n \)-dimensional hypercube with \( n \) colors such that each vertex has exactly one neighbor of each color.

As discussed in the first post, the even parity vertices and the odd parity vertices of the hypercube can be colored entirely independently, so let's focus on the even parity vertices only.

Focus on a particular color, and look at all the even parity vertices that are that color. None two of them can have a distance of only 2, as then they would share a common neighbor, so that neighbor would not be adjacent to each color. Since the distance between even parity vertices will always be even, the distance between two of the same color has to be at least 4.

We can biject these even vertices, which we'll identify with bitstrings with an even number of ones, with arbitrary bitstrings with one less bit, simply by deleting the last bit. For example, 01101111 would get mapped to 0110111. To reverse this bijection, we append a 0 if there are already an even number of 1s, and append a 1 otherwise. So since 0110111 has 5 ones, an odd number, we append a 1 and get back to 01101111.

When viewed through this bijection, the even parity vertices of a particular color give us a set of length \( n-1 \) bitstrings that all differ in at least 3 locations: the original bitstrings differ in at least 4, but if they differed in the last bit then we deleted that one. In other words, this new set of bitstrings is a single error correcting code for \( n - 1 \) bits.

Furthermore, we know that our coloring of the even vertices must contain each color exactly \( \frac{2^{n-1}}{n} \) times, and so we'll end up with that many codewords as well. Notice that in the error correcting code context, a message might stay the same or turn into one of its \( n - 1 \) neighbors, so its "neighborhood" is size exactly \( n \). Since all the neighborhoods are disjoint, their total size is \( 2^{n-1} \), making this a perfect error correcting code, meaning every possible bitstring is either a codeword or distance 1 from a codeword.

Going the other way

So from any solution to the chessboard problem, we can get a (actually \( n \)) error correcting code. And it turns out that we can start with an error correcting code, then turn it into a solution. This fact was brought to my attention by Pau Cantos Coll who wrote about the problem as an IB Extended Essay. I will reproduce the argument here with some small tweaks to fit with the viewpoints that I've been using.

Suppose we have a perfect error correcting code on \( n - 1 \) bits. Call the set of codewords \( C \subset \{0, 1\}^{n-1} \). Let \( p: \{0, 1\}^n \to \{0, 1\}^{n-1} \) be the map that deletes the last bit. This is a bijection from the even parity vertices to \( \{0, 1\}^{n-1} \). We'll define our coloring as follows:

For an even vertex \( x \in \{0, 1\}^n \), if \( p(x) \in C \), then we will use color \( n \). Otherwise, there is exactly one index \( i \in \{0, 1, \ldots, n-1 \} \) such that \( p(x) \oplus e_i \in C \), where \( e_i \) is the bitstring where the \( i \)th bit is 1 and the rest are 0. The existence and uniqueness of \( i \) follows from the perfect error correcting nature of \( C \). In that case, color \( x \) with color \( i \).

Now to check that this coloring is a valid solution, it suffices to check that there are no two vertices of the same color with distance 2. Consider two vertices \( x \) and \( y \) that are both color \( i \). Then we have \( p(x) \oplus e_i \in C \) and \( p(y) \oplus e_i \in C \). Since \( p(x) \neq p(y) \), and C is a single error correcting code, \( d(p(x), p(y)) = d(p(x) \oplus e_i, p(y) \oplus e_i) \geq 3 \). But on the other hand, \( d(p(x), p(y)) \leq d(x, y) \), so \( d(x, y) \geq 3 \), which completes the proof. We'll call this generated chessboard strategy the "coset strategy" for the chosen code.

So we can construct a chessboard solution from a perfect single error correcting code, and a perfect single error correcting code from a chessboard solution. Nice! But the story continues...

The Standard Solution and the Hamming Code

The most well-known perfect error correcting code other than the trivial one is the Hamming Code, which adds \( r \) parity bits to a length \( 2^r - r - 1 \) message to create codewords with \( 2^r - 1 \) bits. For our purposes, we only care about which bitstrings are codewords.

As an illustrative example, let's let \( r = 3 \) so there are 7 bits \( b_1, b_2, b_3, b_4, b_5, b_6, b_7 \). Then this bitstring is a codeword if all of the following are true:

  • \( b_1 \oplus b_3 \oplus b_5 \oplus b_7 = 0 \)
  • \( b_2 \oplus b_3 \oplus b_6 \oplus b_7 = 0 \)
  • \( b_4 \oplus b_5 \oplus b_6 \oplus b_7 = 0 \)

The first condition consists of indices where the lowest bit of the index is 1. The second condition consists of indices where the second bit of the index is 1. Finally, the third condition consists of indices where the highest bit of the index is 1.

To correct an error, we compute the left hand sides of those three conditions for our transmission. If the result is all 0s, then we have a valid code word. Otherwise, treating the three results as a binary number tells us which bit to flip to simultaneously satisfy all three.

As an example, consider the word 1110010.

  • \( b_1 \oplus b_3 \oplus b_5 \oplus b_7 = 1 \oplus 1 \oplus 0 \oplus 0 = 0 \)
  • \( b_2 \oplus b_3 \oplus b_6 \oplus b_7 = 1 \oplus 1 \oplus 1 \oplus 0 = 1 \)
  • \( b_4 \oplus b_5 \oplus b_6 \oplus b_7 = 0 \oplus 0 \oplus 1 \oplus 0 = 1 \)

To get to a valid codeword, we'll want to flip the bit with index 110, also known as 6. The result is 1110000, and you can check that all three equations are satisfied for that word.

For comparison, here is the standard solution to the chessboard problem, rephrased in our coloring terminology: The color of a bitstring is the xor of the indices with a 1 bit.

Squinting a bit, we can see that this is actually the same as deleting the 0th bit and then coloring the bitstring with the index that needs to be changed to result in a valid Hamming codeword (or 0 if it already is). In other words, the standard solution to the chessboard problem is the coset strategy for the Hamming code!

Nonlinear Perfect Codes

It turns out that while perfect single error correcting codes need to have the same bit length and number of codewords as a Hamming code, there are more to be found. One such construction was found by Vasil'ev in 1962, which I will reproduce here based on a writeup by Frank Kschischang.

Let \( C \subset \{ 0, 1 \}^{n-1} \) be a single error correcting code and \( f: C \to \{0, 1\} \) be an arbitrary function. Then we construct \( C' \subset \{ 0, 1 \}^{2n-1} \) as follows, noting that \( 2n-1 = 2(n-1) + 1 \):

\( C' = \{(u, u \oplus v, \pi(u) \oplus f(v))\ |\ u \in \{0, 1\}^{n-1}, v \in C\} \) where \( \pi(u) \) is the parity of \( u \) (0 if \( u \) has an even number of 1s, and 1 if it has an odd number).

We can rephrase this slightly in a way that might be more clear for some people: \( C' \) consists of the bitstrings \( (u, v, z) \) where \(u \oplus v \in C \) and \( z = \pi(u) \oplus f(u \oplus v) \).

Let's see why \( C' \) is single error correcting. Starting from any codeword in \( C' \), it is clear that any codeword where \( u \oplus v \) is a different member of \( C \) is distance at least 3, as we would need to change \( u \oplus v \) by at least 3 bits. If \(u \oplus v \) stays the same, then if \( u \) changes by 2 bits, v must have also changed by 2 bits, hence a distance at least 4. It therefore suffices to check the case where \( u \) and \( v \) each changed by one bit. In this case, though, the parity of \( u \) changed, so \( z \) must have changed as well, leading to a distance of 3. Furthermore, \( C' \) is a perfect single error correcting code because it has \( 2^{n-1}|C| \) elements, so the total size of the neighborhoods is \( 2n \cdot 2^{n-1}|C| = 2^n\cdot n|C| = 2^n \cdot 2^{n-1} = 2^{2n-1} \).

If we restrict our attention to codes where the all zero string is a valid codeword (this isn't very restrictive, since codes can be translated by xoring each codeword by the same string), the Hamming code is linear, meaning that the set of codewords is a linear subspace of \( \mathbb{Z}_2^{n} \). On the other hand, the Vasil'ev construction can construct nonlinear codes, even if the smaller error correcting code \( C \) was linear.

It isn't too difficult to see the non-linearity. Suppose that we have two codewords \( (u, v, z) \) and \( (u', v', z') \). Their xor is \( (u \oplus u', v \oplus v', z \oplus z') \), which is a codeword if \( z \oplus z' = \pi(u \oplus u') \oplus f(u \oplus u' \oplus v \oplus v') \). Since \( z \) and \( z' \) came from valid codewords, this is equivalent to \( \pi(u) \oplus f(u \oplus v) \oplus \pi(u') \oplus f(u' \oplus v') = \pi(u \oplus u') \oplus f(u \oplus u' \oplus v \oplus v') \). Since \( \pi \) is linear, those terms cancel and we're left with \( f(u \oplus v) \oplus f(u' \oplus v') = f(u \oplus v \oplus u' \oplus v') \). In other words, we would need \( f(c) \oplus f(c') = f(c \oplus c') \) for all \( c, c' \in C \). Simply picking a non-linear function for \( f \) therefore forces the resulting code to be non-linear, and if \( f(0) = 0 \), then the all zero string will be a valid codeword.

So that resolves the question I had before entirely. All of the "fudgings" of the standard solution result in a linear code for the color containing the zero vertex. On the other hand, the coset strategy for a nonlinear code will produce that nonlinear code again, and therefore cannot be a fudging of the standard solution.

Interestingly, the Vasil'ev construction doesn't create a nonlinear code for \( n = 7 \). This is because the only perfect single error correcting code for \( n = 3 \) is the trivial code with codewords 000 and 111. However, with \( f(000) = 0 \), regardless of which value we choose for \( f(111) \), the resulting function will be linear.

Now we ask a new question: Is every strategy the coset strategy for some error correcting code (or equivalent under permuting the colors)? This time I have an answer, and it turns out to be no.

Vasil'ev on the Chessboard

Here is a rough intuitive description of the Vasil'ev construction. Cut the bitstring in half (except the last bit). Use the xor of those two halves to find the index, but not which half, the error is in. Then use the parity of the first half, masked by an arbitrary function of the smaller codeword and the final bit, to determine which half the error was in.

Let's try that on the chessboard. Cut the chessboard in half, and say their states before the coin flip are \( u \) and \( v \). The solution for the smaller chessboard size tells us which index in \( u \oplus v \) needs to be flipped in order to send the message of the pair of squares that includes our goal. Call that index \( i \). Then we can either flip \( u_i \) or \( v_i \), and to choose, we'll use an arbitrary function \( f: \{0, 1\}^{n/2} \to \{0, 1\} \). We'll say that the target square is the one in the first chessboard if \( \pi(u) \oplus f(u \oplus v) \) is 0, and in the second chessboard if it's 1. By choosing to flip the bit in \( u \) or in \( v \) appropriately, we can control \( \pi(u) \), and therefore control \( \pi(u) \oplus f(u \oplus v) \) as we need to, regardless of how \( f \) is defined.

The interesting thing about this strategy in light of what we've discussed so far is that we have much more flexibility than in the Vasil'ev construction. In the Vasil'ev construction we only got to pick function values for the codewords of a perfect error correcting code of \( n-1 \) bits, namely \( \frac{ 2^{n-1} }{n} \) values. Here, however, we get to pick any function on all of the possible bitstrings, a total of \( 2^{n-1} \) arbitrary values.

And in fact, where the Vasil'ev construction doesn't give any non-linear codes for 7 bits, this chessboard analogue does give new solutions for the size 8 chessboard. It is difficult to illustrate the solution concretely, so I'll finish this post with a bit of Python code and its output. The new strategy we compute matches the coset strategy on all of the bitstrings of the form \( e_0 \oplus e_i \) (the 0th bit is at the end in this case), so it is definitely not a coset strategy even if we permute the colors.

What that means is that even though there is a definite connection between the Devil's Chessboard and single error correcting codes, solutions to the chessboard problem actually have more degrees of freedom than error correcting codes do. While now we can see that there are many more solutions than previously expected, it is still unclear whether there are more, and if so, how many are still to be found.

# Normal linear strategy for 4 bits
def strategy_4(x):
    ret = 0
    for i in range(4):
        if ((x >> i) & 1) == 1:
            ret ^= i
    return ret

# An arbitrary function {0, 1}^4 -> {0, 1}
# Here we choose 0 when there are 0-2 1s and 1 if there are 3 or 4.
def arbitrary_function(x):
    popcount = 0
    for i in range(4):
        if ((x >> i) & 1) == 1:
            popcount += 1
    if popcount >= 3:
        return 1
    else:
        return 0

def parity(x):
    ret = 0
    while x > 0:
        if (x & 1) == 1:
            ret ^= 1
        x >>= 1
    return ret

def strategy_8(x):
    u = x >> 4
    v = x & 0xF
    return strategy_4(u ^ v) + \
        (0 if ((parity(u) ^ arbitrary_function(u ^ v)) == 0) else 4)

def verify_8(strat):
    for x in range(256):
        neighbor_values = [strat(x ^ (1 << i)) for i in range(8)]
        if sorted(neighbor_values) != list(range(8)):
            return False
    return True

def coset_strategy_8(zero_set):
    def strat(x):
        for i in range(8):
            if x ^ (1 << i) ^ 1 in zero_set:
                return i
    return strat

zero_set = []
for x in range(256):
    if strategy_8(x) == 0:
        zero_set.append(x)

coset_strat = coset_strategy_8(zero_set)

print("Bits    \tNew\tCoset")
for x in range(256):
    if parity(x) == 0:
        print("{:08b}\t{:d}\t{:d}".format(x, strategy_8(x), coset_strat(x)))

if verify_8(coset_strat):
    print("Verified coset strategy!")

if verify_8(strategy_8):
    print("Verified new strategy!")
Bits        New Coset
00000000    0   0
00000011    1   1
00000101    2   2
00000110    3   7
00001001    3   3
00001010    2   6
00001100    1   5
00001111    4   4
00010001    4   4
00010010    5   5
00010100    6   6
00010111    7   3
00011000    7   7
00011011    6   2
00011101    5   1
00011110    0   0
00100001    5   5
00100010    4   4
00100100    7   3
00100111    6   6
00101000    6   2
00101011    7   7
00101101    0   0
00101110    5   1
00110000    1   1
00110011    0   0
00110101    3   7
00110110    2   2
00111001    2   6
00111010    3   3
00111100    4   4
00111111    1   5
01000001    6   6
01000010    7   3
01000100    4   4
01000111    5   5
01001000    5   1
01001011    0   0
01001101    7   7
01001110    6   2
01010000    2   2
01010011    3   7
01010101    0   0
01010110    1   1
01011001    1   5
01011010    4   4
01011100    3   3
01011111    2   6
01100000    3   7
01100011    2   2
01100101    1   1
01100110    0   0
01101001    4   4
01101010    1   5
01101100    2   6
01101111    3   3
01110001    7   3
01110010    6   6
01110100    5   5
01110111    4   4
01111000    0   0
01111011    5   1
01111101    6   2
01111110    7   7
10000001    7   7
10000010    6   2
10000100    5   1
10000111    0   0
10001000    4   4
10001011    5   5
10001101    6   6
10001110    7   3
10010000    3   3
10010011    2   6
10010101    1   5
10010110    4   4
10011001    0   0
10011010    1   1
10011100    2   2
10011111    3   7
10100000    2   6
10100011    3   3
10100101    4   4
10100110    1   5
10101001    1   1
10101010    0   0
10101100    3   7
10101111    2   2
10110001    6   2
10110010    7   7
10110100    0   0
10110111    5   1
10111000    5   5
10111011    4   4
10111101    7   3
10111110    6   6
11000000    1   5
11000011    4   4
11000101    3   3
11000110    2   6
11001001    2   2
11001010    3   7
11001100    0   0
11001111    1   1
11010001    5   1
11010010    0   0
11010100    7   7
11010111    6   2
11011000    6   6
11011011    7   3
11011101    4   4
11011110    5   5
11100001    0   0
11100010    5   1
11100100    6   2
11100111    7   7
11101000    7   3
11101011    6   6
11101101    5   5
11101110    4   4
11110000    4   4
11110011    1   5
11110101    2   6
11110110    3   3
11111001    3   7
11111010    2   2
11111100    1   1
11111111    0   0
Verified coset strategy!
Verified new strategy!