2 March 2016

I didn't have something I really wanted to write about today, so I thought I would just mention a conjecture that I have about recurrent neural nets.

Conjecture 1: A recurrent neural network cannot learn the language of palindromic strings.

What do I mean by learn? I originally thought of this while trying to force a neural network to generate arbitrarily long palindromic strings. If we think of the problem of recognizing palindromes instead, then my conjecture is that, for sufficiently long strings, many of them will be misclassified. I don't have a precise conjecture about how many strings that should be, though.

My (limited) understanding of recurrent neural nets is that some values are propagated to the next iteration, which allows for different behavior based on sequential aspects of the data. However, this also effectively limits its "memory" to some function of its size. Once the first half of the palindrome exceeds that memory limit, the recurrent neural net should be unable to reverse it to produce the second half.
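To make that concrete, here is a tiny sketch of the kind of update I'm picturing (an Elman-style recurrence in Python/NumPy). The dimensions and the random, untrained weights are placeholders rather than a real model; the point is only that everything carried from one iteration to the next lives in a fixed-size vector, no matter how long the input string is.

```python
import numpy as np

rng = np.random.default_rng(0)

ALPHABET = "ab"
HIDDEN_SIZE = 16  # the entire "memory" budget, independent of input length

# Random, untrained weights -- just placeholders for the shape of the update.
W_hh = rng.normal(size=(HIDDEN_SIZE, HIDDEN_SIZE)) / np.sqrt(HIDDEN_SIZE)
W_xh = rng.normal(size=(HIDDEN_SIZE, len(ALPHABET)))
b = np.zeros(HIDDEN_SIZE)

def one_hot(ch):
    v = np.zeros(len(ALPHABET))
    v[ALPHABET.index(ch)] = 1.0
    return v

def run(string, h=None):
    """Fold the update h <- tanh(W_hh h + W_xh x + b) over the string."""
    if h is None:
        h = np.zeros(HIDDEN_SIZE)
    for ch in string:
        h = np.tanh(W_hh @ h + W_xh @ one_hot(ch) + b)
    return h

final_state = run("abba")
print(final_state.shape)  # (16,) -- the same size however long the input is
```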

I'd expect it to be possible to show a more general claim:

Suppose that we have a classifier that passes some amount of state from one iteration to the next. Say that the information passed is a point of some metric space \( S \), and each input symbol comes from some set \( A \). Then the iteration is defined by a function \( f : S \times A \to S \), and I expect:

Conjecture 2: If \( S \) is compact, \( f \) is continuous (where \( A \) has the discrete topology), a string is accepted if and only if the final state lands in some open set \( U \subset S \), there are arbitrarily long accepted strings, and every palindrome is accepted, then there are non-palindromic accepted strings.
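To spell out what I mean by "final state": fix an initial state \( s_0 \in S \) (I'm treating that as part of the setup), and for a string \( a_1 a_2 \cdots a_n \in A^n \) define

\[ s_k = f(s_{k-1}, a_k), \qquad k = 1, \dots, n. \]

The final state of the string is \( s_n \), and the string is accepted exactly when \( s_n \in U \).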

My idea here is similar to the pumping lemma for regular languages. If you run this iteration on a sufficiently long string, then eventually two intermediate states will be extremely close together. Repeating the section of input between those two points should (by continuity of \( f \)) only change the final state by a small amount, so the system should be unable to distinguish between the original string followed by its reverse (which should be accepted) and the pumped-up string followed by the reverse of the original string (which should be rejected, since it is no longer a palindrome).
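Here is a toy numerical version of that argument, again using a random, untrained tanh recurrence like the sketch above (so it illustrates the pumping idea, not the behavior of a trained network): find the two closest intermediate states, duplicate the stretch of input between them, and check how little the final state moves.

```python
import numpy as np

rng = np.random.default_rng(1)
ALPHABET = "ab"
H = 8  # state dimension; the states live in the compact set [-1, 1]^H

# Same kind of toy tanh update as in the earlier sketch.
W_hh = rng.normal(size=(H, H)) / np.sqrt(H)
W_xh = rng.normal(size=(H, len(ALPHABET)))

def one_hot(ch):
    v = np.zeros(len(ALPHABET))
    v[ALPHABET.index(ch)] = 1.0
    return v

def step(h, ch):
    return np.tanh(W_hh @ h + W_xh @ one_hot(ch))

def run(string, h=None):
    h = np.zeros(H) if h is None else h
    for ch in string:
        h = step(h, ch)
    return h

# Read a long random "first half" and record every intermediate state.
first_half = "".join(rng.choice(list(ALPHABET), size=500))
states = [np.zeros(H)]
for ch in first_half:
    states.append(step(states[-1], ch))
S = np.array(states)

# Find the two closest intermediate states: the pumping positions i < j.
dists = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=-1)
np.fill_diagonal(dists, np.inf)
i, j = sorted(np.unravel_index(np.argmin(dists), dists.shape))
print(f"states after {i} and {j} symbols differ by {dists[i, j]:.2e}")

# Palindrome vs. the pumped first half followed by the same reversed half.
palindrome = first_half + first_half[::-1]
pumped = first_half[:i] + 2 * first_half[i:j] + first_half[j:] + first_half[::-1]
print("pumped string is a palindrome:", pumped == pumped[::-1])  # almost surely False

gap = np.linalg.norm(run(palindrome) - run(pumped))
print(f"final states differ by {gap:.2e}")  # tiny, so hard to classify differently
```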

That's certainly not even close to a full proof, so I'd be interested in finding out whether other people have had similar thoughts and found a real barrier here, or whether there's still potential for ways around it.

If the conjecture is true, then I have a follow-up question:

Question: What kind of classifier can learn to identify palindromes?