Separating Words: Decoding a Paper
Gödel’s Lost Letter and P=NP 2019-09-16
A clever trick on combining automata

John Robson has worked on various problems including what is still the best result on separating words—the topic we discussed the other day. Ken first knew him for his proof than checkers is
-complete and similar hardness results for chess and Go.
Today I want to talk about his theorem that any two words can be separated by an automaton with relataivley few states.
In his famous paper from 1989, he proved an upper bound on the Separating Word Problem. This is the question: Given two strings and
, how many states does a deterministic automaton need to be able to accept
and reject
? His theorem is:
Theorem 1 (Robson’s Theorem) Suppose that
and
are distinct strings of length
. Then there is an automaton with at most
states that accepts
and rejects
.
The story of his result is involved. For starters, it is still the best upper bound after almost three decades. Impressive. Another issue is that a web search does not quickly, at least for for me, find a PDF of the original paper. I tried to find it and could not. More recent papers on the separating word problem reference his 1989 paper, but they do not explain how he proves it.
Recall the problem of separating words is: Given two distinct words of length , is there a deterministic finite automaton that accepts one and rejects the other? And the machine has as few states as possible. Thus his theorem shows that roughly the number of states grows at most like the square root of
.
I did finally track the paper down. The trouble for me is the paper is encrypted. Well not exactly, but the version I did find is a poor copy of the original. Here is an example to show what I mean:
</tr

So the task of decoding the proof is a challenge. A challenge, but a rewarding one.
A Cool Trick
Robson’s proof uses two insights. The first is he uses some basic string-ology. That is he uses some basic facts about strings. For example he uses that a non-periodic string cannot overlap itself too much.
He also uses a clever trick on how to simulate two deterministic machines for the price of one. This in general is not possible, and is related to deep questions about automata that we have discussed before here. Robson shows that it can be done in a special but important case.
Let me explain. Suppose that is a string. We can easily design an automaton that accepts
if and only if
is the string
. The machine will have order the length of
states. So far quite simple.
Now suppose that we have a string of length
and wish to find a particular occurrence of the pattern
in
. We assume that there are
occurrences of
in
. The task is to construct an automaton that accepts at the end of the
copy of
. Robson shows that this can be done by a automaton that has order
Here is the length of the string
.
This is a simple, clever, and quite useful observation. Clever indeed. The obvious automaton that can do this would seem to require a cartesian product of two machines. This would imply that it would require
number of states: Note the times operator rather than addition. Thus Robson’s trick is a huge improvement.
Here is how he does this.
His Trick
Robson’s uses a clever trick in his proof of the main lemma. Let’s work through an example with the string . The goal is to see if there is a copy of this string starting at a position that is a multiple of
.
The machine starts in state and tries to find the correct string
as input. If it does, then it reaches the accepting state
. If while doing this it gets a wrong input, then it switches to states that have stopped looking for the input
. After seeing three inputs the machine reaches
and then moves back to the start state.

The Lemmas
We will now outline the proof in some detail.
Hashing
The first lemma is a simple fact about hashing.
Lemma 2 Suppose
and
Then all but
primes satisfy
Proof: Consider the quantity for
not equal to
. Call a prime bad if it divides this quantity. This quantity can be divisible by at most
primes. So there are at most
bad primes in total.
Strings
We need some definitions about strings. Let be the length of the string
. Also let
be the number of occurrences of
in
.
A string has the period
provided
for all so that
is defined. A string
is periodic provided it has a period
that is less than half its length. Note, the shorter the period the more the string is really “periodic”: for example, the string
is more “periodic” than
Lemma 3 For any string
either
or
is not periodic.
Proof: Suppose that is periodic with period
where
is a single character. Let the length of
equal
. So by definition,
. Then
for . So it follows that
This shows that and
cannot both be periodic, since
Lemma 4 Suppose that
is not a periodic string. Then the number of copies of
in a string
is upper bounded by
where
Proof: The claim follows once we prove that no two copies of in
can overlap more than
where
is the length of
. This will immediately imply the lemma.
If has two copies in
that overlap then clearly
for some and all
in the range
. This says that
has the period
. Since
is not periodic it follows that
. This implies that the overlap of the two copies of
are at most length
. Thus we have shown that they cannot overlap too much.
Main Lemma
Say an automaton finds the occurrence of
in
provided it enters a special state after scanning the last bit of this occurrence.
Lemma 5 Let
be a string of length
and let
be a non-periodic string.Then, there is an automaton with at most
states that can find the
occurrence of
in
where
Here allows factors that are fixed powers of
. This lemma is the main insight of Robson and will be proved later.
The Main Theorem
The following is a slightly weaker version of Robson’s theorem. I am still confused a bit about his stronger theorem, to be honest.
Theorem 6 (Robson’s Theorem) Suppose that
and
are distinct strings of length
. Then there is an automaton with at most
states that accepts
and rejects
.
Proof: Since and
are distinct we can assume that
starts with the prefix
and
starts with the prefix
for some string
. If the length of
is less than order
the theorem is trivial. Just construct an automaton that accepts
and rejects
.
So we can assume that for some strings
and
where the latter is order
in length. By lemma we can assume that
is not periodic. So by lemma we get that
Then by lemma we are done.
Proof of Main Lemma
Proof: Let have length
and let
be a non-periodic string in
of length
. Also let
. By the overlap lemma it follows that
is bounded by
.
Let occur at locations
Suppose that we are to construct a machine that finds the copy of
. By the hashing lemma there is a prime
so that
if and only if . Note we can also assume that
.
Let’s argue the special case where is
modulo
. If it is congruent to another value the same argument can be used. This follows by having the machine initially skip a fixed amount of the input and then do the same as in the congruent to
case.
The automaton has states and
for
. The machine starts in state
and tries to get to the accepting state
. The transitions include:
This means that the machine keeps checking the input to see if it is scanning a copy of . If it gets all the way to the accepting state
, then it stops.
Further transitions are:
and
The second group means that if a wrong input happens, then moves to
. Finally, the state
resets and starts the search again by going to the start state
with an epsilon move.
Clearly this has the required number of states and it operates correctly.
Open Problems
The open problem is: Can the SWP be solved with a better bound? The lower bound is still order . So the gap is exponential.