An Ancient Conjecture on Primes

Gödel’s Lost Letter and P=NP 2019-12-15

After 50 years we still are baffled by the primes.

Cornell faculty album source

Juris Hartmanis needs no introduction. But we can note this long interview last February by the Heidelberg Laureate Foundation. His sister, Astrid Ivask, was a world–recognized poet.

Today Ken and I wish to discuss one of his old conjectures.

We have previously discussed one of his conjectures here. The conjecture we are interested in his one made in his joint paper with Herbert Shank titled, “On the Recognition of Primes by Automata.”

Shank once wrote a paper On the undecidability of finite planar cubic graphs with Solomon Garfunkel. I, Dick, liked this paper because it honestly pointed out that a predecessor paper by them was erroneous:

In the March, 1971 issue of this Journal (JSL) a paper of ours was published purporting to prove the hereditary undecidability of the first-order theory of finite planar graphs. The proof presented there contains an error which is unfortunately “unfixable” by the methods of that paper. The theorem however is true and we demonstrate here a generalization to finite cubic (exactly three edges at each vertex) planar graphs. The method involves coding the halting problem for a Turing machine into the theory of these graphs by considering special printouts of computations. Let us first consider a discussion of the aforementioned mistake and see what can be learned from it ${\dots}$

The Primality Space Conjecture

Hartmanis and Shank (HS) asked the following conjecture—over fifty years ago—about testing whether an integer is a prime number:

$\displaystyle 01,11,101,111,1101,1011,10001, \dots$

We’ve written them in binary notation with the ones’ place first—just the way to give numbers on the input tape of a Turing machine. No error in their Turing representation here—just their brilliant conjecture connecting space to number theory: Suppose that primality is computed in space ${S(n)}$ . Then

$\displaystyle S(n) \ge cn,$

for some ${c>0}$ . That is, any primality test must take linear space.

As usual, space is the measure of how much storage the algorithm must use. The input tape is read-only and does count against the storage bound. The famous AKS primality test showed that whether a given ${n}$ -bit number ${x}$ is prime can be determined in ${n^{O(1)}}$ time but did not improve the linear upper bound on space, which HS showed by trivial means: Try all possible divisors between ${2}$ and ${\sqrt{x}}$ . The arithmetic and loop for this method can all be conducted in ${O(n)}$ space.

What is some intuition for this conjecture? Why insist the space be linear when ${n^{1-\epsilon}}$ space is plenty—given that polynomial time could still be equal to logspace? Well, suppose ${S(n) \ll n}$ and consider the integers ${x}$ from ${2^{n-1}}$ to ${2^n}$ , that is all having ${n}$ bits in binary notation. The number of machine configurations apart from the content ${x}$ of the input tape is at most:

${2^{O(S(n))}}$ possible contents of the ${S(n)}$ worktape cells, times
${n+1}$ input head positions, times ${S(n)+1}$ or so worktape head positions, times
the finite number of states.

This is ${\ll N = 2^n}$ . In fact, it is ${\ll \frac{N}{n} = \frac{N}{\ln N}}$ which is asymptotically the number of those ${x}$ that are prime according to the Prime Number Theorem. This means that the vast majority of prime numbers have no worktape configuration that is unique to their accepting computation, not even close. Thus each configuration would be shared with many other primes—and most would be shared with many non-primes as well. Whereas many non-primes are composite for the same reason—such as all strings beginning with ${0}$ being even—the intuition is that each prime has its distinct track of reasons that ought not to come together with other primes.

Less Space

The HS paper notes that since the set of primes is not regular, it needs at least order-of ${\log\log(n)}$ space. This is a weak lower bound.

Both the bound and the above intuition become sharper if we require that the input tape be one-way as well as read-only: its head cannot go left to re-read lower-order bits. Every non-regular language requires order-of ${\log(n)}$ space by such 1-way machines.

If we write the prime numbers in unary notation—that is, ${\{0^m : m \text{ is prime}\}}$ —then we can prove they cannot be recognized by a machine ${M}$ in ${o(\log m)}$ space. Having the input tape be unary makes the whole computation depend on the configuration of ${M}$ as described above. In consequence, ${L(M)}$ has infinite arithmetic progressions of accepted strings. But by the celebrated theorem of Gustav Dirichlet, each such progression contains infinitely many primes as well as infinitely-many non-primes.

Scaling back from unary to binary seems to scale the space up to linear, so perhaps this result supports the HS conjecture. HS make related observations about log space and the density of the primes in the paper. On the other hand, polynomial time could still be equal to logspace (with two-way input tape), which would make their conjecture wrong.

Another Prime Conjecture

We believe that the HS conjecture is hard, is related to some interesting questions, and could be false. That is, there might be some way to check that a number is prime in ${o(n)}$ space. Proving lower bounds on space remains, after many decades, hopeless.

Here is a related conjecture. Using the same binary representation as above—where ${1011}$ is 13 and ${10001}$ is 17—Jeffrey Shallit has asked whether we can determine whether a given finite automaton accepts some input so that it is a prime number. We believe it is still open. Let’s assume for the moment that it is true. Then we can solve some open questions in number theory related to primes. This problem unlike the HS is worth some money. For example, the above would pay 50 British pounds—about $66.62.

Jeff points out that if the above is decidable then one can solve the famous problem: Are there any Fermat primes that are not known? Recall a Fermat prime is one of the form

$\displaystyle 2^{2^{\ell}} + 1.$

The largest one known is

$\displaystyle 2^{2^{4}} + 1 = 65,\!537.$

All such primes match the regular expression ${10^*1}$ in binary notation, which is easy for a finite automaton to recognize. This expression matches numbers of the form ${2^k + 1}$ , but as we discussed here, they can be prime only when ${k}$ is a power of ${2}$ .

Open Problems

Why are simple complexity questions about primes so hard? Why indeed?