Joni Teravainen and I have uploaded to the arXiv our paper “Quantitative correlations and some problems on prime factors of consecutive integers”. This paper applies modern analytic number theory tools – most notably, the Maynard sieve and the recent correlation estimates for bounded multiplicative functions of Pilatte – to resolve (either partially or fully) some old problems of Erdős, Straus, Pomerance, Sárközy, and Hildebrand, mostly regarding the prime divisor counting function
$$\omega(n) := \sum_{p \mid n} 1$$
(the number of distinct primes $p$ dividing $n$) and its relatives. The famous
Hardy–Ramanujan and Erdős–Kac laws tell us that, asymptotically for random $n \leq x$ with $x$ large, $\omega(n)$ should behave like a gaussian random variable with mean and variance both close to $\log\log x$; but the question of the
joint distribution of consecutive values such as $\omega(n), \omega(n+1)$ is still only partially understood. Aside from some lower-order correlations at small primes (arising from such observations as the fact that precisely one of $n, n+1$ will be divisible by $2$), the expectation is that such consecutive values behave like independent random variables. As an indication of the state of the art, it was recently
shown by Charamaras and Richter that any bounded observables $f(\omega(n))$, $g(\omega(n+1))$ will be asymptotically decorrelated in the limit $x \to \infty$ if one performs a logarithmic statistical averaging. Roughly speaking, this confirms the independence heuristic at the scale $\sqrt{\log\log x}$ of the standard deviation, but does not resolve finer-grained information, such as precisely estimating the probability of the event $\omega(n) = \omega(n+1)$.
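As a purely illustrative companion to this discussion, the following Python sketch tabulates $\omega(n)$ with a simple sieve and computes the empirical mean and variance of $\omega(n)$ for $n \leq x$, together with the empirical frequency of the event $\omega(n) = \omega(n+1)$. The helper names `omega_table` and `summary` are hypothetical, introduced only for this sketch. Since every statistic here is governed by $\log\log x$, which grows extremely slowly, numerics at accessible ranges should not be expected to match the asymptotic predictions closely.

```python
import math

def omega_table(limit):
    """Return w with w[n] = omega(n) (number of distinct prime factors) for 0 <= n <= limit."""
    w = [0] * (limit + 1)
    for p in range(2, limit + 1):
        if w[p] == 0:  # no smaller prime divides p, so p is prime
            for m in range(p, limit + 1, p):
                w[m] += 1
    return w

def summary(x):
    """Empirical mean and variance of omega(n), and frequency of omega(n) == omega(n+1), over 1 <= n <= x."""
    w = omega_table(x + 1)  # one extra entry so that omega(x + 1) is available
    values = w[1 : x + 1]
    mean = sum(values) / x
    variance = sum((v - mean) ** 2 for v in values) / x
    equal_freq = sum(1 for n in range(1, x + 1) if w[n] == w[n + 1]) / x
    return mean, variance, equal_freq, math.log(math.log(x))

if __name__ == "__main__":
    mean, variance, equal_freq, loglog = summary(10**5)
    print(f"mean={mean:.3f} variance={variance:.3f} P(equal)={equal_freq:.3f} loglog x={loglog:.3f}")
```

Comparing `mean` and `variance` against `loglog` at a few values of $x$ gives a feel for how large the lower-order corrections still are at this range.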
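Our first result, answering a question of Erdős, shows that there are infinitely many $n$ for which one has the bound
$$\omega(n+k) \ll k$$
for all $k \geq 1$. For $k$ larger than a large multiple of $\log\log n$, such a bound is already to be expected (though not completely universal) from the Hardy–Ramanujan law, which predicts that $\omega(n+k)$ is typically of size about $\log\log n$; the main difficulty is thus with the short shifts $k \ll \log\log n$. If one only had to demonstrate this type of bound for a bounded number of $k$, then this type of result would be well within standard sieve theory methods, which can make any bounded number of shifts $n+k$ “almost prime” in the sense that $\omega(n+k)$ becomes bounded. Thus the problem is that the “sieve dimension” (roughly, the number of shifts $n+k$ that have to be sieved simultaneously, which is comparable to $\log\log n$) grows (slowly) with $n$.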
When writing about this problem in 1980, Erdős and Graham wrote “we just know too little about sieves to be able to handle such a question (“we” here means not just us but the collective wisdom (?) of our poor struggling human race)”.
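The quantity controlled by the first result can be computed directly for any particular $n$. The sketch below, with hypothetical helper names `omega` and `worst_ratio` and a finite cutoff `K` on the shifts that is our own assumption, evaluates $\max_{1 \leq k \leq K} \omega(n+k)/k$. It also shows why the short shifts are the delicate ones: taking $n+1$ to be the primorial $2 \cdot 3 \cdot 5 \cdot 7 \cdot 11 \cdot 13 = 30030$ already forces the ratio at $k = 1$ to be $6$. Of course, no finite computation of this kind says anything about the existence of infinitely many good $n$.

```python
def omega(m):
    """Number of distinct primes dividing m, by trial division (adequate for small m)."""
    count, p = 0, 2
    while p * p <= m:
        if m % p == 0:
            count += 1
            while m % p == 0:
                m //= p
        p += 1
    return count + (1 if m > 1 else 0)

def worst_ratio(n, K):
    """Largest value of omega(n + k) / k over the shifts 1 <= k <= K."""
    return max(omega(n + k) / k for k in range(1, K + 1))

if __name__ == "__main__":
    print(worst_ratio(1, 50))      # prints 1.0 (omega(2)/1 dominates)
    print(worst_ratio(30029, 50))  # prints 6.0 (omega(30030)/1 dominates)
```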
However, with the advent of the Maynard sieve (also sometimes referred to as the Maynard–Tao sieve), it turns out to be possible to sieve for the conditions $\omega(n+k) \ll k$ for all of the relevant shifts $k$ simultaneously (roughly speaking, by sieving out any $n$ for which one of the $n+k$ is divisible by a prime below a suitable threshold depending on a large parameter), and then to perform a moment calculation analogous to the standard proof (due to Turán) of the Hardy–Ramanujan law, but weighted by the Maynard sieve. (In order to get good enough convergence, one needs to control fourth moments as well as second moments, but these are standard, if somewhat tedious, calculations.)
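For readers who want to see the unweighted version of the moment computation, here is a rough numerical sketch of Turán's second-moment bound. It computes $\frac{1}{x}\sum_{n \leq x} (\omega(n) - S)^2$ with $S := \sum_{p \leq x} 1/p$, which is $\log\log x + O(1)$ by Mertens' theorem. The Maynard sieve weights and the fourth-moment analysis are not modelled at all, and the function name `turan_second_moment` is hypothetical. Expanding the square shows that this quantity is at most $S(1 + 2\pi(x)/x)$, where $\pi(x)$ is the number of primes up to $x$: indeed $\frac{1}{x}\sum_{n \leq x} \omega(n)^2 \leq S + S^2$ and $\frac{1}{x}\sum_{n \leq x} \omega(n) \geq S - \pi(x)/x$.

```python
def turan_second_moment(x):
    """Return (second_moment, S, prime_count), where
    second_moment = (1/x) * sum_{n<=x} (omega(n) - S)^2 and S = sum_{p<=x} 1/p."""
    w = [0] * (x + 1)
    S = 0.0
    prime_count = 0
    for p in range(2, x + 1):
        if w[p] == 0:  # p is prime: no smaller prime has marked it
            S += 1.0 / p
            prime_count += 1
            for m in range(p, x + 1, p):
                w[m] += 1
    second_moment = sum((w[n] - S) ** 2 for n in range(1, x + 1)) / x
    return second_moment, S, prime_count

if __name__ == "__main__":
    x = 10**5
    second_moment, S, prime_count = turan_second_moment(x)
    bound = S * (1 + 2 * prime_count / x)  # the elementary Turan-type upper bound
    print(f"second moment={second_moment:.3f}, S={S:.3f}, bound={bound:.3f}")
```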
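Our second result, which answers a separate question of Erdős, establishes that the quantity
$$\sum_{n=1}^\infty \frac{\omega(n)}{2^n}$$
is irrational; this had recently been established by Platt under the assumption of the prime tuples conjecture, but we are able to establish this result unconditionally. The binary expansion of this number is of course closely related to the distribution of $\omega(n)$, but in view of the Hardy–Ramanujan law, the $N^{\mathrm{th}}$ digit of this number is influenced by about $\log\log\log N$ nearby values of $\omega$ (a typical value $\omega(n) \approx \log\log N$ with $n$ near $N$ occupies about $\log_2 \log\log N$ binary digits), which is too many correlations for current technology to handle. However, it is possible to do some “Gowers norm” type calculations to decouple things to the point where pairwise correlation information is sufficient. To see this, suppose for contradiction that this number was a rational $a/q$, thus
$$\sum_{m=1}^\infty \frac{\omega(m)}{2^m} = \frac{a}{q}.$$
Multiplying by $q 2^n$ and noting that the terms with $m \leq n$ then contribute an integer, we obtain some relations between the shifts $\omega(n+h)$:
$$\sum_{h=1}^\infty \frac{\omega(n+h)}{2^h} \in \frac{1}{q}\mathbb{Z} \quad \text{for all } n \geq 1.$$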
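If the series were equal to $a/q$, multiplying through by $q 2^n$ would rest on an exact splitting of $2^n \sum_m \omega(m)/2^m$ into an integer part coming from $m \leq n$ and a tail $\sum_{h \geq 1} \omega(n+h)/2^h$ that involves only the shifts $\omega(n+h)$. The tail, multiplied by $q$, would then have to be an integer. One quick way to sanity-check this bookkeeping is to redo it in exact rational arithmetic on a truncation $S_M = \sum_{m \leq M} \omega(m)/2^m$. The truncation level `M` and the helper names are assumptions made only for this sketch, which of course says nothing about irrationality itself.

```python
from fractions import Fraction

def omega(m):
    """Number of distinct primes dividing m, by trial division."""
    count, p = 0, 2
    while p * p <= m:
        if m % p == 0:
            count += 1
            while m % p == 0:
                m //= p
        p += 1
    return count + (1 if m > 1 else 0)

def truncated_sum(M):
    """Exact value of S_M = sum_{m=1}^{M} omega(m) / 2^m as a Fraction."""
    return sum(Fraction(omega(m), 2 ** m) for m in range(1, M + 1))

def split_at(n, M):
    """Split 2^n * S_M into (head, tail): head = sum_{m<=n} omega(m) 2^(n-m) is an integer,
    tail = sum_{h=1}^{M-n} omega(n+h) / 2^h involves only the shifts omega(n+h)."""
    head = sum(omega(m) * 2 ** (n - m) for m in range(1, n + 1))
    tail = sum(Fraction(omega(n + h), 2 ** h) for h in range(1, M - n + 1))
    return head, tail

if __name__ == "__main__":
    n, M = 20, 80
    head, tail = split_at(n, M)
    assert 2 ** n * truncated_sum(M) == head + tail  # exact identity, no rounding
    print(head, float(tail))
```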
Using the additive nature of $\omega$ (that is, $\omega(ab) = \omega(a) + \omega(b)$ whenever $a$ and $b$ are coprime), one then also gets similar relations on arithmetic progressions, for many

and

:

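The additivity property used here is elementary and easy to test: $\omega(ab) = \omega(a) + \omega(b)$ when $\gcd(a,b) = 1$, while it can fail when $a$ and $b$ share a prime (for instance $\omega(4 \cdot 6) = 2$ but $\omega(4) + \omega(6) = 3$). In particular $\omega(pm) = \omega(m) + 1$ for a prime $p \nmid m$, which is the kind of identity that lets information about $\omega$ on one set of integers be transferred to a dilated set. The exhaustive check below, with hypothetical helper names, only verifies the property on a small range; it does not reproduce the paper's derivation of the progression relations.

```python
from math import gcd

def omega(m):
    """Number of distinct primes dividing m, by trial division."""
    count, p = 0, 2
    while p * p <= m:
        if m % p == 0:
            count += 1
            while m % p == 0:
                m //= p
        p += 1
    return count + (1 if m > 1 else 0)

def additive_on_coprimes(limit):
    """True if omega(a*b) == omega(a) + omega(b) for all coprime 1 <= a, b <= limit."""
    return all(
        omega(a * b) == omega(a) + omega(b)
        for a in range(1, limit + 1)
        for b in range(1, limit + 1)
        if gcd(a, b) == 1
    )

if __name__ == "__main__":
    print(additive_on_coprimes(60))           # prints True
    print(omega(4 * 6), omega(4) + omega(6))  # prints 2 3 (4 and 6 share the prime 2)
```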
Taking alternating sums of this sort of identity for various

and

(in analogy to how averages involving arithmetic progressions can be related to Gowers norm-type expressions over cubes), one can eventually eliminate the contribution of small

, and arrive at an identity of the form

for many

, where

is a parameter (we eventually take

) and

are various shifts that we will not write out explicitly here. This looks like quite a messy expression; however, one can adapt proofs of the Erdős–Kac law and show that, as long as one ignores the contribution of really large prime factors (of order

, say) to the

, this sort of sum behaves like a gaussian; in particular, once one establishes a suitable local limit theorem, one can contradict
(1). The contribution of the large prime factors does cause a problem though, as a naive application of the triangle inequality bounds this contribution by

, which is an error that overwhelms the information provided by
(1). To resolve this, we have to adapt the pairwise correlation estimates of Pilatte mentioned earlier to demonstrate that these contributions are in fact

Here it is important that the error estimates of Pilatte are quite strong (of order $(\log x)^{-c}$ for some constant $c > 0$
); previous correlation estimates of this type (such as those used in
this earlier paper with Joni) turn out to be too weak for this argument to close.
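The “Gowers norm” analogy can be made concrete in a simpler setting. For shifts $h_1, \dots, h_k$, the alternating sum $\sum_{\epsilon \in \{0,1\}^k} (-1)^{k - |\epsilon|} f(n + \epsilon_1 h_1 + \dots + \epsilon_k h_k)$ over the vertices of a cube is the iterated difference $\Delta_{h_1} \cdots \Delta_{h_k} f(n)$, and it annihilates every polynomial of degree less than $k$. The sketch below, built around a hypothetical helper `cube_alternating_sum` that is not code from the paper, illustrates this cancellation mechanism only. The specific alternating sums used in the argument, and the shifts appearing in them, are not reproduced here.

```python
from itertools import product

def cube_alternating_sum(f, n, hs):
    """Return sum over eps in {0,1}^k of (-1)^(k - |eps|) * f(n + eps . hs),
    which equals the iterated difference Delta_{h_1} ... Delta_{h_k} f(n)."""
    k = len(hs)
    total = 0
    for eps in product((0, 1), repeat=k):
        shift = sum(e * h for e, h in zip(eps, hs))
        sign = (-1) ** (k - sum(eps))
        total += sign * f(n + shift)
    return total

def quadratic(n):
    return 3 * n * n - 5 * n + 7

if __name__ == "__main__":
    print(cube_alternating_sum(quadratic, 10, [2, 5]))     # prints 60, i.e. 6 * h1 * h2
    print(cube_alternating_sum(quadratic, 10, [2, 5, 1]))  # prints 0: degree 2 < 3 shifts
```

The design mirrors the general principle: each additional cube direction kills one more degree of “structure”, leaving only the genuinely irregular part to be estimated.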
Our final result concerns the asymptotic behavior of the density
$$\frac{1}{x}\#\{n \leq x : \omega(n) = \omega(n+1)\}$$
(we also address similar questions for

and

). Heuristic arguments led Erdős, Pomerance, and Sárközy to conjecture that this quantity was asymptotically
$$\frac{1}{2\sqrt{\pi\log\log x}}.$$
They were able to establish an upper bound of

, while Hildebrand obtained a lower bound of

. Here, we obtain the asymptotic for almost all $x$
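(the limitation here is the standard one, which is that the current technology on pairwise correlation estimates either requires logarithmic averaging, or is restricted to almost all scales rather than all scales). Roughly speaking, the idea is to use the circle method to rewrite the above density in terms of expressions
$$\frac{1}{x}\sum_{n \leq x} e(\alpha\omega(n))\,\overline{e(\alpha\omega(n+1))}, \qquad e(t) := e^{2\pi i t},$$
for various frequencies $\alpha \in \mathbb{R}/\mathbb{Z}$ (note that $n \mapsto e(\alpha\omega(n))$ is a bounded multiplicative function), use the estimates of Pilatte to handle the minor arc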

, and convert the major arc contribution back into physical space (in which $\omega(n)$ and $\omega(n+1)$ are now permitted to differ by a large amount), and then use more traditional sieve-theoretic methods to estimate the result.
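The heuristic behind the Erdős–Pomerance–Sárközy prediction can be read off from the circle-method formulation. Because $\omega$ is integer-valued, the density is exactly an integral of the exponential sums over $\alpha \in [0,1]$. If one then assumes, purely as a modelling assumption rather than a theorem, that $\omega(n) - \omega(n+1)$ is approximately gaussian with mean $0$ and variance $2\log\log x$ (as independence of two gaussians of variance $\log\log x$ would suggest), the major arc near $\alpha = 0$ produces the conjectured main term, while the minor arcs are expected to contribute negligibly:

```latex
\begin{align*}
\frac{1}{x}\#\{n \le x : \omega(n) = \omega(n+1)\}
  &= \int_0^1 \frac{1}{x}\sum_{n \le x} e\big(\alpha(\omega(n)-\omega(n+1))\big)\,d\alpha,
  \qquad e(t) := e^{2\pi i t}, \\
\frac{1}{x}\sum_{n \le x} e\big(\alpha(\omega(n)-\omega(n+1))\big)
  &\approx \exp\big(-2\pi^2\alpha^2 \cdot 2\log\log x\big)
  \quad (\alpha \text{ small, gaussian model}), \\
\int_{\mathbb{R}} \exp\big(-4\pi^2\alpha^2\log\log x\big)\,d\alpha
  &= \sqrt{\frac{\pi}{4\pi^2\log\log x}}
   = \frac{1}{2\sqrt{\pi\log\log x}}.
\end{align*}
```

The middle line uses the gaussian characteristic function $\mathbb{E}\, e(\alpha X) = \exp(-2\pi^2\alpha^2\sigma^2)$ for $X \sim N(0,\sigma^2)$, and the last line is the standard integral $\int_{\mathbb{R}} e^{-a\alpha^2}\,d\alpha = \sqrt{\pi/a}$ with $a = 4\pi^2\log\log x$.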