A note on the mean value of the Hooley delta function

What's new 2023-06-17

Dimitris Koukoulopoulos and I have just uploaded to the arXiv our paper “A note on the mean value of the Hooley delta function”. This paper concerns a (still somewhat poorly understood) basic arithmetic function in multiplicative number theory, namely the Hooley delta function

\displaystyle  \Delta(n) := \sup_u \Delta(n;u)

where

\displaystyle  \Delta(n;u) := \# \{ d|n: e^u < d \leq e^{u+1} \}.

The function {\Delta} measures the extent to which the divisors of a natural number can be concentrated in a dyadic (or more precisely, {e}-dyadic) interval {(e^u, e^{u+1}]}. From the pigeonhole principle, we have the bounds

\displaystyle  \frac{\tau(n)}{\log n} \ll \Delta(n) \leq \tau(n),

where {\tau(n) := \# \{ d: d|n\}} is the usual divisor function. The statistical behavior of the divisor function is well understood; for instance, if {n} is drawn at random from {1} to {x}, then the mean value of {\tau(n)} is roughly {\log x}, the median is roughly {\log^{\log 2} x}, and (by the Erdős–Kac theorem) {\tau(n)} asymptotically has a log-normal distribution. In particular, there is a small proportion of highly divisible numbers that skew the mean to be significantly higher than the median.
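To spell out the pigeonhole argument behind the lower bound: the divisors of {n} all lie in {[1,n]}, which is covered by the {\lceil \log n \rceil + 1} windows {(e^u, e^{u+1}]} with {u = -1, 0, \dots, \lceil \log n \rceil - 1}, so at least one of these windows contains at least {\tau(n)/(\log n + 2)} divisors of {n}; the upper bound {\Delta(n) \leq \tau(n)} is immediate, since no window can contain more than all {\tau(n)} divisors.

For readers who like to experiment, here is a small brute-force computation of {\Delta(n)} for small {n} (purely illustrative, and not taken from the paper; the helper names divisors and hooley_delta are ad hoc). It uses the elementary observation that the supremum in the definition of {\Delta(n)} is attained at {u = \log d - 1} for some divisor {d} of {n}, i.e. by a window of the form {(d/e, d]}.

from math import e, isqrt

def divisors(n):
    """Return the sorted list of positive divisors of n."""
    divs = set()
    for d in range(1, isqrt(n) + 1):
        if n % d == 0:
            divs.update((d, n // d))
    return sorted(divs)

def hooley_delta(n):
    """Brute-force Delta(n) = sup_u #{d | n : e^u < d <= e^{u+1}}.
    The supremum is attained by a window of the form (d/e, d] for some d | n."""
    divs = divisors(n)
    return max(sum(1 for dd in divs if d / e < dd <= d) for d in divs)

# Example: n = 2*3*5*7 = 210 has tau(210) = 16 divisors, and the window
# (15/e, 15] ~ (5.52, 15] contains the five divisors 6, 7, 10, 14, 15, so
# Delta(210) = 5, comfortably between tau(210)/log(210) ~ 3.0 and tau(210) = 16.
print(hooley_delta(210), len(divisors(210)))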

On the other hand, the statistical behavior of the Hooley delta function is significantly less well understood, even conjecturally. Again drawing {n} at random from {1} to {x}, the median of {\Delta(n)} is known to lie somewhere between {(\log\log x)^{0.3533\dots}} and {(\log\log x)^{0.6102\dots}} for large {x}, with the (difficult) lower bound a recent result of Ford, Green, and Koukoulopoulos, and the upper bound due to de la Bretèche and Tenenbaum. The mean {\frac{1}{x} \sum_{n \leq x} \Delta(n)} was even less well controlled; the best previous bounds were

\displaystyle  \log \log x \ll \frac{1}{x} \sum_{n \leq x} \Delta(n) \ll \exp( c \sqrt{\log\log x} )

for any {c > \sqrt{2} \log 2}, with the lower bound due to Hall and Tenenbaum, and the upper bound a recent result of de la Bretèche and Tenenbaum.

The main result of this paper is an improvement of the upper bound to

\displaystyle  \frac{1}{x} \sum_{n \leq x} \Delta(n) \ll (\log \log x)^{11/4}.

It is still unclear to us exactly what to conjecture regarding the actual order of the mean value.

The reason we looked into this problem was that it was connected to forthcoming work of David Conlon, Jacob Fox, and Huy Pham on the following problem of Erdős: what is the size of the largest subset {A} of {\{1,\dots,N\}} with the property that no non-empty subset of {A} sums to a perfect square? Erdős observed that one can obtain sets of size {\gg N^{1/3}} (basically by considering certain homogeneous arithmetic progressions), and Nguyen and Vu showed an upper bound of {\ll N^{1/3} (\log N)^{O(1)}}. With our mean value bound as input, together with several new arguments, Conlon, Fox, and Pham have been able to improve the upper bound to {\ll N^{1/3} (\log\log N)^{O(1)}}.

Let me now discuss some of the ingredients of the proof. The first few steps are standard. Firstly, we may restrict attention to squarefree numbers without much difficulty (the point being that if a number {n} factors as {n = d^2 m} with {m} squarefree, then {\Delta(n) \leq \tau(d^2) \Delta(m)}). Next, because a squarefree number {n>1} can be uniquely factored as {n = pm} where {p} is a prime and {m} lies in the finite set {{\mathcal S}_{<p}} of squarefree numbers whose prime factors are less than {p}, and {\Delta(n) \leq \tau(p) \Delta(m) = 2 \Delta(m)}, it is not difficult to establish the bound

\displaystyle  \frac{1}{x} \sum_{n \leq x} \Delta(n) \ll \sup_{2 \leq y\leq x} \frac{1}{\log y} \sum_{n \in {\mathcal S}_{<y}} \frac{\Delta(n)}{n}.

The upshot of this is that one can replace the ordinary average with a logarithmic average; thus it suffices to show

\displaystyle  \frac{1}{\log x} \sum_{n \in {\mathcal S}_{<x}} \frac{\Delta(n)}{n} \ll (\log \log x)^{11/4}. \ \ \ \ \ (1)
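Returning to the first reduction (the passage to squarefree {n}), here is one quick way to see the inequality {\Delta(n) \leq \tau(d^2) \Delta(m)} (a sketch of a standard argument, not necessarily the one used in the paper). Since {m} is squarefree, every divisor {t} of {n = d^2 m} can be written (not necessarily uniquely) as {t = ab} with {a|d^2} and {b|m}, as one can check one prime at a time. Hence, for any {u},

\displaystyle  \Delta(n;u) \leq \sum_{a|d^2} \# \{ b|m: e^{u-\log a} < b \leq e^{u-\log a+1} \} = \sum_{a|d^2} \Delta(m; u - \log a) \leq \tau(d^2) \Delta(m),

and taking the supremum over {u} gives the claim.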

We actually prove a slightly more refined distributional estimate: for any {A \geq 2}, we have a bound

\displaystyle  \Delta(n) \ll A \log^{3/4} A \ \ \ \ \ (2)

outside of an exceptional set {E} which is small in the sense that

\displaystyle  \frac{1}{\log x} \sum_{n \in {\mathcal S}_{<x}: n \in E} \frac{1}{n} \ll \frac{1}{A}. \ \ \ \ \ (3)

It is not difficult to get from this distributional estimate to the logarithmic average estimate (1) (worsening the exponent {3/4} to {3/4+2 = 11/4}).

To get some intuition on the size of {\Delta(n)}, we observe that if {y > 0} and {n_{<y}} is the factor of {n} coming from the prime factors less than {y}, then

\displaystyle  \Delta(n) \geq \Delta(n_{<y}) \gg \frac{\tau(n_{<y})}{\log n_{<y}}. \ \ \ \ \ (4)

On the other hand, standard estimates let one establish that

\displaystyle  \tau(n_{<y}) \ll A \log n_{<y} \ \ \ \ \ (5)

for all {y}, and all {n} outside of an exceptional set that is small in the sense of (3); in fact one can get an additional gain in this estimate unless {\log y} is close to {A^{\log 4}}, which turns out to be useful when optimizing the bounds. So we would like to approximately reverse the inequalities in (4) and get from (5) to (2), possibly after throwing away further exceptional sets that are small in the sense of (3).
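As a concrete toy illustration of (4): every divisor of {n_{<y}} is a divisor of {n}, so {\Delta(n;u) \geq \Delta(n_{<y};u)} for every {u} and hence {\Delta(n) \geq \Delta(n_{<y})}, while the second inequality in (4) is just the pigeonhole bound applied to {n_{<y}}. The following sketch (again purely illustrative; the helpers mirror the earlier snippet, and smooth_part is an ad hoc name for {n \mapsto n_{<y}}) checks this numerically for one choice of {n} and {y}:

from math import e, log

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def hooley_delta(n):
    # Delta(n) = max over divisors d of n of #{d' | n : d/e < d' <= d}
    divs = divisors(n)
    return max(sum(1 for dd in divs if d / e < dd <= d) for d in divs)

def smooth_part(n, y):
    # n_{<y}: the part of n built from the prime factors of n below y
    m = 1
    for p in range(2, y):
        while n % p == 0:
            m, n = m * p, n // p
    return m

n, y = 30030, 10              # n = 2*3*5*7*11*13, so n_{<10} = 2*3*5*7 = 210
m = smooth_part(n, y)
print(hooley_delta(n), hooley_delta(m), len(divisors(m)) / log(m))
# Delta(30030) >= Delta(210) = 5, while tau(210)/log(210) is about 3.0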

At this point we employ another standard technique, namely the moment method of controlling the supremum {\Delta(n) = \sup_u \Delta(n;u)} by the moments

\displaystyle  M_q(n) := \int_{{\bf R}} \Delta(n;u)^q\ du

for natural numbers {q}; it is not difficult to establish the bound

\displaystyle  \Delta(n) \ll M_q(n)^{1/q}

and one expects this bound to become essentially sharp once {q \sim \log\log x}. We will be able to show a moment bound

\displaystyle  \sum_{n \in {\mathcal S}_{<x} \backslash E_q} \frac{M_q(n) / \tau(n)}{n} \leq O(q)^q A^{q-2} \log^{3q/4} A

for any {q \geq 2} and some exceptional set {E_q} obeying the smallness condition (3) (actually, for technical reasons we need to improve the right-hand side slightly to close an induction on {q}); this will imply the distributional bound (2) by a standard Markov inequality argument (setting {q \sim \log\log x}).
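To indicate one standard way to see the earlier bound {\Delta(n) \ll M_q(n)^{1/q}} (a sketch, not necessarily the argument used in the paper): if the supremum defining {\Delta(n)} is attained by a window {(e^{u_0}, e^{u_0+1}]} containing {K = \Delta(n)} divisors of {n}, then one of the two half-windows {(e^{u_0+i/2}, e^{u_0+(i+1)/2}]} with {i \in \{0,1\}} contains at least {K/2} of them; this half-window is contained in {(e^u, e^{u+1}]} for all {u} in an interval of length {1/2}, so that {\Delta(n;u) \geq K/2} on that interval and hence

\displaystyle  M_q(n) \geq \frac{1}{2} \left(\frac{\Delta(n)}{2}\right)^q,

which rearranges to {\Delta(n) \leq 2 (2M_q(n))^{1/q} \leq 4 M_q(n)^{1/q}}.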

The strategy is then to obtain a good recursive inequality for (averages of) {M_q(n)}. As in the reduction to (1), we factor {n=pm} where {p} is a prime and {m \in {\mathcal S}_{<p}}. One observes the identity

\displaystyle  \Delta(n;u) = \Delta(m;u) + \Delta(m;u-\log p)

for any {u}; taking moments, one obtains the identity

\displaystyle  M_q(n) = \sum_{a+b=q; 0 \leq b \leq q} \binom{q}{a} \int_{\bf R} \Delta(m;u)^a \Delta(m;u-\log p)^b\ du.
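For completeness, here is a quick way to see the first identity: since {p} does not divide {m}, every divisor of {n = pm} is either a divisor {d} of {m}, or {p} times a divisor {d} of {m}; a divisor of the first type lies in {(e^u, e^{u+1}]} exactly when it is counted by {\Delta(m;u)}, while one of the second type lies in {(e^u, e^{u+1}]} exactly when {d \in (e^{u-\log p}, e^{u-\log p+1}]}, i.e. when it is counted by {\Delta(m;u-\log p)}. The moment identity then follows by expanding {(\Delta(m;u) + \Delta(m;u-\log p))^q} via the binomial theorem and integrating in {u}.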

As in previous literature, one can try to average in {p} here and apply Hölder’s inequality. But it is convenient to first use the symmetry of the summand in {a,b} to reduce to the case of relatively small values of {b}:

\displaystyle  M_q(n) \leq 2 \sum_{a+b=q; 0 \leq b \leq q/2} \binom{q}{a} \int_{\bf R} \Delta(m;u)^a \Delta(m;u-\log p)^b\ du.
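One way to justify this symmetrization (a sketch; the paper may phrase it differently) is to combine the reflection {d \mapsto m/d} of the divisors of {m}, which gives {\Delta(m;u) = \Delta(m; \log m - 1 - u)} for all but finitely many {u}, with the change of variables {u \mapsto \log m - 1 + \log p - u} in the integral; together these show that

\displaystyle  \int_{\bf R} \Delta(m;u)^a \Delta(m;u-\log p)^b\ du = \int_{\bf R} \Delta(m;u)^b \Delta(m;u-\log p)^a\ du.

Since {\binom{q}{a} = \binom{q}{b}} when {a+b=q}, the terms with {b > q/2} in the exact expansion of {M_q(n)} can be matched with the corresponding terms with {b < q/2}, at the cost of the factor of {2}.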

One can extract out the {b=0} term as

\displaystyle  M_q(n) \leq 2 M_q(m)

\displaystyle + 2 \sum_{a+b=q; 1 \leq b \leq q/2} \binom{q}{a} \int_{\bf R} \Delta(m;u)^a \Delta(m;u-\log p)^b\ du.

It is convenient to eliminate the factor of {2} by dividing out by the divisor function, noting that {\tau(n) = \tau(p) \tau(m) = 2 \tau(m)} since {p} does not divide {m}:

\displaystyle  \frac{M_q(n)}{\tau(n)} \leq \frac{M_q(m)}{\tau(m)}

\displaystyle + \frac{1}{\tau(m)} \sum_{a+b=q; 1 \leq b \leq q/2} \binom{q}{a} \int_{\bf R} \Delta(m;u)^a \Delta(m;u-\log p)^b\ du.

This inequality is suitable for iterating and also averaging in {p} and {m}. After some standard manipulations (using the Brun–Titchmarsh and Hölder inequalities), one is able to estimate sums such as

\displaystyle  \sum_{n \in {\mathcal S}_{<x} \backslash E_q} \frac{M_q(n)/\tau(n)}{n} \ \ \ \ \ (6)

in terms of sums such as

\displaystyle  \int_2^{x^2} \sum_{a+b=q; 1 \leq b \leq q/2} \binom{q}{a} \sum_{n \in {\mathcal S}_{<x} \backslash E_q} \frac{M_a(n) M_b(n)}{\tau(n) n} \frac{dy}{\log^2 y}

(assuming a certain monotonicity property of the exceptional set {E_q} that turns out to hold in our application). By an induction hypothesis and a Markov inequality argument, one can get a reasonable pointwise upper bound on {M_b} (after removing another exceptional set), and the net result is that one can basically control the sum (6) in terms of expressions such as

\displaystyle  \sum_{n \in {\mathcal S}_{<x} \backslash E_a} \frac{M_a(n)/\tau(n)}{n}

for various {a < q}. This allows one to estimate these expressions efficiently by induction.