Decomposing a factorial into large factors

What's new 2025-03-27

I’ve just uploaded to the arXiv the paper “Decomposing a factorial into large factors”. This paper studies the quantity {t(N)}, defined as the largest {t} such that it is possible to factorize {N!} into {N} factors {a_1, \dots, a_N}, each of which is at least {t}. The first few values of this sequence are

\displaystyle  1,1,1,2,2,2,2,2,3,3,3,3,3,4, \dots

(OEIS A034258). For instance, we have {t(9)=3}, because on the one hand we can factor

\displaystyle  9! = 3 \times 3 \times 3 \times 3 \times 4 \times 4 \times 5 \times 7 \times 8

but on the other hand it is not possible to factorize {9!} into nine factors, each of which is {4} or higher.
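For small {N}, these values can be reproduced by exhaustive search. The following brute-force sketch (my own code, not from the paper) searches over non-decreasing factor tuples and recovers the sequence above:

```python
from math import factorial

def can_split(M, k, t, lo=1):
    """Can M be written as a product of k factors, each at least t?
    Factors are chosen in non-decreasing order starting from lo,
    so each factorization is examined only once."""
    if k == 1:
        return M >= max(t, lo)
    d = max(t, lo)
    while d ** k <= M:          # k non-decreasing factors >= d need d^k <= M
        if M % d == 0 and can_split(M // d, k - 1, t, d):
            return True
        d += 1
    return False

def t_of(N):
    """Largest t such that N! factors into N parts, each at least t."""
    M, t = factorial(N), 1
    while can_split(M, N, t + 1):
        t += 1
    return t

print([t_of(N) for N in range(1, 10)])  # [1, 1, 1, 2, 2, 2, 2, 2, 3]
```

The search is only practical for small {N}; the point is merely to confirm the opening values of the sequence, including {t(9)=3}.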

This quantity {t(N)} was introduced by Erdös, who asked for upper and lower bounds on {t(N)}; informally, this asks how equitably one can split up {N!} into {N} factors. When factoring an arbitrary number, this is essentially a variant of the notorious knapsack problem (after taking logarithms), but one can hope that the specific structure of the factorial {N!} can make this particular knapsack-type problem more tractable. Since

\displaystyle  N! = a_1 \dots a_N \geq t(N)^N

for any putative factorization, we obtain an upper bound

\displaystyle  t(N) \leq (N!)^{1/N} = \frac{N}{e} + O(\log N) \ \ \ \ \ (1)
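As a quick numerical illustration (my own sketch, not from the paper), one can check how far the trivial bound {(N!)^{1/N}} sits above {N/e}, using the log-gamma function to avoid overflowing on large factorials:

```python
from math import lgamma, exp, e, log

# Compare the trivial bound (N!)^{1/N} with the benchmark N/e.
# The gap divided by log N stays bounded, consistent with an
# O(log N) error term in the Stirling expansion.
for N in (10, 100, 1000, 10**6):
    trivial = exp(lgamma(N + 1) / N)   # (N!)^{1/N}, computed via log-gamma
    print(N, trivial, N / e, (trivial - N / e) / log(N))
```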

thanks to the Stirling approximation. At one point, Erdös, Selfridge, and Straus claimed that this upper bound was asymptotically sharp, in the sense that

\displaystyle  t(N) = \frac{N}{e} + o(N) \ \ \ \ \ (2)

as {N \rightarrow \infty}; informally, this means we can split {N!} into {N} factors that are (mostly) approximately the same size, when {N} is large. However, as reported in this later paper, Erdös “believed that Straus had written up our proof… Unfortunately Straus suddenly died and no trace was ever found of his notes. Furthermore, we never could reconstruct our proof, so our assertion now can be called only a conjecture”.

Some further exploration of {t(N)} was conducted by Guy and Selfridge. There is a simple construction that gives the lower bound

\displaystyle  t(N) \geq \frac{3}{16} N - o(N)

that comes from starting with the standard factorization {N! = 1 \times 2 \times \dots \times N} and transferring some powers of {2} from the later part of the sequence to the earlier part to rebalance the terms somewhat. More precisely, if one removes one power of two from the even numbers between {\frac{3}{8}N} and {N}, and one additional power of two from the multiples of four between {\frac{3}{4}N} and {N}, this frees up {\frac{3}{8}N + o(N)} powers of two that one can then distribute amongst the numbers up to {\frac{3}{16} N} to bring them all up to at least {\frac{3}{16} N - o(N)} in size. A more complicated procedure involving transferring both powers of {2} and {3} then gives the improvement {t(N) \geq \frac{1}{4} N - o(N)}. At this point, however, things got more complicated, and the following conjectures were made by Guy and Selfridge:
  • (i) Is {\frac{t(N)}{N} \leq \frac{1}{e}} for all {N \neq 1,2,4}?
  • (ii) Is {t(N) \geq \lfloor 2N/7 \rfloor} for all {N \neq 56}? (At {N=56}, this conjecture barely fails: {t(56) = 15 < 16 = \lfloor 2 \times 56/7 \rfloor}.)
  • (iii) Is {\frac{t(N)}{N} \geq \frac{1}{3}} for all {N \geq 300000}?
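Returning to the power-of-two transfer construction described above, its bookkeeping can be checked numerically at a concrete {N}. The following toy count (my own, not from the paper) compares the number of freed powers of two with the number needed to lift every {m \leq \frac{3}{16}N} into the range {[\frac{3}{16}N, \frac{3}{8}N)}:

```python
from math import ceil, log2

N = 10**6
T = 3 * N // 16
# Powers of two freed: one from each even number in (3N/8, N],
# plus one more from each multiple of four in (3N/4, N].
freed = sum(1 for n in range(3 * N // 8 + 1, N + 1) if n % 2 == 0) \
      + sum(1 for n in range(3 * N // 4 + 1, N + 1) if n % 4 == 0)
# Powers of two needed: each m <= T must be doubled ceil(log2(T/m))
# times to land in [T, 2T).
need = sum(ceil(log2(T / m)) for m in range(1, T + 1))
print(freed, need)
```

The two counts come out nearly equal (both are {\frac{3}{8}N + o(N)}), which suggests why {\frac{3}{16}} is the natural limit of this particular transfer of powers of two.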

In this note we establish the bounds

\displaystyle  \frac{1}{e} - \frac{O(1)}{\log N} \leq \frac{t(N)}{N} \leq \frac{1}{e} - \frac{c_0+o(1)}{\log N} \ \ \ \ \ (3)

as {N \rightarrow \infty}, where {c_0} is the explicit constant

\displaystyle  c_0 := \frac{1}{e} \int_0^1 \left \lfloor \frac{1}{x} \right\rfloor \log \left( ex \left \lceil \frac{1}{ex} \right\rceil \right)\ dx \approx 0.3044.
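The constant {c_0} can be evaluated numerically (my own sketch; the integrand is bounded and piecewise of the form {k \log(emx)}, so one can integrate each piece between consecutive discontinuities exactly and truncate the tail near {0}):

```python
from math import e, log, floor, ceil

def piece_integral(a, b):
    """Integral of floor(1/x) * log(e*x*ceil(1/(e*x))) over [a, b],
    assuming both the floor and the ceiling are constant on (a, b)."""
    xm = (a + b) / 2
    k = floor(1 / xm)
    m = ceil(1 / (e * xm))
    F = lambda x: x * log(e * m * x) - x   # antiderivative of log(e*m*x)
    return k * (F(b) - F(a))

K = 100_000                                # resolve breakpoints down to x = 1/K
pts = {1.0}
pts.update(1.0 / k for k in range(1, K + 1))                  # jumps of floor(1/x)
pts.update(1.0 / (e * m) for m in range(1, int(K / e) + 1))   # jumps of the ceiling
pts = sorted(p for p in pts if p >= 1.0 / K)
total = sum(piece_integral(a, b) for a, b in zip(pts, pts[1:]))
c0 = total / e        # the neglected tail [0, 1/K] contributes only O(1/K)
print(round(c0, 4))   # should be close to 0.3044
```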

In particular this recovers the lost result (2). An upper bound of the shape

\displaystyle  \frac{t(N)}{N} \leq \frac{1}{e} - \frac{c+o(1)}{\log N} \ \ \ \ \ (4)

for some {c>0} was previously conjectured by Erdös and Graham (Erdös problem #391). We conjecture that the upper bound in (3) is sharp, thus

\displaystyle  \frac{t(N)}{N} = \frac{1}{e} - \frac{c_0+o(1)}{\log N}, \ \ \ \ \ (5)

which is consistent with the above conjectures (i), (ii), (iii) of Guy and Selfridge, although numerically the convergence is somewhat slow.

The upper bound argument for (3) is simple enough that it could also be modified to establish the first conjecture (i) of Guy and Selfridge; in principle, (ii) and (iii) are now also reducible to a finite computation, but unfortunately the implied constants in the lower bound of (3) are too weak to make this directly feasible. However, it may be possible to now crowdsource the verification of (ii) and (iii) by supplying a suitable set of factorizations to cover medium sized {N}, combined with some effective version of the lower bound argument that can establish {\frac{t(N)}{N} \geq \frac{1}{3}} for all {N} past a certain threshold. The value {N=300000} singled out by Guy and Selfridge appears to be quite a suitable test case: the constructions I tried fell just a little short of the conjectured threshold of {100000}, but it seems barely within reach that a sufficiently efficient rearrangement of factors can work here.

We now describe the proof of the upper and lower bounds in (3). To improve upon the trivial upper bound (1), one can use the large prime factors of {N!}. Indeed, every prime {p} between {N/e} and {N} divides {N!} at least once (and the ones between {N/e} and {N/2} divide it twice), and any factor {a_i} that contains such a prime therefore has to be significantly larger than the benchmark value of {N/e}. This observation already readily leads to some upper bound of the shape (4) for some {c>0}; using also the primes {p} that are slightly less than {N/e} (noting that any multiple of {p} that exceeds {N/e} must in fact be at least {\lceil N/ep \rceil p}) is what leads to the precise constant {c_0}.
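The mechanism can be seen concretely in a small illustration (my own numbers, with {N = 1000}): a factor of {N!} that is divisible by a prime {p} and exceeds {N/e} must be a multiple of {p}, hence at least {\lceil N/ep \rceil p}, which can sit well above {N/e} itself:

```python
from math import ceil, e

N = 1000
print(N / e)   # the benchmark, about 367.88
# Primes at or slightly below N/e (and at smaller scales N/(2e), N/(4e)):
# any factor divisible by p that exceeds N/e must jump to the next
# multiple of p, namely ceil(N/(e*p)) * p.
for p in (367, 359, 179, 101):
    mult = ceil(N / (e * p)) * p   # least multiple of p exceeding N/e
    print(p, mult, mult - N / e)   # the overshoot above the benchmark
```

For instance, a factor divisible by {p = 367} either equals {367} (just below the benchmark) or is at least {734}, far above it; it is this forced overshoot, integrated over all scales, that produces the integral defining {c_0}.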

For previous lower bound constructions, one started with the initial factorization {N! = 1 \times \dots \times N} and then tried to “improve” this factorization by moving around some of the prime factors. For the lower bound in (3), we start instead with an approximate factorization roughly of the shape

\displaystyle  N! \approx (\prod_{t \leq n < t + 2N/A, \hbox{ odd}} n)^A

where {t} is the target lower bound (so, slightly smaller than {N/e}), and {A} is a moderately sized natural number parameter (we will take {A \asymp \log^3 N}, although there is significant flexibility here). If we denote the right-hand side here by {B}, then {B} is basically a product of {N} numbers of size at least {t}. It is not literally equal to {N!}; however, an easy application of Legendre’s formula shows that for small odd primes {p}, {N!} and {B} have almost exactly the same number of factors of {p}. On the other hand, as {B} is odd, {B} contains no factors of {2}, while {N!} contains about {N} such factors. The prime factorizations of {B} and {N!} differ somewhat at large primes, but {B} has slightly more such prime factors than {N!} (about {\frac{N}{\log N} \log 2} such factors, in fact). By some careful applications of the prime number theorem, one can tweak some of the large primes appearing in {B} to make the prime factorizations of {B} and {N!} agree almost exactly, except that {B} is missing most of the powers of {2} in {N!}, while having some additional large prime factors beyond those contained in {N!} to compensate. With a suitable choice of threshold {t}, one can then replace these excess large prime factors with powers of two to obtain a factorization of {N!} into {N} terms that are all at least {t}, giving the lower bound.
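This near-agreement at small odd primes can be seen in a toy example (my own parameters, far smaller than those in the paper), comparing the exponent of each small prime in {N!}, via Legendre’s formula, with its exponent in a product of the above shape:

```python
def legendre(N, p):
    """Exponent of the prime p in N!, via Legendre's formula."""
    s, q = 0, p
    while q <= N:
        s += N // q
        q *= p
    return s

def vp(n, p):
    """Exponent of the prime p in n."""
    s = 0
    while n % p == 0:
        n //= p
        s += 1
    return s

# Toy parameters (mine): N = 1000, A = 20, and a target t a bit below
# N/e ~ 367.9, so B is a product of N odd numbers, each at least t.
N, A, t = 1000, 20, 330
odds = range(t | 1, t + 2 * N // A, 2)   # the N/A odd numbers in [t, t + 2N/A)
for p in (2, 3, 5, 7, 11):
    vB = A * sum(vp(n, p) for n in odds)
    print(p, legendre(N, p), vB)   # p = 2: N! has 994 twos, B has none
```

At these small parameters the odd-prime exponents agree only roughly; the paper’s choice {A \asymp \log^3 N} at large {N} makes the match far tighter, while the {p=2} row shows the deliberate mismatch at {2} that the construction then repairs.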

The general approach of first locating some approximate factorization of {N!} (where the approximation is in the “adelic” sense of having not just approximately the right magnitude, but also approximately the right number of factors of {p} for various primes {p}), and then moving factors around to get an exact factorization of {N!}, looks promising for also resolving the conjectures (ii), (iii) mentioned above. For instance, I was numerically able to verify that {t(300000) \geq 90000} by the following procedure:

  • Start with the approximate factorization {N! \approx B} with {N = 300000}, where {B = (\prod_{90000 \leq n < 102000, \hbox{ odd}} n)^{50}}. Thus {B} is the product of {N} odd numbers, each of which is at least {90000}.
  • Call an odd prime {B}-heavy if it divides {B} more often than {N!}, and {N!}-heavy if it divides {N!} more often than {B}. It turns out that there are {14891} more {B}-heavy primes than {N!}-heavy primes (counting multiplicity). On the other hand, {N!} contains {299992} powers of {2}, while {B} has none. This represents the (multi-)set of primes one has to redistribute in order to convert a factorization of {B} to a factorization of {N!}.
  • Using a greedy algorithm, one can match a {B}-heavy prime {p'} to each {N!}-heavy prime {p} (counting multiplicity) in such a way that {p' \leq 2^{m_p} p} for a small {m_p} (in most cases one can make {m_p=0}, and often one also has {p'=p}). If we then replace {p'} in the factorization of {B} by {2^{m_p} p} for each {N!}-heavy prime {p}, this increases {B} (and does not decrease any of the {N} factors of {B}), while eliminating all the {N!}-heavy primes. With a somewhat crude matching algorithm, I was able to do this using {\sum_p m_p = 39992} of the {299992} powers of {2} dividing {N!}, leaving {260000} powers remaining at my disposal. (I don’t claim that this is the most efficient matching, in terms of powers of two required, but it sufficed.)
  • There are still {14891} {B}-heavy primes left over in the factorization of (the modified version of) {B}. Replacing each of these primes with {2^{17} \geq 90000}, and then distributing the remaining {260000 - 17 \times 14891 = 6853} powers of two arbitrarily, one obtains a factorization of {N!} into {N} terms, each of which is at least {90000}.
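The bookkeeping in this procedure is easy to double-check; the following sketch (mine, not from the paper) verifies the quoted counts with Legendre’s formula and confirms the arithmetic:

```python
def legendre(N, p):
    """Exponent of the prime p in N!, via Legendre's formula."""
    s, q = 0, p
    while q <= N:
        s += N // q
        q *= p
    return s

N = 300000
assert legendre(N, 2) == 299992          # powers of 2 available in N!
assert 2 ** 17 >= 90000                  # a block of seventeen 2s clears the threshold
assert 299992 - 39992 == 260000          # powers left after the greedy matching
assert 260000 - 17 * 14891 == 6853       # powers left after fixing the B-heavy primes
print("all counts consistent")
```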

However, I was not able to adjust parameters to reach {t(300000) \geq 100000} in this manner. Perhaps some readers here who are adept with computers can come up with a more efficient construction to get closer to this bound? If one can find a way to reach this bound, most likely it can be adapted to then resolve conjectures (ii) and (iii) above after some additional numerical effort.