Another two rational dodecahedra

Complex Projective 4-Space 2020-08-05

Since finding one rational dodecahedron inscribed in the unit sphere, I decided to port the search program to CUDA so that it can run on a GPU and thereby search a larger space in a reasonable amount of time. Firstly, let us recall our search methodology:

circles

We exhaustively look for all small positive integer points (a, b) and (c, d) satisfying b < d and ad > bc. The first of these inequalities ensures that (c, d) is further up the upper-left circle than (a, b); the second inequality ensures that the line through the origin and (a, b) intersects the upper-right circle at two distinct points.

For each such candidate pair of points, we analytically compute x and y, checking that they are rational. If so, we check that the four indicated points on the lower circle are indeed concyclic by performing a final determinant check.

This is embarrassingly parallel, so well-suited for a GPU. We launch one thread for each pair of 1 ≤ a, d ≤ N, where N = 10000, so a total of 10^8 threads are launched. Each thread performs two nested loops: the outer loop searches each value of b up to (but excluding) d; the inner loop starts with c = 1 and increments it until the inequality ad > bc is violated or c exceeds the bound N.

Dealing with integer overflow

One rather annoying bugbear for searching a larger space was that of integer overflow. In the process of computing x, we need to take the square-root of a degree-6 polynomial in the four integer parameters a, b, c, d and check that this square-root is an integer*:

sextic

*since the expression for x is rational if and only if $\sqrt{K^2 - L^2(c^2 + d^2)}$ is an integer.

When the integer parameters grow beyond 1000, this degree-6 polynomial can easily overflow the maximum representable 64-bit integer. To circumvent this problem, we take the following ugly approach:

Compute $K^2 - L^2(c^2 + d^2)$ in both double-precision floating-point arithmetic (which is approximate) and in 64-bit integer arithmetic (which is exact, but reduced modulo 2^64).
‘Repair’ any loss of relative accuracy (caused by the subtraction) in the double-precision approximation by rounding to the nearest multiple of 2^64 and adding the exact result reduced into the interval [-2^63, 2^63 – 1].
Compute the double-precision square-root. This will have comparable relative error, and therefore small absolute error (i.e. less than 1).
Cast the square-root back to a 64-bit integer (which won’t overflow).
Check this integer and nearby values to see whether they square (modulo 2^64) to the desired result.

Source code: here.

The analogous computation for y involves polynomials of degree at most 4, so we can safely search all a, b, c, d <= 10000 without overflow. The final determinant check is just the evaluation of a polynomial, so we can use arithmetic modulo 2^64 and not worry about integer overflow until later.

Hardware

I opted to use an NVIDIA Volta V100 because it’s the most powerful GPU to which I currently have access (via Amazon Web Services). The V100 is huge, consisting of 80 streaming multiprocessors, each of which is capable of simultaneously issuing 4 instructions per cycle, where each instruction is vectorised over 32 ‘threads’. (For this reason, the GPU is sometimes advertised as having 80 × 4 × 32 = 5120 ‘CUDA cores’, but these are not comparable with CPU cores; a modern CPU core with vectorisation capabilities and multithreading is more akin to an entire streaming multiprocessor.)

Schematic of the Volta V100 from the whitepaper. Only 80 of the 84 streaming multiprocessors are actually ‘activated’; this means that the fabrication of the chip is allowed to contain up to 4 manufacturing defects without affecting the validity of the resulting GPU.

Each of the 80 streaming multiprocessors also has 8 ‘tensor cores’, which perform reduced-precision matrix multiplications useful for neural networks. These tensor cores were unused by this search program, as the nature of this particular problem requires high-precision 64-bit integer and double-precision floating-point computations.

Compiling the program with the switch -Xptxas=-v shows that there is no expensive register spillage or other performance problems:

$ nvcc -O3 -Xptxas=-v -lineinfo cudodeca.cu -o cudodecaptxas info : 56 bytes gmemptxas info : Compiling entry function '_Z12dodecakernelv' for 'sm_52'ptxas info : Function properties for _Z12dodecakernelv64 bytes stack frame, 0 bytes spill stores, 0 bytes spill loadsptxas info : Used 47 registers, 320 bytes cmem[0], 24 bytes cmem[2]

Sixteen hours of continual uninterrupted runtime on the V100 (costing a total of about fifty US dollars) were sufficient to exhaust this space and show that there exist at most 3 primitive solutions within these bounds (full output here). The second and third solutions are much larger than the first:

Solution 1:   a=22,   b=21,   c=22,   d=54,   x=40,   y=10Solution 2:   a=1680, b=1474, c=2023, d=2601, x=2890, y=578Solution 3:   a=2860, b=3915, c=3297, d=6594, x=7065, y=2355

The reason for saying ‘at most 3 primitive solutions’ is that the final determinant check is performed modulo 2^64, so false positives can slip through. To be absolutely sure of these results, we need an additional manual verification step using unbounded (‘bigint’) integer arithmetic. Unbounded integers are natively provided in languages such as Python, Haskell, and Wolfram Mathematica; we use the latter for convenience.

Verification

We can check the three determinants for the putative solution (1680, 1474, 2023, 2601, 2890, 578)…

$ wolfram12.0.0Mathematica 12.0.0 Kernel for Linux x86 (64-bit)Copyright 1988-2019 Wolfram Research, Inc.In[1]:= f = Function[{x, y}, {1, x, y, x^2 + y^2}] Out[1]= Function[{x, y}, {1, x, y, x^2 + y^2}]In[2]:= Det[{f[a, b], f[x, 0], f[0, y], f[0, -y]}] /. {a -> 1680, b -> 1474, c -> 2023, d -> 2601, x -> 2890, y -> 578} Out[2]= 0In[3]:= Det[{f[(c^2+d^2)/x, 0], f[c, d], f[a, b], f[x, 0]}] /. {a -> 1680, b -> 1474, c -> 2023, d -> 2601, x -> 2890, y -> 578} Out[3]= 0In[4]:= Det[{f[c, d], f[a, b], f[-a, b], f[0, y]}] /. {a -> 1680, b -> 1474, c -> 2023, d -> 2601, x -> 2890, y -> 578} Out[4]= 0

…and likewise for the putative solution (2860, 3915, 3297, 6594, 7065, 2355)…

In[5]:= Det[{f[a, b], f[x, 0], f[0, y], f[0, -y]}] /. {a -> 2860, b -> 3915, c -> 3297, d -> 6594, x -> 7065, y -> 2355}Out[5]= 0In[6]:= Det[{f[c, d], f[a, b], f[-a, b], f[0, y]}] /. {a -> 2860, b -> 3915, c -> 3297, d -> 6594, x -> 7065, y -> 2355}Out[6]= 0In[7]:= Det[{f[(c^2+d^2)/x, 0], f[c, d], f[a, b], f[x, 0]}] /. {a -> 2860, b -> 3915, c -> 3297, d -> 6594, x -> 7065, y -> 2355}Out[7]= 0

…confirming that these are indeed valid primitive solutions.