Matrix multiplication update
Complex Projective 4-Space 2022-10-06
The recent post on a combinatorial proof of Houston’s identity ended with the following paragraph:
This may seem paradoxical, but there’s an analogous situation in fast matrix multiplication: the best known upper bound for the tensor rank of 4-by-4 matrix multiplication is 49, by applying two levels of Strassen’s algorithm, but there is a little-known method by Winograd for multiplying two 4-by-4 matrices over a commutative ring using only 48 multiplications.
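Winograd’s 48-multiplication method is worth making concrete. It is based on an inner-product trick: pair up the entries of each row of A and each column of B, precompute one correction product per pair, and then each of the 16 entries of the product costs only two further multiplications. Here is a sketch in Python (matrices as nested lists; the function name is mine), using 8 + 8 + 16×2 = 48 multiplications:

```python
def winograd_4x4(A, B):
    """Winograd's inner-product trick over a commutative ring:
    48 multiplications and 120 additions for a 4-by-4 product."""
    # One correction term per row of A: 2 multiplications each (8 total).
    xi = [A[i][0] * A[i][1] + A[i][2] * A[i][3] for i in range(4)]
    # One correction term per column of B: 2 multiplications each (8 total).
    eta = [B[0][j] * B[1][j] + B[2][j] * B[3][j] for j in range(4)]
    # Each entry needs only 2 further multiplications (32 total);
    # expanding the brackets and cancelling xi[i] and eta[j] leaves
    # sum_k A[i][k] * B[k][j], *provided* the ring is commutative.
    return [[(A[i][0] + B[1][j]) * (A[i][1] + B[0][j])
             + (A[i][2] + B[3][j]) * (A[i][3] + B[2][j])
             - xi[i] - eta[j]
             for j in range(4)] for i in range(4)]
```

Counting additions gives 7 per entry plus 1 per correction term, i.e. 16×7 + 8 = 120, matching the table later in this post. Note that the unwanted terms such as A[i][0]·A[i][1] and B[1][j]·B[0][j] only cancel against xi[i] and eta[j], and the surviving cross term B[1][j]·A[i][1] only equals the desired A[i][1]·B[1][j], when the ring is commutative.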
DeepMind’s AlphaTensor (blog post, Nature article, source code) has since found many non-equivalent 49-multiplication algorithms for 4-by-4 matrix multiplication, as well as improving on the state-of-the-art for some other sizes of matrix multiplication tensor.
As far as I can tell, DeepMind’s new algorithms do not beat the matrix multiplication exponent implied by Smirnov’s incredible algorithm for computing a 3-by-3-by-6 rectangular matrix multiplication using only 40 multiplications (the sketch after the list below reproduces these exponents):
- Naive exponent: 3
- Strassen exponent: log(7) / log(2) = 2.8074
- Smirnov exponent: log(64000) / log(54) = 2.7743
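These exponents all come from the same standard symmetrisation argument: an algorithm multiplying an n-by-m matrix by an m-by-p matrix with r multiplications can be composed with its two ‘rotations’ to give an nmp-by-nmp algorithm with r^3 multiplications, whence omega <= 3 log(r) / log(nmp). A few lines of Python reproduce the numbers above:

```python
import math

def exponent(n, m, p, r):
    """Upper bound on omega implied by an <n,m,p> matrix
    multiplication algorithm using r multiplications."""
    return 3 * math.log(r) / math.log(n * m * p)

print(exponent(2, 2, 2, 7))    # ~2.8074 (Strassen)
print(exponent(3, 3, 6, 40))   # ~2.7743 (Smirnov)
print(exponent(4, 4, 4, 64))   # 3.0     (naive)
print(exponent(4, 4, 4, 47))   # ~2.7773 (DeepMind's mod-2 algorithm)
```

In particular, the 47-multiplication algorithm discussed below gives an exponent of about 2.7773, which indeed does not beat Smirnov’s 2.7743.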
There are also a few galactic algorithms, not used in practice because their advantage over Smirnov only applies to extremely large matrices, which improve the exponent considerably further:
- Coppersmith-Winograd: 2.3755
- Virginia Vassilevska Williams: 2.3729
Matrix multiplication in GL(4, 2)
DeepMind’s AlphaTensor also found a record-breaking 47-multiplication algorithm for 4-by-4 matrix multiplication over the finite field GF(2). One reason why 4-by-4 matrix multiplication over GF(2) in particular is interesting is that the group of invertible matrices, GL(4, 2), is isomorphic to the alternating group A_8.
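A quick sanity check on that isomorphism (this only verifies that the orders agree; the isomorphism itself is one of the classical exceptional ones):

```python
from math import factorial, prod

# |GL(4, 2)| = (2^4 - 1)(2^4 - 2)(2^4 - 4)(2^4 - 8)
order_gl42 = prod(2**4 - 2**i for i in range(4))
# |A_8| = 8! / 2
order_a8 = factorial(8) // 2
print(order_gl42, order_a8)   # 20160 20160
```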
To summarise, here are the various algorithms for 4-by-4 matrix multiplication over GF(2) (the naive and Strassen algorithms work over arbitrary rings; Winograd’s requires a commutative ring; DeepMind’s is specific to GF(2)):
- Naive: 64 multiplications + 48 additions;
- Strassen^1: 56 multiplications + 88 additions;
- Strassen^2: 49 multiplications + 165 additions;
- Winograd: 48 multiplications + 120 additions;
- DeepMind: 47 multiplications + ???* additions;
where the DeepMind algorithm, unlike Winograd’s, can be applied recursively (block matrices do not commute, so Winograd’s trick cannot be recursed) to yield better exponents for large matrix multiplication over this field. Note that multiplications and additions over GF(2) are individual AND and XOR gates, respectively. The naive algorithm has the advantage that its circuit has a small number of gates (112) and very low depth (3).
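That circuit is short enough to write down directly; here is a sketch in Python over bits (0/1), with each output entry parenthesised as a balanced XOR tree so that the depth-3 claim is visible:

```python
def mat4_mul_gf2(A, B):
    """Naive 4-by-4 matrix product over GF(2): each of the 16
    output bits is the XOR of four ANDs, so the circuit has
    16*4 = 64 AND gates, 16*3 = 48 XOR gates, and depth 3
    (one AND layer followed by two XOR layers)."""
    return [[((A[i][0] & B[0][j]) ^ (A[i][1] & B[1][j]))
             ^ ((A[i][2] & B[2][j]) ^ (A[i][3] & B[3][j]))
             for j in range(4)] for i in range(4)]
```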
*AlphaTensor finds low-tensor-rank decompositions for tensors, rather than arithmetic circuits, so there’s a subsequent optimisation problem to reduce the number of additions. For example, Strassen’s algorithm for 2-by-2 matrices originally cost 7 multiplications and 18 additions, but the number of additions was later reduced to 15.
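For the record, here is one standard form of that 15-addition variant (itself usually attributed to Winograd); the seven products are where the recursion happens when it is applied to block matrices:

```python
def strassen_winograd_2x2(A, B):
    """Winograd's variant of Strassen's algorithm: 7 multiplications
    and 15 additions/subtractions (8 before the products, 7 after),
    down from the original 18."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    # 8 pre-additions
    s1 = a21 + a22; s2 = s1 - a11; s3 = a11 - a21; s4 = a12 - s2
    t1 = b12 - b11; t2 = b22 - t1; t3 = b22 - b12; t4 = t2 - b21
    # 7 multiplications (recursive block products in general)
    p1 = a11 * b11; p2 = a12 * b21; p3 = s4 * b22; p4 = a22 * t4
    p5 = s1 * t1;   p6 = s2 * t2;   p7 = s3 * t3
    # 7 post-additions
    u2 = p1 + p6
    u3 = u2 + p7
    u4 = u2 + p5
    return ((p1 + p2, u4 + p3),
            (u3 - p4, u3 + p5))
```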