A pilot project in universal algebra to explore new ways to collaborate and use machine assistance?

What's new 2024-09-26

Traditionally, mathematics research projects are conducted by a small number (typically one to five) of expert mathematicians, each of which are familiar enough with all aspects of the project that they can verify each other’s contributions. It has been challenging to organize mathematical projects at larger scales, and particularly those that involve contributions from the general public, due to the need to verify all of the contributions; a single error in one component of a mathematical argument could invalidate the entire project. Furthermore, the sophistication of a typical math project is such that it would not be realistic to expect a member of the public, with say an undergraduate level of mathematics education, to contribute in a meaningful way to many such projects.

For related reasons, it is also challenging to incorporate assistance from modern AI tools into a research project, as these tools can “hallucinate” plausible-looking, but nonsensical arguments, which therefore need additional verification before they could be added into the project.

Proof assistant languages, such as Lean, provide a potential way to overcome these obstacles, and allow for large-scale collaborations involving professional mathematicians, the broader public, and/or AI tools to all contribute to a complex project, provided that it can be broken up in a modular fashion into smaller pieces that can be attacked without necessarily understanding all aspects of the project as a whole. Projects to formalize an existing mathematical result (such as the formalization of the recent proof of the PFR conjecture of Marton, discussed in this previous blog post) are currently the main examples of such large-scale collaborations that are enabled via proof assistants. At present, these formalizations are mostly crowdsourced by human contributors (which include both professional mathematicians and interested members of the general public), but there are also some nascent efforts to incorporate more automated tools (either “good old-fashioned” automated theorem provers, or more modern AI-based tools) to assist with the (still quite tedious) task of formalization.

However, I believe that this sort of paradigm can also be used to explore new mathematics, as opposed to formalizing existing mathematics. The online collaborative “Polymath” projects that several people including myself organized in the past are one example of this; but as they did not incorporate proof assistants into the workflow, the contributions had to be managed and verified by the human moderators of the project, which was quite a time-consuming responsibility, and one which limited the ability to scale these projects up further. But I am hoping that the addition of proof assistants will remove this bottleneck.

I am particularly interested in the possibility of using these modern tools to explore a class of many mathematical problems at once, as opposed to the current approach of focusing on only one or two problems at a time. This seems like an inherently modularizable and repetitive task, which could particularly benefit from both crowdsourcing and automated tools, if given the right platform to rigorously coordinate all the contributions; and it is a type of mathematics that previous methods usually could not scale up to (except perhaps over a period of many years, as individual papers slowly explore the class one data point at a time until a reasonable intuition about the class is attained). Among other things, having a large data set of problems to work on could be helpful for benchmarking various automated tools and compare the efficacy of different workflows.

One recent example of such a project was the Busy Beaver Challenge, which showed this July that the fifth Busy Beaver number ${BB(5)}$ was equal to ${47176870}$ . Some older crowdsourced computational projects, such as the Great Internet Mersenne Prime Search (GIMPS), are also somewhat similar in spirit to this type of project (though using more traditional proof of work certificates instead of proof assistants). I would be interested in hearing of any other extant examples of crowdsourced projects exploring a mathematical space, and whether there are lessons from those examples that could be relevant for the project I propose here.

More specifically I would like to propose the following (admittedly artificial) project as a pilot to further test out this paradigm, which was inspired by a MathOverflow question from last year, and discussed somewhat further on my Mastodon account shortly afterwards.

The problem is in the field of universal algebra, and concerns the (medium-scale) exploration of simple equational theories for magmas. A magma is nothing more than a set ${G}$ equipped with a binary operation ${\circ: G \times G \rightarrow G}$ . Initially, no additional axioms on this operation ${\circ}$ are imposed, and as such magmas by themselves are somewhat boring objects. Of course, with additional axioms, such as the identity axiom or the associative axiom, one can get more familiar mathematical objects such as groups, semigroups, or monoids. Here we will be interested in (constant-free) equational axioms, which are axioms of equality involving expressions built from the operation ${\circ}$ and one or more indeterminate variables in ${G}$ . Two familiar examples of such axioms are the commutative axiom

$\displaystyle x \circ y = y \circ x$

and the associative axiom

$\displaystyle (x \circ y) \circ z = x \circ (y \circ z),$

where ${x,y,z}$ are indeterminate variables in the magma ${G}$ . On the other hand the (left) identity axiom ${e \circ x = x}$ would not be considered an equational axiom here, as it involves a constant ${e \in G}$ (the identity element), which we will not consider here.

To illustrate the project I have in mind, let me first introduce eleven examples of equational axioms for magmas:

Equation1: ${x=y}$
Equation2: ${x \circ y = z \circ w}$
Equation3: ${x \circ y = x}$
Equation4: ${(x \circ x) \circ y = y \circ x}$
Equation5: ${x \circ (y \circ z) = (w \circ u) \circ v}$
Equation6: ${x \circ y = x \circ z}$
Equation7: ${x \circ y = y \circ x}$
Equation8: ${x \circ (y \circ z) = (x \circ w) \circ u}$
Equation9: ${x \circ (y \circ z) = (x \circ y) \circ w}$
Equation10: ${x \circ (y \circ z) = (x \circ y) \circ z}$
Equation11: ${x = x}$

Thus, for instance, Equation7 is the commutative axiom, and Equation10 is the associative axiom. The constant axiom Equation1 is the strongest, as it forces the magma ${G}$ to have at most one element; at the opposite extreme, the reflexive axiom Equation11 is the weakest, being satisfied by every single magma.

One can then ask which axioms imply which others. For instance, Equation1 implies all the other axioms in this list, which in turn imply Equation11. Equation8 implies Equation9 as a special case, which in turn implies Equation10 as a special case. The full poset of implications can be depicted by the following Hasse diagram:

This in particular answers the MathOverflow question of whether there were equational axioms intermediate between the constant axiom Equation1 and the associative axiom Equation10.

Most of the implications here are quite easy to prove, but there is one non-trivial one, obtained in this answer to a MathOverflow post closely related to the preceding one:

Proposition 1 Equation4 implies Equation7.

Proof: Suppose that ${G}$ obeys Equation4, thus

$\displaystyle (x \circ x) \circ y = y \circ x \ \ \ \ \ (1)$

for all ${x,y \in G}$ . Specializing to ${y=x \circ x}$ , we conclude

$\displaystyle (x \circ x) \circ (x \circ x) = (x \circ x) \circ x$

and hence by another application of (1) we see that ${x \circ x}$ is idempotent:

$\displaystyle (x \circ x) \circ (x \circ x) = x \circ x. \ \ \ \ \ (2)$

Now, replacing ${x}$ by ${x \circ x}$ in (1) and then using (2), we see that

$\displaystyle (x \circ x) \circ y = y \circ (x \circ x),$

so in particular ${x \circ x}$ commutes with ${y \circ y}$ :

$\displaystyle (x \circ x) \circ (y \circ y) = (y \circ y) \circ (x \circ x). \ \ \ \ \ (3)$

Also, from two applications (1) one has

$\displaystyle (x \circ x) \circ (y \circ y) = (y \circ y) \circ x = x \circ y.$

Thus (3) simplifies to ${x \circ y = y \circ x}$ , which is Equation7. $\Box$

A formalization of the above argument in Lean can be found here.

I will remark that the general question of determining whether one set of equational axioms determines another is undecidable; see Theorem 14 of this paper of Perkins. (This is similar in spirit to the more well known undecidability of various word problems.) So, the situation here is somewhat similar to the Busy Beaver Challenge, in that past a certain point of complexity, we would necessarily encounter unsolvable problems; but hopefully there would be interesting problems and phenomena to discover before we reach that threshold.

The above Hasse diagram does not just assert implications between the listed equational axioms; it also asserts non-implications between the axioms. For instance, as seen in the diagram, the commutative axiom Equation7 does not imply the Equation4 axiom

$\displaystyle (x+x)+y = y + x.$

To see this, one simply has to produce an example of a magma that obeys the commutative axiom Equation7, but not the Equation4 axiom; but in this case one can simply choose (for instance) the natural numbers ${{\bf N}}$ with the addition operation ${x \circ y := x+y}$ . More generally, the diagram asserts the following non-implications, which (together with the indicated implications) completely describes the poset of implications between the eleven axioms:

Equation2 does not imply Equation3.
Equation3 does not imply Equation5.
Equation3 does not imply Equation7.
Equation5 does not imply Equation6.
Equation5 does not imply Equation7.
Equation6 does not imply Equation7.
Equation6 does not imply Equation10.
Equation7 does not imply Equation6.
Equation7 does not imply Equation10.
Equation9 does not imply Equation8.
Equation10 does not imply Equation9.
Equation10 does not imply Equation6.

The reader is invited to come up with counterexamples that demonstrate some of these implications. The hardest type of counterexamples to find are the ones that show that Equation9 does not imply Equation8: a solution (in Lean) can be found here. I placed proofs in Lean of all the above implications and anti-implications can be found in this github repository file.

As one can see, it is already somewhat tedious to compute the Hasse diagram of just eleven equations. The project I propose is to try to expand this Hasse diagram by a couple orders of magnitude, covering a significantly larger set of equations. The set I propose is the set ${{\mathcal E}}$ of equations that use the magma operation ${\circ}$ at most four times, up to relabeling and the reflexive and symmetric axioms of equality; this includes the eleven equations above, but also many more. How many more? Recall that the Catalan number ${C_n}$ is the number of ways one can form an expression out of ${n}$ applications of a binary operation ${\circ}$ (applied to ${n+1}$ placeholder variables); and, given a string of ${m}$ placeholder variables, the Bell number ${B_m}$ is the number of ways (up to relabeling) to assign names to each of these variables, where some of the placeholders are allowed to be assigned the same name. As a consequence, ignoring symmetry, the number of equations that involve at most four operations is

$\displaystyle \sum_{n,m \geq 0: n+m \leq 4} C_n C_m B_{n+m+2} = 9131.$

The number of equations in which the left-hand side and right-hand side are identical is

$\displaystyle \sum_{n=0}^2 C_n B_{n+1} = 1 * 1 + 1 * 2 + 2 * 5 = 13;$

these are all equivalent to reflexive axiom (Equation11). The remaining ${9118}$ equations come in pairs by the symmetry of equality, so the total size of ${{\mathcal E}}$ is

$\displaystyle 1 + \frac{9118}{2} = 4560.$

I have not yet generated the full list of such identities, but presumably this will be straightforward to do in a standard computer language such as Python (I have not tried this, but I imagine some back-and-forth with a modern AI would let one generate most of the required code).

It is not clear to me at all what the geometry of ${{\mathcal E}}$ will look like. Will most equations be incomparable with each other? Will it stratify into layers of “strong” and “weak” axioms? Will there be a lot of equivalent axioms? It might be interesting to record now any speculations as what the structure of this poset, and compare these predictions with the outcome of the project afterwards.

A brute force computation of the poset ${{\mathcal E}}$ would then require ${4560 \times (4560-1) = 20789040}$ comparisons, which looks rather daunting; but of course due to the axioms of a partial order, one could presumably identify the poset by a much smaller number of comparisons. I am thinking that it should be possible to crowdsource the exploration of this poset in the form of submissions to a central repository (such as the github repository I just created) of proofs in Lean of implications or non-implications between various equations, which could be validated in Lean, and also checked against some file recording the current status (true, false, or open) of all the ${20789040}$ comparisons, to avoid redundant effort. Most submissions could be handled automatically, with relatively little human moderation required; and the status of the poset could be updated after each such submission.

I would imagine that there is some “low-hanging fruit” that could establish a large number of implications (or anti-implications) quite easily. For instance, laws such as Equation2 or Equation3 more or less completely describe the binary operation ${\circ}$ , and it should be quite easy to check which of the ${4560}$ laws are implied by either of these two laws. The poset ${{\mathcal E}}$ has a reflection symmetry associated to replacing the binary operator ${\circ}$ by its reflection ${\circ^{\mathrm{op}}: (x,y) \mapsto y \circ x}$ , which in principle cuts down the total work by a factor of about two. Specific examples of magmas, such as the natural numbers with the addition operation, obey some set of equations in ${{\mathcal E}}$ but not others, and so could be used to generate a large number of anti-implications. Some existing automated proving tools for equational logic, such as Prover9 and Mace4 (for obtaining implications and anti-implications respectively), could then be used to handle most of the remaining “easy” cases (though some work may be needed to convert the outputs of such tools into Lean). The remaining “hard” cases could then be targeted by some combination of human contributors and more advanced AI tools.

Perhaps, in analogy with formalization projects, we could have a semi-formal “blueprint” evolving in parallel with the formal Lean component of the project. This way, the project could accept human-written proofs by contributors who do not necessarily have any proficiency in Lean, as well as contributions from automated tools (such as the aforementioned Prover9 and Mace4), whose output is in some other format than Lean. The task of converting these semi-formal proofs into Lean could then be done by other humans or automated tools; in particular I imagine modern AI tools could be particularly valuable for this portion of the workflow. I am not quite sure though if existing blueprint software can scale to handle the large number of individual proofs that would be generated by this project; and as this portion would not be formally verified, a significant amount of human moderation might also be needed here, and this also might not scale properly. Perhaps the semi-formal portion of the project could instead be coordinated on a forum such as this blog, in a similar spirit to past Polymath projects.

It would be nice to be able to integrate such a project with some sort of graph visualization software that can take an incomplete determination of the poset ${{\mathcal E}}$ as input (in which each potential comparison ${E \implies E'}$ in ${{\mathcal E}}$ is marked as either true, false, or open), completes the graph as much as possible using the axioms of partial order, and then presents the partially known poset in a visually appealing way. If anyone knows of such a software package, I would be happy to hear of it in the comments.

Anyway, I would be happy to receive any feedback on this project; in addition to the previous requests, I would be interested in any suggestions for improving the project, as well as gauging whether there is sufficient interest in participating to actually launch it. (I am imagining running it vaguely along the lines of a Polymath project, though perhaps not formally labeled as such.)