Plan of paper


This plan is no longer maintained, in favor of the actual paper draft in this folder.

Author list

This should be alphabetical (but see Appendix).

Introduction

Magmas and equational laws

Introduce the key definitions and list some prior results. Include an appendix listing all equations actually discussed in the paper, with brief commentary (essentially a condensed version of this tour); it should also explain the numbering system used, as well as the OEIS sequences for the number of equations with a given number of operations.
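For concreteness, a minimal Lean sketch of the two key notions could accompany the definitions (schematic only: the names follow the project's Magma/EquationN conventions, but the actual repository definitions may differ in detail):

```lean
-- A magma: a type equipped with a single binary operation.
class Magma (G : Type _) where
  op : G → G → G

infixl:65 " ◇ " => Magma.op

-- An equational law, stated semantically as a Prop about a magma;
-- here the commutative law (Equation 43 in the project's numbering).
abbrev Equation43 (G : Type _) [Magma G] : Prop :=
  ∀ x y : G, x ◇ y = y ◇ x
```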

Need to understand the state of the art on undecidability. In particular, is it known that the EquationX => EquationY problem for a single variable is undecidable in general? See the 1975 paper "On spectra, and the negative solution of the decision problem for identities having a finite nontrivial model" by Ralph McKenzie, and this discussion. Also mention the Robbins problem, which was an early motivation for automated theorem proving in this area.

Equational Theories Project

Describe the initial aims and history of the project. Also discuss the extension to finite magma implications (see the Austin pairs section).

Mention the Busy Beaver Challenge as a precursor

Contrast with "Polymath" projects

Cite Phillips-Vojtechovsky https://arxiv.org/abs/math/0701714 and follow-up work (a similar, smaller-scale project on Moufang loops), as well as this blog post of Wolfram and this paper of McCune.

Speculative: uncover "Mathematics from Mars", as discussed here.

Summary of equation analysis techniques

Recapitulate this thread. Several of the techniques can be discussed in separate sections afterwards. Perhaps separate easily automatable techniques (which can be used to sweep the entire graph) from techniques requiring human attention (which one would only apply to selected subgraphs, or to survivors of automated sweeps).

Mathematical foundations

Free magmas (including those relative to theories), the completeness theorem, some abstract nonsense. Confluence (unique simplification) and, more generally, complete rewriting systems.
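A possible Lean rendering of the free magma and its universal property (a sketch reusing the Magma class above; the project's own FreeMagma may differ in signature details):

```lean
-- The free magma on α: binary trees with leaves labelled by α.
inductive FreeMagma (α : Type _)
  | Leaf : α → FreeMagma α
  | Fork : FreeMagma α → FreeMagma α → FreeMagma α

instance {α : Type _} : Magma (FreeMagma α) :=
  ⟨FreeMagma.Fork⟩

-- Universal property: a map of generators f : α → G extends to a
-- magma homomorphism FreeMagma α → G, given by evaluating each tree.
def FreeMagma.eval {α G : Type _} [Magma G] (f : α → G) :
    FreeMagma α → G
  | .Leaf x => f x
  | .Fork l r => eval f l ◇ eval f r
```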

Formal foundations

Describe the Lean framework we used to formalize the project, covering technical issues such as

  • Magma operation symbol issues
  • Syntax (LawX) versus semantics (EquationX); see the sketch after this list
  • "Universe hell" issues
  • Additional verification (axiom checking, Leanchecker, etc.)
  • Use of the conjecture keyword
  • Use of namespaces to avoid collisions between contributions. (Note: we mishandled this with FreeMagma! Back-end results should have been namespaced as well as front-end ones.)
  • Use of Facts command to efficiently handle large numbers of anti-implications at once
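To illustrate the syntax/semantics distinction above, a hedged sketch of a syntactic law and its satisfaction predicate, reusing the Magma and FreeMagma sketches from earlier sections (the names Law and Law.Satisfies are ours; the project's actual LawX/EquationX correspondence machinery is more elaborate):

```lean
-- A law is purely syntactic: a pair of terms in the free magma
-- over the natural numbers, which serve as the variables.
structure Law where
  lhs : FreeMagma Nat
  rhs : FreeMagma Nat

-- Semantics: G satisfies a law when every assignment of variables
-- equates the two sides. LawX.Satisfies G should then be provably
-- equivalent to the corresponding semantic statement EquationX G.
def Law.Satisfies (E : Law) (G : Type _) [Magma G] : Prop :=
  ∀ f : Nat → G, E.lhs.eval f = E.rhs.eval f
```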

Upstream contributions

Mathlib

https://github.com/leanprover-community/mathlib4/pulls?q=is%3Apr+is%3Abody+EquationalTheories+

LeanBlueprint

https://github.com/PatrickMassot/leanblueprint/pulls?q=is%3Apr+in%3Abody+EquationalTheories+

Project management

Shreyas Srinivas and Pietro Monticone have volunteered to take the lead on this section.

Discuss topics such as

  • Project generation from template (which will be extended with all the relevant project design and management lessons learnt during the development of this project)
  • Github issue management (labels, task management dashboard)
  • Use of continuous integration (project build, blueprint compilation, task status transition)
  • Pre-push git hooks to save actions usage
  • Use of blueprint (small note, see #406: blueprint chapters should be given names for stable URLs)
  • Use of Lean Zulip (e.g. use of polls)

Maybe give some usage statistics, e.g. drawing from https://github.com/teorth/equational_theories/actions/metrics/usage

Mention that FLT is also using a similar workflow

Handling scaling issues

Mention some early human-managed efforts (the "outstanding tasks" list, a manually generated Hasse diagram, etc.), which sufficed for the first day or two but rapidly became unable to handle the scale of the project.

Mention that setting up a GitHub organizational structure with explicit admin roles etc. during the planning stages might have had some advantages, but the project was workable without this structure (the main issue being that a single person - Terry - had to perform various technical admin actions).

Use of transitive reduction etc. to keep the Lean codebase manageable. Note that the project is large enough that one cannot simply accept arbitrary amounts of Lean code into the codebase, as this could make compilation times explode. Also note somewhere that transitive completion can be viewed as directed graph completion on a doubled graph consisting of laws and their formal negations.
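To spell out the doubled-graph viewpoint (our formulation; the codebase may organize this differently): record each known implication E => F as an edge E -> F together with the contrapositive edge NOT F -> NOT E between the formal negations. Anti-implications then propagate by ordinary reachability in this doubled graph, via the rule

    I => A, B => J, I !-> J together yield A !-> B,

since any magma witnessing I but not J also witnesses A but not B.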

Technical debt issues: e.g., complications stemming from an early decision to make Equations.lean and AllEquations.lean the ground truth of equations for other analysis and visualization tools, which forced a refactor when AllEquations.lean had to be split up for performance reasons.

Note that the "blueprint" that is now standard for guiding proof formalization projects is a bit too slow to keep up with this sort of project that is oriented instead about proving new results. Often new results are stated and formalized without passing through the blueprint, which is then either updated after the fact, or not at all. So the blueprint is more of an optional auxiliary guiding tool than an essential component of the workflow.

Other design considerations

Explain what "trusting" lean really means in a large project. Highlight the kind of human issues that can interfere with this and how use of tools like external checkers and PR reviews by people maintaining the projects still matters. Provide guidelines on good practices (such as branch protection or watching out for spurious modifications in PRs, for example to the CI). Highlight the importance of following a proper process for discussing and accepting new tasks, avoiding overlaps etc. These issues are less likely to arise in projects with one clearly defined decision maker as in this case, and more likely to arise when the decision making has to be delegated to many maintainers.

Note that despite the guarantees provided by Lean, non-Lean components still contained bugs. For instance, an off-by-one error in an ATP run created a large number of spurious conjectures, and some early implementations of duality reductions (external to Lean) were similarly buggy. "Unit tests" - e.g., checking conjectured outputs against Lean-validated outputs, or against theoretical results such as duality symmetry - were helpful, and the Equation Explorer visualization tool also helped human collaborators detect bugs.

Meta: documenting thoughts for the future record is quite valuable. It would be extremely tedious to reconstruct real-time impressions long after the fact from the GitHub commit history and Zulip chat archive alone.

Maintenance

Describe the role of maintainers and explain why they need to be conversant in the mathematics being formalized as well as in Lean. The role of maintainers is akin to that of postdocs or assistant professors in a research group, who do some research of their own but spend much of their time guiding others in performing their tasks, the key difference being that contributors are volunteers who choose their own tasks. Explain the tasks maintainers must perform. Examples:

  • reviewing proofs,
  • helping with proofs and theorem statements when people get stuck,
  • offering suggestions and guidance on how to produce shorter or more elegant proofs,
  • ensuring that some basic standards are met in proof blocks, making proofs robust to upstream changes,
  • creating and maintaining CI processes,
  • responding to task requests,
  • evaluating theorem and definition formulations (for example, unifying many theorem statements into one using FactsSyntax) and suggesting better ones where possible,
  • ensuring that there is no excessive and pointless overlap of content between different contributions (TODO: elaborate on what level of overlap was permissible and what we consider excessive).

Finite magmas and other sources of counterexamples

Describe the various sources of example magmas used in counterexamples, including the ones listed here. Note that linear magmas let one assign an "affine scheme" to each law, which can be used to rule out many, but not all, implications.
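A minimal illustration of the linear-magma idea (our example and names, reusing the Magma class from the foundations section): with x ◇ y = a*x + b*y over the integers, the commutative law holds precisely when a = b, so each law cuts out an "affine" condition on the parameters (a, b).

```lean
-- A linear magma on ℤ with parameters a, b: x ◇ y = a*x + b*y.
def linearMagma (a b : Int) : Magma Int :=
  ⟨fun x y => a * x + b * y⟩

-- When a = b, commutativity a*x + a*y = a*y + a*x holds for all
-- x, y (and conversely, taking x = 0, y = 1 forces a = b).
example (a x y : Int) : a * x + a * y = a * y + a * x :=
  Int.add_comm (a * x) (a * y)
```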

  • Discuss semi-automated creation of finite counterexamples (as discussed here).
  • Also note some "negative results": classes of finite magmas that did not yield many additional refutations, e.g., commutative 5x5 magmas.
  • Translation-invariant magmas (see, e.g., this thread for a nice example). Note: any magma with a transitive symmetry will lift to a translation-invariant model, which helps explain why these are common examples. Also, symmetric models could be slightly more likely to obey various laws than general models, due to degree-of-freedom considerations.
  • Using SAT solvers to find medium-sized finite magmas obeying a given law? See this discussion, and the encoding sketch below.
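A sketch of a standard SAT encoding (our illustration and notation, not necessarily the encoding used in the linked discussion): for a candidate magma of size n, introduce Boolean variables p[a,b,c] asserting a ◇ b = c. Then:

  • totality: for each pair (a,b), the clause p[a,b,0] ∨ ... ∨ p[a,b,n-1];
  • functionality: for each (a,b) and each c ≠ c', the clause ¬p[a,b,c] ∨ ¬p[a,b,c'];
  • the law, one clause per ground instance: e.g., for the law x = x ◇ (y ◇ x), each triple (a,b,c) contributes ¬p[b,a,c] ∨ p[a,c,a], forcing a ◇ (b ◇ a) = a whenever b ◇ a = c.

A satisfying assignment is then exactly the multiplication table of an n-element magma obeying the law.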

Tree-based constructions; see here.

Discuss the computational and memory efficiencies needed to brute-force over extremely large sets of magmas. SAT solving may be a better approach past a certain size!

Metatheorems

List some notable metatheorems (e.g., duality). Also include metatheorems that did not mature in time to be deployed before the automated tools resolved most implications; they could still be useful for future researchers!
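As an illustration of the duality metatheorem (a sketch in our notation, reusing the Magma and FreeMagma sketches above; the names opposite and dual are ours): every magma has an opposite magma with the arguments of the operation swapped, and every law has a dual obtained by swapping the branches of each application. A magma satisfies a law iff its opposite satisfies the dual law, so each implication or anti-implication yields its dual for free.

```lean
-- The opposite magma: same carrier, arguments swapped.
def Magma.opposite {G : Type _} (m : Magma G) : Magma G :=
  ⟨fun x y => m.op y x⟩

-- Dualizing a term: swap the two branches of every Fork.
def FreeMagma.dual {α : Type _} : FreeMagma α → FreeMagma α
  | .Leaf x => .Leaf x
  | .Fork l r => .Fork (dual r) (dual l)

-- Key fact (stated informally): G satisfies the law (lhs, rhs) iff
-- its opposite satisfies (lhs.dual, rhs.dual); hence E => F iff
-- dual(E) => dual(F), and likewise for anti-implications.
```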

Automated theorem proving

Describe the various automated theorem provers deployed in the project, with some statistics on their performance:

  • Z3 prover
  • Vampire
  • More elementary substitutions
  • egg
  • duper

Any comparative study of semi-automated methods with fully automated ones? In principle the semi-automated approach could be more automated using a script or "agent" to call various theorem provers. See this discussion.

Draw upon the discussion here on the various ways of integrating ATPs with Lean. This project primarily used the least integrated approach, "Option 3", as it was fastest and imposed no dependencies on other contributors, but it also has drawbacks.

Note: when evaluating the performance of any particular automated implication tool, we should do a fresh run on the entire base of implications, rather than rely on whatever implications produced by the tool remain in the Lean codebase by the time of writing the paper. This is because (a) many of the previous runs focused only on those implications that were not already ruled out by earlier contributions, and (b) some pruning has been applied subsequent to the initial runs to improve compilation performance. Of course, these runs would not need to be added to the (presumably optimized) codebase at that point, but would be purely for the purpose of gathering performance statistics. More discussion here.

See this discussion on the value of using different ATPs, and of setting run-time parameters etc. at different values.

Make a note of the possible alternate strategy to prove implications outlined here.

Austin pairs

A new short Austin pair: Equation 3944 implies Equation 3588 for finite magmas, but not for magmas in general. Also: 206 !-> 1648, Asterix !-> Obelix, and Obelix !-> Asterix are additional Austin pairs.

Recap the results from this thread: https://leanprover.zulipchat.com/#narrow/channel/458659-Equational/topic/Austin.20pairs

AI/ML-assisted contributions

User interface

Describe visualizations and explorer tools: Equation Explorer, Finite Magma Explorer, Graphiti, ...

Statistics and experiments

Data analysis of the implication graph

  • Mention the long chain 2 => 5 => 2499 => 2415 => 238 => 2716 => 28 => 2973 => 270 => 3 => 3715 => 375 => 359 => 4065 => 1 (discussed here).
  • What are the "most difficult" implications?

Is there a way to generate a standard test set of implication problems of various difficulty levels? Can one then use this to benchmark various automated and semi-automated methods? Challenge: how does one automatically assign a difficulty level to a given (anti-)implication?

See this for a preliminary data analysis of the impact of equation size and the number of variables.

Data management

Discuss how the data was handled, and how it will be stored going forward. Upload to Zenodo? mathbases.org?

Reflections

Testimonies from individual participants (but perhaps this is more suited for a retrospective paper). Utilize the thoughts and reflections thread.

  • Automation often overtook the rate of human progress, for instance in developing metatheorems. Does this create an opportunity cost? Raised as a possible discussion point here by Zoltan Kocsis.

Conclusions and future directions

Insights learned, future speculation. Utilize the discussion on future directions. Some ideas from that list:

  • A database of triple implications (EquationX and EquationY imply EquationZ) - see also this discussion.
  • Are there any equational laws that have no non-trivial finite models, but have surjunctive models?
  • Can we produce interesting examples of irreducible implications EquationX -> EquationY (where no intermediate EquationZ can interpose)?
  • Degree of satisfiability results, e.g., if a central groupoid obeys the natural central groupoid axiom 99% of the time, is it a natural central groupoid?
  • Kisielewicz's question: does 5093 have any infinite models?
  • "Insight mining" the large corpus of automated proofs that have been generated.
  • How machine-learnable is the implication graph? (See AI/ML section)

Acknowledgements

Thank the broader Lean Zulip community, and acknowledge the many small contributions from those who, for whatever reason, did not wish to be listed as authors.

Appendix: proofs of theoretical results

Here we add the interesting proofs mentioned in the results section; for really routine proofs we can simply refer to the blueprint/Lean code.

Appendix: author contributions

List author contributions, possibly using the CRediT categories. We may wish to explain how we interpret each category at the beginning of the appendix, and possibly add subcategories to make finer-grained distinctions.

Also add affiliations and grant acknowledgments.