Strength in numbers

AI models could offer mathematicians a common language

April 17, 2026

WORKING OUT how to most efficiently pack a crate full of oranges may seem like a juvenile pursuit for professional mathematicians. And yet the sphere-packing problem, as this pastime is properly known, confounded geometers for centuries. A breakthrough came in 1998 when Thomas Hales, a mathematician then at the University of Michigan, claimed to have proved what had long been conjectured: that hexagonal stacking, in which each sphere sits in the recess formed by a circle of six on the layer beneath it, provides the densest possible configuration.
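The bound Hales established can be stated compactly: no arrangement of equal spheres in three dimensions can fill a greater fraction of space than hexagonal stacking does, namely

```latex
% Kepler conjecture (proved by Hales): the maximum density of any
% packing of equal spheres in three dimensions is that achieved by
% hexagonal stacking.
\[
  \delta_{\max} = \frac{\pi}{3\sqrt{2}} \approx 0.7405
\]
```

or a little over 74% of the crate.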
The story did not quite end there. It took over a decade of checking and rechecking before his fellow mathematicians were happy that Dr Hales’s calculations were, in fact, correct. Such a state of affairs is not unusual. Before a mathematical conclusion can be elevated to the exalted status of proof, its steps have to be painstakingly examined symbol by symbol and proposition by proposition. That highlights a “core bottleneck in mathematics, which is trust”, says Patrick Shafto, who works at America’s Defence Advanced Research Projects Agency (DARPA).
Dr Shafto leads a team hoping to use artificial intelligence to accelerate progress in pure mathematics by, among other things, streamlining this formalisation process. If maths-literate large language models (LLMs) can indeed certify existing proofs, as well as help develop new ones, many mathematicians hope they could speed up discovery in a field that is otherwise painfully slow.
One thing that is already clear is that LLMs work through mathematical logic very differently from humans. Before getting stuck into a challenging computation, humans like to devise at least the outline of a plan of attack—setting out their intentions with the aim of writing proofs that can be followed at each step. LLMs, by contrast, run as a “stream of consciousness”, says Terence Tao, a mathematician at the University of California, Los Angeles. Such systems, therefore, work through problems based on what they think they should see next, not future steps; they meander and sound more like “improv dialogue rather than scripted text”.
This improvisational approach nonetheless pays dividends. In recent months AI models have proved capable of solving long-standing open mathematical questions as well as entirely new problems designed to flummox them. “The field is moving very rapidly,” says Dr Tao.
For Pushmeet Kohli, who heads AI for Science work at Google DeepMind, an AI lab, progress depends on ensuring these models express themselves in ways mathematicians can follow. One of the models he and his team have produced with this aim in mind is AlphaEvolve, a tool designed to generate proofs for optimisation problems: ones whose goal is to find mathematical objects that best meet some given criteria (densest sphere-packing, say). Although some techniques to solve such problems already existed, they were accessible only to experts. AlphaEvolve, by contrast, can be prompted using “natural language”, meaning non-experts can work with it. Dr Tao, who helped with its development, says that its reasoning can occasionally be hard to follow. But he feels that DeepThink AI, another Google tool, is able to adequately explain its working.
There are plenty of other players in the space. One is Harmonic, an American startup. When its bot, known as Aristotle, verifies a proof, it tries to follow the steps as submitted by the user. It does this by first translating the proof into a rigorous language of symbols and axioms that allows each step to be checked. Aristotle makes use of Lean, an open-source coding language popular with mathematicians. If the human proof-writer has made small errors, Aristotle then attempts to fix them. If the human has skipped some steps (mathematicians often make intuitive leaps), it fills in the details. The result is an airtight proof that follows the same principles as the original, says Tudor Achim, Harmonic’s boss. If a human wants to understand it, they still have to do some leg-work to decompose it, he explains, “but they know that every single step is correct.”
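To give a flavour of what such formalisation looks like, consider a toy Lean 4 proof (an illustrative sketch, not Aristotle’s output). A theorem is stated as a typed claim; the proof is a term the Lean kernel checks mechanically, so nothing is taken on trust:

```lean
-- A toy machine-checkable proof in Lean 4 (illustrative only).
-- The statement says addition of natural numbers is commutative;
-- the proof invokes the core-library lemma Nat.add_comm, and the
-- kernel verifies that it really does establish the claim.
theorem my_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A concrete computation, checked by direct evaluation:
theorem two_plus_two : 2 + 2 = 4 := rfl
```

A human proof rendered this way may run to thousands of such lines, but each one is verified individually.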
Math, Inc. is another AI startup pursuing the same goal. It is developing a model called Gauss that can also convert a human-written proof into lines of Lean code. Among the targets it has pursued were proofs for higher-dimensional versions of the sphere-packing problem obtained in 2016 by Maryna Viazovska, a mathematician then at Humboldt University of Berlin. Gauss successfully formalised both proofs, one for 8-dimensional spheres and one for 24-dimensional ones, within weeks. Even though both proofs had already been verified without AI, Gauss’s work has contributed to mathematicians’ understanding of the tools in Dr Viazovska’s proofs.
At DARPA, Dr Shafto hopes to find a way to automate translation between natural language and formal languages such as Lean, and decompose complex proofs into the propositions they are typically made up of. Bringing what he describes, at present, as “a hot mess of papers, text books and human heads” into a unified body of work would allow younger mathematicians to more readily navigate the existing literature. That could radically accelerate the progress of pure mathematicians and lower the barrier to entry for others. For now, though, he warns that mathematical expertise remains a valuable skill for users. “The AI makes mistakes, and you have to be able to figure out where,” he says.
There are other pitfalls. Timothy Gowers, a mathematician at Cambridge University, is working on automatic theorem proving, in which computers are trained to find proofs by trying to mirror how humans do so. As part of that work, his lab is exploring whether AI models can make non-obvious connections between mathematical subfields, rather than simply retreading old ground. When human mathematicians are thinking through alternative approaches to a problem, says Dr Gowers, “you have to sort of dig around; there’s a whole process by which you try to generate non-standard ideas.”
So far, the models’ ability to replicate human creativity has fallen short. One failing, says Dr Gowers, is that LLMs struggle to apply what they have learned in solving one problem when tackling another. Human mathematicians also develop what he calls an aesthetic sense, prompting them to look for neater proofs that can sometimes yield surprising results. That combination of aesthetic sense and deep knowledge seems quite difficult for AI models to emulate. On that front, at least, “humans still have the edge.”
Bots can also misbehave in surprising ways. In March Donald Knuth, a computer scientist at Stanford University, was working with his colleague Filip Stappers on a problem akin to that of the travelling salesman, in which the shortest route must be found for a salesman to visit each of a number of cities once before returning home. To help move things along, they turned to Claude Opus 4.6, an LLM developed by Anthropic.
After using several different approaches, Claude was able to solve the problem for cases when, at each stop, an odd number of routes remained to choose from. But when it was pushed to solve the problem for all even numbers, the model malfunctioned. “In the end, it was not even able to write and run explore programs correctly any more,” wrote Mr Stappers, referring to the code used by the model to express its reasoning. “Very weird.” All the same, wrote Dr Knuth, the way Claude tackled the problem left him impressed. (A few weeks later another researcher built on this work using ChatGPT 5.4 Pro, another LLM, to solve the problem for even numbers.)
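The problem class itself is simple to state, if brutally expensive to solve at scale. For tiny instances a brute-force solver (an illustrative sketch in Python, not the code Claude or the researchers wrote) simply tries every ordering of the cities and keeps the cheapest round trip:

```python
from itertools import permutations

def shortest_tour(dist):
    """Brute-force travelling salesman: try every ordering of the
    cities after city 0 and keep the cheapest round trip.
    `dist` is a symmetric matrix of pairwise distances."""
    n = len(dist)
    best_len, best_tour = float("inf"), None
    for perm in permutations(range(1, n)):
        tour = (0,) + perm + (0,)
        length = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        if length < best_len:
            best_len, best_tour = length, tour
    return best_len, best_tour

# Four cities at the corners of a square: adjacent corners are
# distance 1 apart, opposite corners distance 2. The optimal tour
# walks the perimeter, for a total length of 4.
dist = [[0, 1, 2, 1],
        [1, 0, 1, 2],
        [2, 1, 0, 1],
        [1, 2, 1, 0]]
print(shortest_tour(dist))  # -> (4, (0, 1, 2, 3, 0))
```

With n cities there are (n-1)! orderings to check, which is why researchers, human and artificial alike, hunt for cleverer structure instead.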
If these difficulties can be overcome, there are prizes aplenty on the horizon. In theory, an AI model able to retain the mathematical corpus could make new connections which have long eluded human researchers. And a model that can effectively reason about mathematics could also be taught to reason about other quantitative fields, from economics to physics. With such immense problem-solving abilities at their disposal, says Mr Kohli, the real challenge for humans will be finding the next set of problems worth tackling.