AI Generates Original Mathematical Proofs, Ushering in Era of Accelerated Discoveries

Publication Date: August 23, 2025

Overview

Artificial intelligence systems are now generating novel mathematical results and solving problems that have eluded human experts for years, marking a shift in how breakthroughs occur. Recent examples include OpenAI’s GPT-5 Pro producing a verified proof that advances convex optimization theory, and a Caltech system ruling out potential counterexamples in group theory that had stood open for decades.

These developments, emerging in rapid succession over the past few months, position AI as a collaborative partner in research, potentially speeding up discoveries in fields from cryptography to crisis forecasting. This capability challenges traditional workflows, prompting mathematicians to refine their approaches while raising questions about how AI should be integrated into education and innovation.

Facts

Researchers at the California Institute of Technology developed a reinforcement learning system to study the Andrews-Curtis conjecture, a 60-year-old open problem in group theory. The system ruled out families of potential counterexamples that had remained open for 25 years and made progress on another family that had been open for 44 years, working with solution sequences that run from thousands to billions of steps.
On August 20, 2025, OpenAI researcher Sebastien Bubeck reported that GPT-5 Pro generated a proof extending the range of step sizes for which the curve of function values along gradient descent iterates is guaranteed to be convex, from 1/L to 1.5/L (where L is the smoothness parameter). The question was posed as an open problem in a March 2025 arXiv paper. Bubeck verified the proof as correct, noting that it evolved from the original paper’s approach but was distinct from it.
An experimental OpenAI model achieved a gold medal score at the 2025 International Mathematical Olympiad by solving five out of six problems correctly, earning 35 out of 42 points without external aids, outperforming 96% of the 630 human competitors.
ByteDance’s Seed-Prover AI system, developed by Chinese researchers, solved 78.1% of all past IMO problems, over 50% of Putnam exam problems, and achieved 100% on OpenAI’s miniF2F benchmark, using Lean for formal proofs and introducing a novel geometry reasoning engine.
At Harvard University, AI models advanced from solving 30-50% of nonlinear partial differential equation problems in fall 2023 to acing the hardest ones by spring 2025, prompting a redesign of the graduate-level Applied Mathematics 201 course to incorporate AI-generated problems.
DARPA launched the Exponentiating Mathematics (expMath) program in May 2025 to develop AI as mathematical “co-authors” that break down complex problems into lemmas, aiming to accelerate discoveries in areas like cryptography and materials science.
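The convexity property in the GPT-5 Pro item above can be checked numerically on a small example. The sketch below runs gradient descent on a one-dimensional Huber loss (convex and L-smooth with L = 1) and tests whether the sequence of function values has non-negative second differences, which is what convexity of the curve means for a discrete sequence. The test function, starting point, iteration count, and helper names are illustrative assumptions and do not come from Bubeck or the arXiv paper; a numerical check on one function is not a proof.

```python
from typing import Callable, List


def huber(x: float) -> float:
    """Huber loss with threshold 1: convex and L-smooth with L = 1."""
    return 0.5 * x * x if abs(x) <= 1.0 else abs(x) - 0.5


def huber_grad(x: float) -> float:
    """Gradient of the Huber loss: x clipped to the interval [-1, 1]."""
    return max(-1.0, min(1.0, x))


def gd_values(
    f: Callable[[float], float],
    grad: Callable[[float], float],
    x0: float,
    step: float,
    n_iters: int,
) -> List[float]:
    """Run fixed-step gradient descent and record f at every iterate."""
    x = x0
    values = [f(x)]
    for _ in range(n_iters):
        x = x - step * grad(x)
        values.append(f(x))
    return values


def is_convex_sequence(values: List[float], tol: float = 1e-12) -> bool:
    """A sequence is convex when every second difference is non-negative."""
    return all(
        values[k + 2] - 2.0 * values[k + 1] + values[k] >= -tol
        for k in range(len(values) - 2)
    )


L_SMOOTH = 1.0  # smoothness constant of the Huber loss used here

for factor in (1.0, 1.5):
    vals = gd_values(huber, huber_grad, x0=5.0, step=factor / L_SMOOTH, n_iters=30)
    print(f"step = {factor}/L, curve convex: {is_convex_sequence(vals)}")
```

Both step sizes yield a convex curve for this particular function. Swapping in another convex, smooth function only requires passing a different `f` and `grad` and adjusting `L_SMOOTH` accordingly.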

Perspectives

OpenAI Researchers (Sebastien Bubeck, Alex Wei, Sheryl Hsu): They emphasize AI’s role in advancing general-purpose reasoning for mathematical proofs, viewing math as an objective testing ground for AGI. Bubeck highlighted GPT-5 Pro’s ability to prove new bounds in convex optimization, stating it produced a correct, novel contribution worthy of publication. Wei noted the philosophy is to build methods that work beyond math, while Hsu stressed the importance of AI recognizing its knowledge limits to reduce hallucinations and enable sustained progress on hard problems.
Caltech Team (Sergei Gukov and Ali Shehper): The researchers position their AI as specialized in finding rare, high-reward instances in complex mathematical mazes, with potential applications in predicting outlying events like financial crashes. Gukov stated the system “may contain the seeds of what would be required to make intelligent predictions of this nature,” while Shehper described the challenge as “trying to find your way through a maze the size of Earth,” where only one path works amid billions of steps.
Harvard Faculty (Michael Brenner): Brenner views AI’s rapid math improvements as transformative for both research and teaching, noting it forces a reevaluation of course structures. He redesigned his class to have students create problems that challenge AI, stating, “My hope is that we can solve problems faster and we can get more work done. Science is infinite. There’s no limit.”
DARPA Program Manager (Patrick Shafto): Shafto advocates for AI to revolutionize mathematical workflows by acting as co-authors that democratize access and accelerate progress, with the aim of ensuring U.S. technological leadership. He explained, “Just as computers once transformed calculations, expMath technology could put powerful mathematical tools at everyone’s fingertips, redefining the pace of discovery if successful.”
ByteDance AI Researchers: As developers of Seed-Prover, they focus on overcoming LLMs’ limitations in theorem proving through reinforcement learning and formal verification in Lean. Their system demonstrates broad reasoning capabilities, achieving high performance on IMO, Putnam, and other benchmarks, positioning it as a tool for automated theorem proving that refines proofs iteratively and handles geometry via a dedicated engine.
Mathematician Alexei Miasnikov (Stevens Institute of Technology): An expert on the Andrews-Curtis conjecture, Miasnikov praised Caltech’s AI results as “beyond the expectations,” highlighting reinforcement learning’s utility for experimental mathematics and its ability to uncover patterns in vast datasets that humans might overlook.
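For readers unfamiliar with the formal verification in Lean mentioned in the ByteDance item above, the following toy illustration shows what a machine-checked statement looks like. It is not taken from Seed-Prover or from any benchmark; the theorem name `toy_add_comm` is hypothetical, and the snippet assumes Lean 4 with only its core library.

```lean
-- Toy illustration only: a statement and a proof that Lean's kernel checks.
-- The proof reuses the core library lemma Nat.add_comm.
theorem toy_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A concrete arithmetic fact, verified by direct computation.
example : 2 + 2 = 4 := rfl
```

If either proof were wrong, Lean would reject the file. That is why a formally verified result does not depend on a human reading the argument line by line.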

Considerations

AI’s ability to generate verifiable proofs for open problems accelerates research timelines, enabling faster advancements in dependent fields like cryptography and physics in the short term, while fostering hybrid human-AI collaborations for paradigm-shifting discoveries over the long term.
Educational curricula at universities are adapting to incorporate AI as a tool for problem creation and verification, potentially enhancing student creativity in the near future but requiring safeguards against over-reliance that could erode foundational skills long-term.
National security implications arise from AI-driven math breakthroughs, as programs like DARPA’s expMath aim to maintain competitive edges in areas such as fluid dynamics and materials science, though global competition from entities like ByteDance underscores the need for international cooperation to avoid technological divides.
Crisis prediction could benefit from AI’s handling of rare events in very large search spaces, a direction that the Caltech team suggests may contain the seeds of such predictions. Possible near-term uses include forecasting financial instabilities or natural disasters, with long-term potential to integrate into public policy for proactive risk management.
Ethical integration of AI in mathematics promotes broader access to advanced tools, democratizing discovery in the short term while necessitating ongoing evaluation to ensure transparency and mitigate biases in algorithmic reasoning over time.