SOTAVerified|Agents Browse Leaderboard About Blog

Math

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1–25 of 1596 papers

Title	Date	Tasks	Status	Hype	Score
Energy-Based Transformers are Scalable Learners and Thinkers	Jul 2, 2025	DenoisingImage Denoising	VerifiedCommunity Verified — 1 reproduction	5	18
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools	Jun 18, 2024	AllGSM8K	CodeCode Available	14	5
Qwen2.5 Technical Report	Dec 19, 2024	Common Sense Reasoning	CodeCode Available	13	5
Qwen2.5-Coder Technical Report	Sep 18, 2024	Code Generation	CodeCode Available	11	5
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models	Feb 5, 2024	Arithmetic ReasoningMath	CodeCode Available	9	5
AgentRxiv: Towards Collaborative Autonomous Research	Mar 23, 2025	Mathscientific discovery	CodeCode Available	9	5
s1: Simple test-time scaling	Jan 31, 2025	Language ModelingLanguage Modelling	CodeCode Available	9	5
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence	Jun 17, 2024	16kLanguage Modeling	CodeCode Available	9	5
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model	Sep 3, 2024	DecoderMath	CodeCode Available	9	5
TTRL: Test-Time Reinforcement Learning	Apr 22, 2025	Mathreinforcement-learning	CodeCode Available	7	5
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines	Oct 5, 2023	Language ModelingLanguage Modelling	CodeCode Available	7	5
O1 Replication Journey: A Strategic Progress Report -- Part 1	Oct 8, 2024	Mathscientific discovery	CodeCode Available	7	5
EvoAgentX: An Automated Framework for Evolving Agentic Workflows	Jul 4, 2025	Code GenerationMath	CodeCode Available	7	5
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking	Jan 8, 2025	Math	CodeCode Available	7	5
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback	Jun 13, 2024	Instruction FollowingMath	CodeCode Available	7	5
Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models	May 6, 2023	Math	CodeCode Available	7	5
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning	May 30, 2025	GPUMath	CodeCode Available	7	5
OpenThoughts: Data Recipes for Reasoning Models	Jun 4, 2025	Math	CodeCode Available	7	5
Kimi k1.5: Scaling Reinforcement Learning with LLMs	Jan 22, 2025	Mathreinforcement-learning	CodeCode Available	7	5
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild	Mar 24, 2025	Instruction FollowingMath	CodeCode Available	7	5
LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!	Feb 11, 2025	Large Language ModelMath	CodeCode Available	7	5
StarCoder 2 and The Stack v2: The Next Generation	Feb 29, 2024	Code CompletionCode Generation	CodeCode Available	7	5
S*: Test Time Scaling for Code Generation	Feb 20, 2025	Code GenerationMath	CodeCode Available	7	5
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning	Feb 20, 2025	Mathreinforcement-learning	CodeCode Available	7	5
xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference	Mar 17, 2025	MambaMath	CodeCode Available	7	5

Show:10 25 50

← PrevPage 1 of 64Next →

No leaderboard results yet.