SOTAVerified

Math

Papers

Showing 351–375 of 1596 papers

Title	Date	Tasks	Status	Hype
Cats Confuse Reasoning LLM: Query Agnostic Adversarial Triggers for Reasoning Models	Mar 3, 2025	Math	Unverified	0
MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts	Feb 28, 2025	Math, Mathematical Reasoning	Unverified	0
MAMUT: A Novel Framework for Modifying Mathematical Formulas for the Generation of Specialized Datasets for Language Model Training	Feb 28, 2025	Language Modeling, Language Modelling	Code Available	0
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving	Feb 27, 2025	GSM8K, Math	Code Available	1
Self-Training Elicits Concise Reasoning in Large Language Models	Feb 27, 2025	GSM8K, In-Context Learning	Code Available	1
Med-RLVR: Emerging Medical Reasoning from a 3B base model via reinforcement Learning	Feb 27, 2025	Math, Medical Question Answering	Unverified	0
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?	Feb 26, 2025	Math	Code Available	1
Nexus: A Lightweight and Scalable Multi-Agent Framework for Complex Tasks Automation	Feb 26, 2025	Code Generation, HumanEval	Code Available	2
Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning	Feb 25, 2025	Math, Mathematical Reasoning	Unverified	0
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution	Feb 25, 2025	Math, Reinforcement Learning (RL)	Unverified	0
From Euler to AI: Unifying Formulas for Mathematical Constants	Feb 24, 2025	Math	Code Available	0
Learning Decentralized Swarms Using Rotation Equivariant Graph Neural Networks	Feb 24, 2025	Graph Neural Network, Math	Code Available	0
Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models	Feb 24, 2025	GSM8K, Math	Code Available	2
Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning	Feb 24, 2025	Math, Mathematical Reasoning	Code Available	0
Reasoning with Latent Thoughts: On the Power of Looped Transformers	Feb 24, 2025	Language Modeling, Language Modelling	Unverified	0
DISC: Dynamic Decomposition Improves LLM Inference Scaling	Feb 23, 2025	Computational Efficiency, Math	Unverified	0
SBSC: Step-By-Step Coding for Improving Mathematical Olympiad Performance	Feb 23, 2025	Math	Unverified	0
Inference Computation Scaling for Feature Augmentation in Recommendation Systems	Feb 22, 2025	Math, Recommendation Systems	Unverified	0
Does Reasoning Introduce Bias? A Study of Social Bias Evaluation and Mitigation in LLM Reasoning	Feb 21, 2025	Math	Unverified	0
The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer	Feb 21, 2025	Math, Mathematical Reasoning	Code Available	0
Forgotten Polygons: Multimodal Large Language Models are Shape-Blind	Feb 21, 2025	Math, Mathematical Problem-Solving	Code Available	1
How to Get Your LLM to Generate Challenging Problems for Evaluation	Feb 20, 2025	Code Completion, Math	Code Available	1
S*: Test Time Scaling for Code Generation	Feb 20, 2025	Code Generation, Math	Code Available	7
GATE: Graph-based Adaptive Tool Evolution Across Diverse Tasks	Feb 20, 2025	Code Generation, Math	Code Available	0
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning	Feb 20, 2025	Math, reinforcement-learning	Code Available	7

