SOTAVerified

Math

Papers

Showing 126–150 of 1596 papers

Title	Date	Tasks	Status	Hype
Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles	May 26, 2025	ARC, Logical Reasoning	Unverified	0
Inference-time Alignment in Continuous Space	May 26, 2025	Math	Code Available	0
AI4Math: A Native Spanish Benchmark for University-Level Mathematical Reasoning in Large Language Models	May 25, 2025	Math, Mathematical Reasoning	Unverified	0
MMATH: A Multilingual Benchmark for Mathematical Reasoning	May 25, 2025	Math, Mathematical Reasoning	Code Available	0
Steering LLM Reasoning Through Bias-Only Adaptation	May 24, 2025	GSM8K, Math	Unverified	0
Enumerate-Conjecture-Prove: Formally Solving Answer-Construction Problems in Math Competitions	May 24, 2025	Automated Theorem Proving, Math	Code Available	0
Does Representation Intervention Really Identify Desired Concepts and Elicit Alignment?	May 24, 2025	Code Generation, Math	Unverified	0
MSA at BEA 2025 Shared Task: Disagreement-Aware Instruction Tuning for Multi-Dimensional Evaluation of LLMs as Math Tutors	May 24, 2025	Language Modeling, Language Modelling	Unverified	0
On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization	May 24, 2025	Math, Reinforcement Learning (RL)	Unverified	0
Anchored Diffusion Language Model	May 24, 2025	Language Modeling, Language Modelling	Unverified	0
How Is LLM Reasoning Distracted by Irrelevant Context? An Analysis Using a Controlled Benchmark	May 24, 2025	Math	Code Available	0
More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models	May 23, 2025	Diagnostic, Hallucination	Unverified	0
Decoupled Visual Interpretation and Linguistic Reasoning for Math Problem Solving	May 23, 2025	Language Modeling, Language Modelling	Code Available	1
VideoGameBench: Can Vision-Language Models complete popular video games?	May 23, 2025	Math	Unverified	0
One RL to See Them All: Visual Triple Unified Reinforcement Learning	May 23, 2025	All, Math	Unverified	0
Value-Guided Search for Efficient Chain-of-Thought Reasoning	May 23, 2025	Math	Code Available	1
Towards Revealing the Effectiveness of Small-Scale Fine-tuning in R1-style Reinforcement Learning	May 23, 2025	Math, Reinforcement Learning (RL)	Code Available	1
Outcome-based Reinforcement Learning to Predict the Future	May 23, 2025	Holdout Set, Math	Unverified	0
The Unreasonable Effectiveness of Model Merging for Cross-Lingual Transfer in LLMs	May 23, 2025	Cross-Lingual Transfer, Math	Unverified	0
RaDeR: Reasoning-aware Dense Retrieval Models	May 23, 2025	Math, Mathematical Problem-Solving	Code Available	1
ConciseRL: Conciseness-Guided Reinforcement Learning for Efficient Reasoning Models	May 22, 2025	Large Language Model, Math	Code Available	0
AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning	May 22, 2025	Math, reinforcement-learning	Unverified	0
WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning	May 22, 2025	Math, Reinforcement Learning (RL)	Code Available	2
Incremental Sequence Classification with Temporal Consistency	May 22, 2025	Classification, Language Modeling	Unverified	0
Veracity Bias and Beyond: Uncovering LLMs' Hidden Beliefs in Problem-Solving Reasoning	May 22, 2025	Attribute, Math	Unverified	0

No leaderboard results yet.