
Math

Papers


Showing 376–400 of 1596 papers

Title	Date	Tasks	Status	Hype
CER: Confidence Enhanced Reasoning in LLMs	Feb 20, 2025	Math, Mathematical Reasoning	Code Available	0
Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective	Feb 20, 2025	GSM8K, Math	Code Available	0
A Survey on Feedback-based Multi-step Reasoning for Large Language Models on Mathematics	Feb 20, 2025	Math	Unverified	0
SIFT: Grounding LLM Reasoning in Contexts via Stickers	Feb 19, 2025	GSM8K, Math	Code Available	2
BeamLoRA: Beam-Constraint Low-Rank Adaptation	Feb 19, 2025	Code Generation, Math	Unverified	0
DiffSampling: Enhancing Diversity and Accuracy in Neural Text Generation	Feb 19, 2025	Diversity, Extreme Summarization	Unverified	0
The Self-Improvement Paradox: Can Language Models Bootstrap Reasoning Capabilities without External Scaffolding?	Feb 19, 2025	Math	Unverified	0
TreeCut: A Synthetic Unanswerable Math Word Problem Dataset for LLM Hallucination Evaluation	Feb 19, 2025	Dataset Generation, GSM8K	Code Available	0
Reasoning with Reinforced Functional Token Tuning	Feb 19, 2025	Math	Code Available	1
Lean-ing on Quality: How High-Quality Data Beats Diverse Multilingual Data in AutoFormalization	Feb 18, 2025	Math	Unverified	0
Multi-Step Alignment as Markov Games: An Optimistic Online Gradient Descent Approach with Convergence Guarantees	Feb 18, 2025	Math	Unverified	0
None of the Others: a General Technique to Distinguish Reasoning from Memorization in Multiple-Choice LLM Evaluation Benchmarks	Feb 18, 2025	Math, Memorization	Unverified	0
S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning	Feb 18, 2025	Math	Code Available	2
NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions	Feb 18, 2025	Knowledge Distillation, Math	Unverified	0
Thinking Outside the (Gray) Box: A Context-Based Score for Assessing Value and Originality in Neural Text Generation	Feb 18, 2025	Diversity, Math	Unverified	0
Thinking Preference Optimization	Feb 17, 2025	Math	Code Available	1
MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task	Feb 17, 2025	Code Completion, GSM8K	Unverified	0
Scaling Test-Time Compute Without Verification or RL is Suboptimal	Feb 17, 2025	Math, Reinforcement Learning (RL)	Unverified	0
Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving	Feb 17, 2025	Math, Mathematical Problem-Solving	Unverified	0
Energy-Conscious LLM Decoding: Impact of Text Generation Strategies on GPU Energy Consumption	Feb 17, 2025	Benchmarking, Code Summarization	Unverified	0
Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding	Feb 17, 2025	Arithmetic Reasoning, Chart Understanding	Unverified	0
A Study on Leveraging Search and Self-Feedback for Agent Reasoning	Feb 17, 2025	Math	Unverified	0
Warmup-Distill: Bridge the Distribution Mismatch between Teacher and Student before Knowledge Distillation	Feb 17, 2025	Knowledge Distillation, Math	Code Available	0
Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models	Feb 17, 2025	Math	Unverified	0
Uncovering the Impact of Chain-of-Thought Reasoning for Direct Preference Optimization: Lessons from Text-to-SQL	Feb 17, 2025	Code Generation, Math	Code Available	1


