
Math

Papers


Showing 1351–1375 of 1596 papers

Title	Date	Tasks	Status	Hype	Score
The Hallucination Tax of Reinforcement Finetuning	May 20, 2025	Hallucination, Math	Unverified	0	0
Explaining Math Word Problem Solvers	Jul 24, 2023	Math	Unverified	0	0
Explain with Visual Keypoints Like a Real Mentor! A Benchmark for Multimodal Solution Explanation	Apr 4, 2025	Math, Mathematical Reasoning	Unverified	0	0
Explanation Generation for a Math Word Problem Solver	Oct 1, 2015	Explanation Generation, Math	Unverified	0	0
Explicit Knowledge Transfer for Weakly-Supervised Code Generation	Nov 30, 2022	Code Generation, Few-Shot Learning	Unverified	0	0
Exploring Educational Equity: A Machine Learning Approach to Unravel Achievement Disparities in Georgia	Jan 25, 2024	Math	Unverified	0	0
Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate	May 22, 2023	Benchmarking, Math	Unverified	0	0
Exploring the Hidden Reasoning Process of Large Language Models by Misleading Them	Mar 20, 2025	Math, Memorization	Unverified	0	0
Exploring the Impact of Instruction Data Scaling on Large Language Models: An Empirical Study on Real-World Use Cases	Mar 26, 2023	Math	Unverified	0	0
Calculus on MDPs: Potential Shaping as a Gradient	Aug 20, 2022	Math	Unverified	0	0
Exploring the Mystery of Influential Data for Mathematical Reasoning	Apr 1, 2024	Math, Mathematical Reasoning	Unverified	0	0
Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning	Jun 16, 2024	Benchmarking, Math	Unverified	0	0
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity	Jun 7, 2025	Math	Unverified	0	0
Extracting the Unknown from Long Math Problems	Mar 22, 2021	Math	Unverified	0	0
Fairness Hub Technical Briefs: AUC Gap	Sep 20, 2023	Fairness, Math	Unverified	0	0
Fairshare Data Pricing via Data Valuation for Large Language Models	Jan 31, 2025	Data Valuation, Math	Unverified	0	0
FANS -- Formal Answer Selection for Natural Language Math Reasoning Using Lean4	Mar 5, 2025	Answer Selection, Math	Unverified	0	0
BurTorch: Revisiting Training from First Principles by Coupling Autodiff, Math Optimization, and Systems	Mar 18, 2025	CPU, Math	Unverified	0	0
Fast Diffusion Inhibits Disease Outbreaks	Jul 29, 2019	Math	Unverified	0	0
Faster and Better LLMs via Latency-Aware Test-Time Scaling	May 26, 2025	Math	Unverified	0	0
Feature Selection Based on Confidence Machine	Oct 20, 2014	feature selection, Math	Unverified	0	0
The Impact of Item-Writing Flaws on Difficulty and Discrimination in Item Response Theory	Mar 13, 2025	Math, Multiple-choice	Unverified	0	0
Few-Shot Recalibration of Language Models	Mar 27, 2024	Math, MMLU	Unverified	0	0
FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning	Oct 8, 2024	GSM8K, Hallucination	Unverified	0	0
FineMath: A Fine-Grained Mathematical Evaluation Benchmark for Chinese Large Language Models	Mar 12, 2024	Math, Mathematical Reasoning	Unverified	0	0

