Math

Papers

Showing 776–800 of 1596 papers

Title	Date	Tasks	Status	Hype
Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning	Jun 16, 2024	Benchmarking, Math	Unverified	0
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity	Jun 7, 2025	Math	Unverified	0
Extracting the Unknown from Long Math Problems	Mar 22, 2021	Math	Unverified	0
Fairness Hub Technical Briefs: AUC Gap	Sep 20, 2023	Fairness, Math	Unverified	0
Fairshare Data Pricing via Data Valuation for Large Language Models	Jan 31, 2025	Data Valuation, Math	Unverified	0
FANS -- Formal Answer Selection for Natural Language Math Reasoning Using Lean4	Mar 5, 2025	Answer Selection, Math	Unverified	0
BurTorch: Revisiting Training from First Principles by Coupling Autodiff, Math Optimization, and Systems	Mar 18, 2025	CPU, Math	Unverified	0
Fast Diffusion Inhibits Disease Outbreaks	Jul 29, 2019	Math	Unverified	0
Faster and Better LLMs via Latency-Aware Test-Time Scaling	May 26, 2025	Math	Unverified	0
Feature Selection Based on Confidence Machine	Oct 20, 2014	feature selection, Math	Unverified	0
The Impact of Item-Writing Flaws on Difficulty and Discrimination in Item Response Theory	Mar 13, 2025	Math, Multiple-choice	Unverified	0
Few-Shot Recalibration of Language Models	Mar 27, 2024	Math, MMLU	Unverified	0
FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning	Oct 8, 2024	GSM8K, Hallucination	Unverified	0
FineMath: A Fine-Grained Mathematical Evaluation Benchmark for Chinese Large Language Models	Mar 12, 2024	Math, Mathematical Reasoning	Unverified	0
The Invalsi Benchmarks: measuring Linguistic and Mathematical understanding of Large Language Models in Italian	Mar 27, 2024	Language Modelling, Math	Unverified	0
Weakest Link in the Chain: Security Vulnerabilities in Advanced Reasoning Models	Jun 16, 2025	Math	Unverified	0
First-Step Advantage: Importance of Starting Right in Multi-Step Math Reasoning	Nov 14, 2023	GSM8K, Math	Unverified	0
Fixation probabilities for the Moran process in evolutionary games with two strategies: graph shapes and large population asymptotics	Apr 30, 2018	Math	Unverified	0
Fixation probabilities for the Moran process with three or more strategies: general and coupling results	Nov 23, 2018	Math	Unverified	0
Building Math Agents with Multi-Turn Iterative Preference Learning	Sep 4, 2024	GSM8K, Math	Unverified	0
Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration	Oct 22, 2024	Math	Unverified	0
The Logic of Political Survival Revisited: Consequences of Elite Uncertainty Under Authoritarian Rule	Aug 4, 2024	Math	Unverified	0
Formal Mathematical Reasoning: A New Frontier in AI	Dec 20, 2024	Automated Theorem Proving, Math	Unverified	0
The Long-Term Effects of Teachers' Gender Stereotypes	Dec 16, 2022	Math	Unverified	0
fPLSA: Learning Semantic Structures in Document Collections Using Foundation Models	Oct 7, 2024	Math	Unverified	0

