SOTAVerified

Math

Papers

Showing 1351–1400 of 1596 papers

| Title | Status | Hype |
|---|---|---|
| The Hallucination Tax of Reinforcement Finetuning | | 0 |
| Explaining Math Word Problem Solvers | | 0 |
| Explain with Visual Keypoints Like a Real Mentor! A Benchmark for Multimodal Solution Explanation | | 0 |
| Explanation Generation for a Math Word Problem Solver | | 0 |
| Explicit Knowledge Transfer for Weakly-Supervised Code Generation | | 0 |
| Exploring Educational Equity: A Machine Learning Approach to Unravel Achievement Disparities in Georgia | | 0 |
| Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate | | 0 |
| Exploring the Hidden Reasoning Process of Large Language Models by Misleading Them | | 0 |
| Exploring the Impact of Instruction Data Scaling on Large Language Models: An Empirical Study on Real-World Use Cases | | 0 |
| Calculus on MDPs: Potential Shaping as a Gradient | | 0 |
| Exploring the Mystery of Influential Data for Mathematical Reasoning | | 0 |
| Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning | | 0 |
| The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity | | 0 |
| Extracting the Unknown from Long Math Problems | | 0 |
| Fairness Hub Technical Briefs: AUC Gap | | 0 |
| Fairshare Data Pricing via Data Valuation for Large Language Models | | 0 |
| FANS -- Formal Answer Selection for Natural Language Math Reasoning Using Lean4 | | 0 |
| BurTorch: Revisiting Training from First Principles by Coupling Autodiff, Math Optimization, and Systems | | 0 |
| Fast Diffusion Inhibits Disease Outbreaks | | 0 |
| Faster and Better LLMs via Latency-Aware Test-Time Scaling | | 0 |
| Feature Selection Based on Confidence Machine | | 0 |
| The Impact of Item-Writing Flaws on Difficulty and Discrimination in Item Response Theory | | 0 |
| Few-Shot Recalibration of Language Models | | 0 |
| FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning | | 0 |
| FineMath: A Fine-Grained Mathematical Evaluation Benchmark for Chinese Large Language Models | | 0 |
| The Invalsi Benchmarks: measuring Linguistic and Mathematical understanding of Large Language Models in Italian | | 0 |
| Weakest Link in the Chain: Security Vulnerabilities in Advanced Reasoning Models | | 0 |
| First-Step Advantage: Importance of Starting Right in Multi-Step Math Reasoning | | 0 |
| Fixation probabilities for the Moran process in evolutionary games with two strategies: graph shapes and large population asymptotics | | 0 |
| Fixation probabilities for the Moran process with three or more strategies: general and coupling results | | 0 |
| Building Math Agents with Multi-Turn Iterative Preference Learning | | 0 |
| Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration | | 0 |
| The Logic of Political Survival Revisited: Consequences of Elite Uncertainty Under Authoritarian Rule | | 0 |
| Formal Mathematical Reasoning: A New Frontier in AI | | 0 |
| The Long-Term Effects of Teachers' Gender Stereotypes | | 0 |
| fPLSA: Learning Semantic Structures in Document Collections Using Foundation Models | | 0 |
| FRACTAL: Fine-Grained Scoring from Aggregate Text Labels | | 0 |
| BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning | | 0 |
| From Blind Solvers to Logical Thinkers: Benchmarking LLMs' Logical Integrity on Faulty Mathematical Problems | | 0 |
| From fixation probabilities to d-player games: an inverse problem in evolutionary dynamics | | 0 |
| The Mathematics of Market Timing | | 0 |
| From Good to Great: Improving Math Reasoning with Tool-Augmented Interleaf Prompting | | 0 |
| From Large to Tiny: Distilling and Refining Mathematical Expertise for Math Word Problems with Weakly Supervision | | 0 |
| From Textbooks to Knowledge: A Case Study in Harvesting Axiomatic Knowledge from Textbooks to Solve Geometry Problems | | 0 |
| From Text to Visuals: Using LLMs to Generate Math Diagrams with Vector Graphics | | 0 |
| Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens | | 0 |
| Bridging Offline and Online Reinforcement Learning for LLMs | | 0 |
| Breaking Ties: Regression Discontinuity Design Meets Market Design | | 0 |
| Gamifying Math Education using Object Detection | | 0 |
| GAPS: Geometry-Aware Problem Solver | | 0 |
