Math

Papers

Showing 501–525 of 1596 papers

Title	Date	Tasks	Status	Hype
Injecting Numerical Reasoning Skills into Language Models	Apr 9, 2020	Data Augmentation, Decoder	Code Available	1
Explaining Datasets in Words: Statistical Models with Natural Language Parameters	Sep 13, 2024	Clustering, Language Modeling	Code Available	1
Implicit Chain of Thought Reasoning via Knowledge Distillation	Nov 2, 2023	Knowledge Distillation, Math	Code Available	1
How to Get Your LLM to Generate Challenging Problems for Evaluation	Feb 20, 2025	Code Completion, Math	Code Available	1
HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics	Oct 13, 2024	Language Modeling, Language Modelling	Code Available	1
EXAONE Deep: Reasoning Enhanced Language Models	Mar 16, 2025	Math	Code Available	1
Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles	Jun 18, 2024	Arithmetic Reasoning, Code Generation	Code Available	1
How well do Large Language Models perform in Arithmetic tasks?	Mar 16, 2023	Math	Code Available	1
FinanceMath: Knowledge-Intensive Math Reasoning in Finance Domains	Nov 16, 2023	Math, Math Word Problem Solving	Code Available	1
LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits	Oct 2, 2024	Instruction Following, Math	Code Available	1
MedCaseReasoning: Evaluating and learning diagnostic reasoning from clinical case reports	May 16, 2025	Diagnostic, Math	Code Available	1
Retrieval-augmented Generation to Improve Math Question-Answering: Trade-offs Between Groundedness and Human Preference	Oct 4, 2023	Math, Question Answering	Code Available	1
Examining the Robustness of Large Language Models across Language Complexity	Jan 30, 2025	Math	Unverified	0
Examining the Behavior of LLM Architectures Within the Framework of Standardized National Exams in Brazil	Aug 9, 2024	Math, Multiple-choice	Unverified	0
Can Stories Help LLMs Reason? Curating Information Space Through Narrative	Oct 25, 2024	Math	Unverified	0
Evolving LLMs' Self-Refinement Capability via Iterative Preference Optimization	Feb 8, 2025	GSM8K, Math	Unverified	0
Can LLMs understand Math? -- Exploring the Pitfalls in Mathematical Reasoning	May 21, 2025	Math, Mathematical Reasoning	Unverified	0
A range characterization of the single-quadrant ADRT	Oct 11, 2020	Math	Unverified	0
EvoGPT-f: An Evolutionary GPT Framework for Benchmarking Formal Math Languages	Feb 12, 2024	Automated Theorem Proving, Benchmarking	Unverified	0
AI4Math: A Native Spanish Benchmark for University-Level Mathematical Reasoning in Large Language Models	May 25, 2025	Math, Mathematical Reasoning	Unverified	0
Hard Math -- Easy UVM: Pragmatic solutions for verifying hardware algorithms using UVM	Dec 6, 2024	Math	Unverified	0
Evaluating the Design Features of an Intelligent Tutoring System for Advanced Mathematics Learning	Dec 23, 2024	Math	Unverified	0
Evaluating Robustness of Reward Models for Mathematical Reasoning	Oct 2, 2024	Math, Mathematical Reasoning	Unverified	0
Can LLMs Reason Abstractly Over Math Word Problems Without CoT? Disentangling Abstract Formulation From Arithmetic Computation	May 29, 2025	GSM8K, Math	Unverified	0
A Graph-Based Synthetic Data Pipeline for Scaling High-Quality Reasoning Instructions	Dec 12, 2024	GSM8K, Knowledge Graphs	Unverified	0

No leaderboard results yet.