
Math

Papers


Showing 451–475 of 1596 papers

Title	Date	Tasks	Status	Hype
EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees	Mar 11, 2025	Chatbot, Language Modeling	Code Available	1
Efficient RL Training for Reasoning Models via Length-Aware Optimization	May 18, 2025	Math	Code Available	1
Injecting Numerical Reasoning Skills into Language Models	Apr 9, 2020	Data Augmentation, Decoder	Code Available	1
Kalman Filter Enhanced GRPO for Reinforcement Learning-Based Language Model Reasoning	May 12, 2025	Language Modeling, Language Modelling	Code Available	1
How well do Large Language Models perform in Arithmetic tasks?	Mar 16, 2023	Math	Code Available	1
Eliminating Position Bias of Language Models: A Mechanistic Approach	Jul 1, 2024	Math, object-detection	Code Available	1
How to Get Your LLM to Generate Challenging Problems for Evaluation	Feb 20, 2025	Code Completion, Math	Code Available	1
Implicit Chain of Thought Reasoning via Knowledge Distillation	Nov 2, 2023	Knowledge Distillation, Math	Code Available	1
Improving the Validity of Automatically Generated Feedback via Reinforcement Learning	Mar 2, 2024	Math, Misconceptions	Code Available	1
ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models	May 23, 2023	Math	Code Available	1
ArMATH: a Dataset for Solving Arabic Math Word Problems	Jun 1, 2022	Deep Learning, Math	Code Available	1
Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning	Jun 4, 2023	Math	Code Available	1
Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics	Oct 28, 2024	Arithmetic Reasoning, Math	Code Available	1
Teaching Language Models to Self-Improve through Interactive Demonstrations	Oct 20, 2023	Math	Code Available	1
Entropy-Regularized Process Reward Model	Dec 15, 2024	GSM8K, Math	Code Available	1
Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization	Aug 14, 2024	Informativeness, Instruction Following	Code Available	1
Enhancing Cross-Tokenizer Knowledge Distillation with Contextual Dynamical Mapping	Feb 16, 2025	Code Generation, Instruction Following	Code Available	1
Ape210K: A Large-Scale and Template-Rich Dataset of Math Word Problems	Sep 24, 2020	Diversity, Math	Code Available	1
The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models?	Mar 14, 2024	Hallucination, image-classification	Code Available	1
The Geometry of Concepts: Sparse Autoencoder Feature Structure	Oct 10, 2024	Math	Code Available	1
HALO: Hierarchical Autonomous Logic-Oriented Orchestration for Multi-Agent LLM Systems	May 17, 2025	Arithmetic Reasoning, Code Generation	Code Available	1
Brilla AI: AI Contestant for the National Science and Maths Quiz	Mar 4, 2024	Math, Question Answering	Code Available	1
HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics	Oct 13, 2024	Language Modeling, Language Modelling	Code Available	1
MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation	Dec 28, 2023	GSM8K, Language Model Evaluation	Code Available	1
Entropy-Based Adaptive Weighting for Self-Training	Mar 31, 2025	GSM8K, Math	Code Available	1

