SOTAVerified

Math

Papers

Showing 201–225 of 1596 papers

Title | Status | Hype
MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning | Code | 2
A Survey of Deep Learning for Mathematical Reasoning | Code | 2
Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models | Code | 2
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning | Code | 2
Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models | Code | 2
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models | Code | 2
Cumulative Reasoning with Large Language Models | Code | 2
Self-Explore: Enhancing Mathematical Reasoning in Language Models with Fine-grained Rewards | Code | 2
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate | Code | 2
DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | Code | 2
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark | Code | 2
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code | Code | 2
MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data | Code | 2
MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems | Code | 2
CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning | Code | 2
MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning | Code | 2
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters | Code | 2
Agent Lumos: Unified and Modular Training for Open-Source Language Agents | Code | 2
Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning | Code | 2
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models | Code | 2
A Comparative Study on Reasoning Patterns of OpenAI's o1 Model | Code | 2
Balancing LoRA Performance and Efficiency with Simple Shard Sharing | Code | 2
CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models | Code | 2
Archon: An Architecture Search Framework for Inference-Time Techniques | Code | 2
AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions | Code | 2

No leaderboard results yet.