SOTAVerified

Math

Papers

Showing 151–200 of 1596 papers

| Title | Status | Hype |
| --- | --- | --- |
| Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process | Code | 2 |
| Progressive-Hint Prompting Improves Reasoning in Large Language Models | Code | 2 |
| Nexus: A Lightweight and Scalable Multi-Agent Framework for Complex Tasks Automation | Code | 2 |
| Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning | Code | 2 |
| Autoformalizing Euclidean Geometry | Code | 2 |
| ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline | Code | 2 |
| CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models | Code | 2 |
| MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities | Code | 2 |
| Multi-View Reasoning: Consistent Contrastive Learning for Math Word Problem | Code | 2 |
| Evaluating Mathematical Reasoning Beyond Accuracy | Code | 2 |
| Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models | Code | 2 |
| MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning | Code | 2 |
| Meta-Design Matters: A Self-Design Multi-Agent System | Code | 2 |
| CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning | Code | 2 |
| MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models | Code | 2 |
| MegaMath: Pushing the Limits of Open Math Corpora | Code | 2 |
| FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models | Code | 2 |
| Memorizing Transformers | Code | 2 |
| Meta Prompting for AI Systems | Code | 2 |
| OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling | Code | 2 |
| Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models | Code | 2 |
| Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models | Code | 2 |
| Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning | Code | 2 |
| MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data | Code | 2 |
| Can AI Assistants Know What They Don't Know? | Code | 2 |
| Adaptable Logical Control for Large Language Models | Code | 2 |
| Offline Reinforcement Learning for LLM Multi-Step Reasoning | Code | 2 |
| Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts | Code | 2 |
| Enhancing Reasoning Capabilities of LLMs via Principled Synthetic Logic Corpus | Code | 2 |
| MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts | Code | 2 |
| MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark | Code | 2 |
| MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code | Code | 2 |
| Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization | Code | 2 |
| MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning | Code | 2 |
| MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning | Code | 2 |
| Full Page Handwriting Recognition via Image to Sequence Extraction | Code | 2 |
| MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems | Code | 2 |
| A Survey of Deep Learning for Mathematical Reasoning | Code | 2 |
| Efficient Reinforcement Finetuning via Adaptive Curriculum Learning | Code | 2 |
| Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning | Code | 2 |
| Agent Lumos: Unified and Modular Training for Open-Source Language Agents | Code | 2 |
| Essential-Web v1.0: 24T tokens of organized web data | Code | 2 |
| Balancing LoRA Performance and Efficiency with Simple Shard Sharing | Code | 2 |
| LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters | Code | 2 |
| MAS-Zero: Designing Multi-Agent Systems with Zero Supervision | Code | 2 |
| Cumulative Reasoning with Large Language Models | Code | 2 |
| Measuring Mathematical Problem Solving With the MATH Dataset | Code | 2 |
| Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate | Code | 2 |
| Learning to Reason for Long-Form Story Generation | Code | 2 |
| Dynamic Early Exit in Reasoning Models | Code | 2 |

No leaderboard results yet.