SOTAVerified

Math

Papers

Showing 101–125 of 1596 papers

| Title | Status | Hype |
| --- | --- | --- |
| Thinkless: LLM Learns When to Think | Code | 3 |
| MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning | Code | 3 |
| LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding | Code | 3 |
| MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning | Code | 3 |
| Noise Contrastive Alignment of Language Models with Explicit Rewards | Code | 3 |
| Dynamic Early Exit in Reasoning Models | Code | 2 |
| Memorizing Transformers | Code | 2 |
| Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset | Code | 2 |
| MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts | Code | 2 |
| MegaMath: Pushing the Limits of Open Math Corpora | Code | 2 |
| Meta-Design Matters: A Self-Design Multi-Agent System | Code | 2 |
| MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning | Code | 2 |
| MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark | Code | 2 |
| MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code | Code | 2 |
| MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems | Code | 2 |
| MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning | Code | 2 |
| Can AI Assistants Know What They Don't Know? | Code | 2 |
| MAS-Zero: Designing Multi-Agent Systems with Zero Supervision | Code | 2 |
| AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models | Code | 2 |
| MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data | Code | 2 |
| Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models | Code | 2 |
| Measuring Mathematical Problem Solving With the MATH Dataset | Code | 2 |
| MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models | Code | 2 |
| Balancing LoRA Performance and Efficiency with Simple Shard Sharing | Code | 2 |
| Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving | Code | 2 |

No leaderboard results yet.