SOTAVerified

Math

Papers

Showing 226–250 of 1596 papers

Title | Status | Hype
Specializing Smaller Language Models towards Multi-Step Reasoning | Code | 2
A Survey of Deep Learning for Mathematical Reasoning | Code | 2
Multi-View Reasoning: Consistent Contrastive Learning for Math Word Problem | Code | 2
Language Models are Multilingual Chain-of-Thought Reasoners | Code | 2
PaLM: Scaling Language Modeling with Pathways | Code | 2
Memorizing Transformers | Code | 2
Accelerating Sparse Deep Neural Networks | Code | 2
Full Page Handwriting Recognition via Image to Sequence Extraction | Code | 2
Measuring Mathematical Problem Solving With the MATH Dataset | Code | 2
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination | Code | 1
A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning | Code | 1
The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains | Code | 1
LLMThinkBench: Towards Basic Math Reasoning and Overthinking in Large Language Models | Code | 1
Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective | Code | 1
OJBench: A Competition Level Code Benchmark For Large Language Models | Code | 1
Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team | Code | 1
Steering LLM Thinking with Budget Guidance | Code | 1
RePO: Replay-Enhanced Policy Optimization | Code | 1
Resa: Transparent Reasoning Models via SAEs | Code | 1
ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs | Code | 1
SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning | Code | 1
WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning | Code | 1
Generating Pedagogically Meaningful Visuals for Math Word Problems: A New Benchmark and Analysis of Text-to-Image Models | Code | 1
STORM-BORN: A Challenging Mathematical Derivations Dataset Curated via a Human-in-the-Loop Multi-Agent Framework | Code | 1
Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks | Code | 1
Page 10 of 64

No leaderboard results yet.