SOTAVerified

Math

Papers

Showing 101-150 of 1596 papers

Title | Status | Hype
ThoughtSource: A central hub for large language model reasoning data | Code | 3
Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks | Code | 3
PAL: Program-aided Language Models | Code | 3
SymForce: Symbolic Computation and Code Generation for Robotics | Code | 3
Training Verifiers to Solve Math Word Problems | Code | 3
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning | Code | 2
OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling | Code | 2
Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning | Code | 2
Essential-Web v1.0: 24T tokens of organized web data | Code | 2
TreeRL: LLM Reinforcement Learning with On-Policy Tree Search | Code | 2
Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning | Code | 2
AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions | Code | 2
Play to Generalize: Learning to Reason Through Game Play | Code | 2
MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning | Code | 2
The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning | Code | 2
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO | Code | 2
Reinforcing General Reasoning without Verifiers | Code | 2
R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing | Code | 2
MAS-Zero: Designing Multi-Agent Systems with Zero Supervision | Code | 2
WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning | Code | 2
RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning | Code | 2
Meta-Design Matters: A Self-Design Multi-Agent System | Code | 2
AdaptThink: Reasoning Models Can Learn When to Think | Code | 2
Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space | Code | 2
Synthetic Data RL: Task Definition Is All You Need | Code | 2
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models | Code | 2
Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving | Code | 2
RM-R1: Reward Modeling as Reasoning | Code | 2
Process Reward Models That Think | Code | 2
Dynamic Early Exit in Reasoning Models | Code | 2
Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction | Code | 2
Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning | Code | 2
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning | Code | 2
Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization | Code | 2
Efficient Reinforcement Finetuning via Adaptive Curriculum Learning | Code | 2
Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language Models | Code | 2
Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models | Code | 2
MegaMath: Pushing the Limits of Open Math Corpora | Code | 2
Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead | Code | 2
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models | Code | 2
Learning to Reason for Long-Form Story Generation | Code | 2
Reasoning to Learn from Latent Thoughts | Code | 2
FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models | Code | 2
Nexus: A Lightweight and Scalable Multi-Agent Framework for Complex Tasks Automation | Code | 2
Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models | Code | 2
SIFT: Grounding LLM Reasoning in Contexts via Stickers | Code | 2
S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning | Code | 2
On the Emergence of Thinking in LLMs I: Searching for the Right Intuition | Code | 2
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning | Code | 2
GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity? | Code | 2
Page 3 of 32

No leaderboard results yet.