| Discriminative Policy Optimization for Token-Level Reward Models | May 29, 2025 | GSM8KLanguage Modeling | CodeCode Available | 0 |
| DINGO: Constrained Inference for Diffusion LLMs | May 29, 2025 | Math | —Unverified | 0 |
| LLM Performance for Code Generation on Noisy Tasks | May 29, 2025 | BenchmarkingCode Generation | CodeCode Available | 0 |
| Decomposing Elements of Problem Solving: What "Math" Does RL Teach? | May 28, 2025 | MathMathematical Problem-Solving | CodeCode Available | 0 |
| ASyMOB: Algebraic Symbolic Mathematical Operations Benchmark | May 28, 2025 | Math | CodeCode Available | 0 |
| Maximizing Confidence Alone Improves Reasoning | May 28, 2025 | GSM8KMath | —Unverified | 0 |
| Skywork Open Reasoner 1 Technical Report | May 28, 2025 | MathReinforcement Learning (RL) | CodeCode Available | 4 |
| Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start | May 28, 2025 | MathMultimodal Reasoning | CodeCode Available | 1 |
| Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO | May 28, 2025 | MathReinforcement Learning (RL) | CodeCode Available | 2 |
| ChatVLA-2: Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge | May 28, 2025 | Imitation LearningMath | CodeCode Available | 1 |
| Walk Before You Run! Concise LLM Reasoning via Reinforcement Learning | May 27, 2025 | Math | —Unverified | 0 |
| Reinforcing General Reasoning without Verifiers | May 27, 2025 | MathMathematical Reasoning | CodeCode Available | 2 |
| REAL-Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning | May 27, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 1 |
| R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing | May 27, 2025 | Math | CodeCode Available | 2 |
| MAS-Zero: Designing Multi-Agent Systems with Zero Supervision | May 26, 2025 | MathProblem Decomposition | CodeCode Available | 2 |
| Which Data Attributes Stimulate Math and Code Reasoning? An Investigation via Influence Functions | May 26, 2025 | AttributeMath | —Unverified | 0 |
| Unifying Multimodal Large Language Model Capabilities and Modalities via Model Merging | May 26, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 1 |
| Done Is Better than Perfect: Unlocking Efficient Reasoning by Structured Multi-Turn Decomposition | May 26, 2025 | MathReinforcement Learning (RL) | —Unverified | 0 |
| Hard Negative Contrastive Learning for Fine-Grained Geometric Understanding in Large Multimodal Models | May 26, 2025 | Contrastive LearningMath | CodeCode Available | 0 |
| Faster and Better LLMs via Latency-Aware Test-Time Scaling | May 26, 2025 | Math | —Unverified | 0 |
| The Role of Diversity in In-Context Learning for Large Language Models | May 26, 2025 | DiversityIn-Context Learning | —Unverified | 0 |
| Interleaved Reasoning for Large Language Models via Reinforcement Learning | May 26, 2025 | Logical ReasoningMath | —Unverified | 0 |
| Improving Multilingual Math Reasoning for African Languages | May 26, 2025 | MathMathematical Reasoning | —Unverified | 0 |
| Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning | May 26, 2025 | DiversityMath | —Unverified | 0 |
| Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision | May 26, 2025 | HallucinationMath | CodeCode Available | 0 |
| Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles | May 26, 2025 | ARCLogical Reasoning | —Unverified | 0 |
| Inference-time Alignment in Continuous Space | May 26, 2025 | Math | CodeCode Available | 0 |
| AI4Math: A Native Spanish Benchmark for University-Level Mathematical Reasoning in Large Language Models | May 25, 2025 | MathMathematical Reasoning | —Unverified | 0 |
| MMATH: A Multilingual Benchmark for Mathematical Reasoning | May 25, 2025 | MathMathematical Reasoning | CodeCode Available | 0 |
| Steering LLM Reasoning Through Bias-Only Adaptation | May 24, 2025 | GSM8KMath | —Unverified | 0 |
| Enumerate-Conjecture-Prove: Formally Solving Answer-Construction Problems in Math Competitions | May 24, 2025 | Automated Theorem ProvingMath | CodeCode Available | 0 |
| Does Representation Intervention Really Identify Desired Concepts and Elicit Alignment? | May 24, 2025 | Code GenerationMath | —Unverified | 0 |
| MSA at BEA 2025 Shared Task: Disagreement-Aware Instruction Tuning for Multi-Dimensional Evaluation of LLMs as Math Tutors | May 24, 2025 | Language ModelingLanguage Modelling | —Unverified | 0 |
| On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization | May 24, 2025 | MathReinforcement Learning (RL) | —Unverified | 0 |
| Anchored Diffusion Language Model | May 24, 2025 | Language ModelingLanguage Modelling | —Unverified | 0 |
| How Is LLM Reasoning Distracted by Irrelevant Context? An Analysis Using a Controlled Benchmark | May 24, 2025 | Math | CodeCode Available | 0 |
| More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models | May 23, 2025 | DiagnosticHallucination | —Unverified | 0 |
| Decoupled Visual Interpretation and Linguistic Reasoning for Math Problem Solving | May 23, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 1 |
| VideoGameBench: Can Vision-Language Models complete popular video games? | May 23, 2025 | Math | —Unverified | 0 |
| One RL to See Them All: Visual Triple Unified Reinforcement Learning | May 23, 2025 | AllMath | —Unverified | 0 |
| Value-Guided Search for Efficient Chain-of-Thought Reasoning | May 23, 2025 | Math | CodeCode Available | 1 |
| Towards Revealing the Effectiveness of Small-Scale Fine-tuning in R1-style Reinforcement Learning | May 23, 2025 | MathReinforcement Learning (RL) | CodeCode Available | 1 |
| Outcome-based Reinforcement Learning to Predict the Future | May 23, 2025 | Holdout SetMath | —Unverified | 0 |
| The Unreasonable Effectiveness of Model Merging for Cross-Lingual Transfer in LLMs | May 23, 2025 | Cross-Lingual TransferMath | —Unverified | 0 |
| RaDeR: Reasoning-aware Dense Retrieval Models | May 23, 2025 | MathMathematical Problem-Solving | CodeCode Available | 1 |
| ConciseRL: Conciseness-Guided Reinforcement Learning for Efficient Reasoning Models | May 22, 2025 | Large Language ModelMath | CodeCode Available | 0 |
| AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning | May 22, 2025 | Mathreinforcement-learning | —Unverified | 0 |
| WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning | May 22, 2025 | MathReinforcement Learning (RL) | CodeCode Available | 2 |
| Incremental Sequence Classification with Temporal Consistency | May 22, 2025 | ClassificationLanguage Modeling | —Unverified | 0 |
| Veracity Bias and Beyond: Uncovering LLMs' Hidden Beliefs in Problem-Solving Reasoning | May 22, 2025 | AttributeMath | —Unverified | 0 |