| Meta-Design Matters: A Self-Design Multi-Agent System | May 21, 2025 | Math, Problem Decomposition | Code Available | 2 |
| RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning | May 21, 2025 | Math, Mathematical Reasoning | Code Available | 2 |
| AdaptThink: Reasoning Models Can Learn When to Think | May 19, 2025 | Math | Code Available | 2 |
| Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space | May 19, 2025 | GSM8K, Math | Code Available | 2 |
| Synthetic Data RL: Task Definition Is All You Need | May 18, 2025 | All, GSM8K | Code Available | 2 |
| Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models | May 15, 2025 | Math, reinforcement-learning | Code Available | 2 |
| Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving | May 12, 2025 | Math, Mathematical Problem-Solving | Code Available | 2 |
| RM-R1: Reward Modeling as Reasoning | May 5, 2025 | Math, Reinforcement Learning (RL) | Code Available | 2 |
| Process Reward Models That Think | Apr 23, 2025 | Math | Code Available | 2 |
| Dynamic Early Exit in Reasoning Models | Apr 22, 2025 | GSM8K, Math | Code Available | 2 |