| Steering LLM Thinking with Budget Guidance | Jun 16, 2025 | Math | Code Available | 1 |
| Adaptive Guidance Accelerates Reinforcement Learning of Reasoning Models | Jun 16, 2025 | Math, reinforcement-learning | Unverified | 0 |
| Weakest Link in the Chain: Security Vulnerabilities in Advanced Reasoning Models | Jun 16, 2025 | Math | Unverified | 0 |
| VGR: Visual Grounded Reasoning | Jun 13, 2025 | Large Language Model, Math | Unverified | 0 |
| Agent-RLVR: Training Software Engineering Agents via Guidance and Environment Rewards | Jun 13, 2025 | Math, Navigate | Unverified | 0 |
| TreeRL: LLM Reinforcement Learning with On-Policy Tree Search | Jun 13, 2025 | Math, reinforcement-learning | Code Available | 2 |
| Learning a Continue-Thinking Token for Enhanced Test-Time Scaling | Jun 12, 2025 | GSM8K, Math | Code Available | 0 |
| Spurious Rewards: Rethinking Training Signals in RLVR | Jun 12, 2025 | Math, Mathematical Reasoning | Code Available | 3 |
| ReCUT: Balancing Reasoning Length and Accuracy in LLMs via Stepwise Trails and Preference Optimization | Jun 12, 2025 | Math | Code Available | 0 |
| RePO: Replay-Enhanced Policy Optimization | Jun 11, 2025 | Math, Mathematical Reasoning | Code Available | 1 |