| How Should We Enhance the Safety of Large Reasoning Models: An Empirical Study | May 21, 2025 | Math | Code Available | 0 |
| MAPS: A Multilingual Benchmark for Global Agent Performance and Security | May 21, 2025 | Code Generation, Math | Unverified | 0 |
| Learning to Rank Chain-of-Thought: An Energy-Based Approach with Outcome Supervision | May 21, 2025 | GSM8K, Learning-To-Rank | Unverified | 0 |
| SSR: Speculative Parallel Scaling Reasoning in Test-time | May 21, 2025 | Diversity, Math | Unverified | 0 |
| Thought-Augmented Policy Optimization: Bridging External Guidance and Internal Capabilities | May 21, 2025 | Math, Reinforcement Learning (RL) | Unverified | 0 |
| The Hallucination Tax of Reinforcement Finetuning | May 20, 2025 | Hallucination, Math | Unverified | 0 |
| EasyMath: A 0-shot Math Benchmark for SLMs | May 20, 2025 | Math | Unverified | 0 |
| RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning | May 20, 2025 | Math, Reinforcement Learning (RL) | Unverified | 0 |
| Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning | May 20, 2025 | Math, Offline RL | Unverified | 0 |
| Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings | May 19, 2025 | HumanEval, Math | Code Available | 0 |