| Value-Guided Search for Efficient Chain-of-Thought Reasoning | May 23, 2025 | Math | Code Available | 1 |
| Towards Revealing the Effectiveness of Small-Scale Fine-tuning in R1-style Reinforcement Learning | May 23, 2025 | Math, Reinforcement Learning (RL) | Code Available | 1 |
| Outcome-based Reinforcement Learning to Predict the Future | May 23, 2025 | Holdout Set, Math | Unverified | 0 |
| The Unreasonable Effectiveness of Model Merging for Cross-Lingual Transfer in LLMs | May 23, 2025 | Cross-Lingual Transfer, Math | Unverified | 0 |
| RaDeR: Reasoning-aware Dense Retrieval Models | May 23, 2025 | Math, Mathematical Problem-Solving | Code Available | 1 |
| ConciseRL: Conciseness-Guided Reinforcement Learning for Efficient Reasoning Models | May 22, 2025 | Large Language Model, Math | Code Available | 0 |
| AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning | May 22, 2025 | Math, reinforcement-learning | Unverified | 0 |
| WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning | May 22, 2025 | Math, Reinforcement Learning (RL) | Code Available | 2 |
| Incremental Sequence Classification with Temporal Consistency | May 22, 2025 | Classification, Language Modeling | Unverified | 0 |
| Veracity Bias and Beyond: Uncovering LLMs' Hidden Beliefs in Problem-Solving Reasoning | May 22, 2025 | Attribute, Math | Unverified | 0 |