| Speculative Reward Model Boosts Decision Making Ability of LLMs Cost-Effectively | May 31, 2025 | Decision Making, Mathematical Reasoning | Code Available | 0 |
| Evaluation of LLMs for mathematical problem solving | May 30, 2025 | GSM8K, Mathematical Problem-Solving | Unverified | 0 |
| Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning | May 30, 2025 | Math, Mathematical Reasoning | Code Available | 1 |
| RMoA: Optimizing Mixture-of-Agents through Diversity Maximization and Residual Compensation | May 30, 2025 | Code Generation, Diversity | Code Available | 0 |
| Unifying Language Agent Algorithms with Graph-based Orchestration Engine for Reproducible Agent Research | May 30, 2025 | Mathematical Reasoning | Code Available | 1 |
| R-KV: Redundancy-aware KV Cache Compression for Training-Free Reasoning Models Acceleration | May 30, 2025 | Mathematical Reasoning | Code Available | 5 |
| The Hallucination Dilemma: Factuality-Aware Reinforcement Learning for Large Reasoning Models | May 30, 2025 | Hallucination, Mathematical Reasoning | Code Available | 1 |
| Towards Effective Code-Integrated Reasoning | May 30, 2025 | Mathematical Reasoning, Reinforcement Learning (RL) | Code Available | 1 |
| Scaling up the think-aloud method | May 29, 2025 | Mathematical Reasoning | Code Available | 0 |
| Diversity-Aware Policy Optimization for Large Language Model Reasoning | May 29, 2025 | Diversity, Language Modeling | Unverified | 0 |