| Play to Generalize: Learning to Reason Through Game Play | Jun 9, 2025 | Domain Generalization, Math | Code Available | 2 |
| WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning | Jun 9, 2025 | Math, Mathematical Reasoning | Code Available | 1 |
| Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation | Jun 9, 2025 | GSM8K, HumanEval | Unverified | 0 |
| The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity | Jun 7, 2025 | Math | Unverified | 0 |
| AgentSwift: Efficient LLM Agent Design via Value-guided Hierarchical Search | Jun 6, 2025 | Large Language Model, Math | Code Available | 0 |
| SPARQ: Synthetic Problem Generation for Reasoning via Quality-Diversity Algorithms | Jun 6, 2025 | Diversity, Large Language Model | Unverified | 0 |
| Spectral Derivatives | Jun 6, 2025 | Math | Code Available | 0 |
| Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models | Jun 5, 2025 | All, Math | Unverified | 0 |
| MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning | Jun 5, 2025 | Math, Mathematical Reasoning | Code Available | 2 |
| TreeRPO: Tree Relative Policy Optimization | Jun 5, 2025 | Math | Code Available | 0 |