| Plan for Speed -- Dilated Scheduling for Masked Diffusion Language Models | Jun 23, 2025 | Code Completion, GSM8K | Unverified | 0 |
| ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs | Jun 23, 2025 | Math | Code Available | 0 |
| Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning | Jun 23, 2025 | GPU, Large Language Model | Code Available | 2 |
| Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective | Jun 22, 2025 | In-Context Learning, Large Language Model | Code Available | 1 |
| Shrinking the Generation-Verification Gap with Weak Verifiers | Jun 22, 2025 | Math | Unverified | 0 |
| Leveraging LLMs to Assess Tutor Moves in Real-Life Dialogues: A Feasibility Study | Jun 20, 2025 | Math | Unverified | 0 |
| No Free Lunch: Rethinking Internal Feedback for LLM Reasoning | Jun 20, 2025 | Math, reinforcement-learning | Unverified | 0 |
| OJBench: A Competition Level Code Benchmark For Large Language Models | Jun 19, 2025 | Math | Code Available | 1 |
| AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need | Jun 18, 2025 | GSM8K, HumanEval | Code Available | 0 |
| Utility-Driven Speculative Decoding for Mixture-of-Experts | Jun 17, 2025 | GPU, Large Language Model | Unverified | 0 |
| Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team | Jun 17, 2025 | Code Generation, GSM8K | Code Available | 1 |
| Essential-Web v1.0: 24T tokens of organized web data | Jun 17, 2025 | Math | Code Available | 2 |
| SIRI-Bench: Challenging VLMs' Spatial Intelligence through Complex Reasoning Tasks | Jun 17, 2025 | Math, Spatial Reasoning | Unverified | 0 |
| AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy | Jun 16, 2025 | Math, Reinforcement Learning (RL) | Unverified | 0 |
| Direct Reasoning Optimization: LLMs Can Reward And Refine Their Own Reasoning for Open-Ended Tasks | Jun 16, 2025 | Form, Math | Unverified | 0 |
| Steering LLM Thinking with Budget Guidance | Jun 16, 2025 | Math | Code Available | 1 |
| Adaptive Guidance Accelerates Reinforcement Learning of Reasoning Models | Jun 16, 2025 | Math, reinforcement-learning | Unverified | 0 |
| Weakest Link in the Chain: Security Vulnerabilities in Advanced Reasoning Models | Jun 16, 2025 | Math | Unverified | 0 |
| VGR: Visual Grounded Reasoning | Jun 13, 2025 | Large Language Model, Math | Unverified | 0 |
| Agent-RLVR: Training Software Engineering Agents via Guidance and Environment Rewards | Jun 13, 2025 | Math, Navigate | Unverified | 0 |
| TreeRL: LLM Reinforcement Learning with On-Policy Tree Search | Jun 13, 2025 | Math, reinforcement-learning | Code Available | 2 |
| Learning a Continue-Thinking Token for Enhanced Test-Time Scaling | Jun 12, 2025 | GSM8K, Math | Code Available | 0 |
| Spurious Rewards: Rethinking Training Signals in RLVR | Jun 12, 2025 | Math, Mathematical Reasoning | Code Available | 3 |
| ReCUT: Balancing Reasoning Length and Accuracy in LLMs via Stepwise Trails and Preference Optimization | Jun 12, 2025 | Math | Code Available | 0 |
| RePO: Replay-Enhanced Policy Optimization | Jun 11, 2025 | Math, Mathematical Reasoning | Code Available | 1 |