| Leveraging LLMs to Assess Tutor Moves in Real-Life Dialogues: A Feasibility Study | Jun 20, 2025 | Math | Unverified | 0 |
| No Free Lunch: Rethinking Internal Feedback for LLM Reasoning | Jun 20, 2025 | Math, reinforcement-learning | Unverified | 0 |
| OJBench: A Competition Level Code Benchmark For Large Language Models | Jun 19, 2025 | Math | Code Available | 1 |
| AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need | Jun 18, 2025 | GSM8K, HumanEval | Code Available | 0 |
| Utility-Driven Speculative Decoding for Mixture-of-Experts | Jun 17, 2025 | GPU, Large Language Model | Unverified | 0 |
| Essential-Web v1.0: 24T tokens of organized web data | Jun 17, 2025 | Math | Code Available | 2 |
| SIRI-Bench: Challenging VLMs' Spatial Intelligence through Complex Reasoning Tasks | Jun 17, 2025 | Math, Spatial Reasoning | Unverified | 0 |
| Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team | Jun 17, 2025 | Code Generation, GSM8K | Code Available | 1 |
| AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy | Jun 16, 2025 | Math, Reinforcement Learning (RL) | Unverified | 0 |
| Direct Reasoning Optimization: LLMs Can Reward And Refine Their Own Reasoning for Open-Ended Tasks | Jun 16, 2025 | Form, Math | Unverified | 0 |