| CoRE: Enhancing Metacognition with Label-free Self-evaluation in LRMs | Jul 8, 2025 | GSM8K, Math | Unverified | 0 |
| Activation Steering for Chain-of-Thought Compression | Jul 7, 2025 | GSM8K, Math | Code Available | 0 |
| LLMThinkBench: Towards Basic Math Reasoning and Overthinking in Large Language Models | Jul 5, 2025 | Benchmarking, GPU | Code Available | 1 |
| EvoAgentX: An Automated Framework for Evolving Agentic Workflows | Jul 4, 2025 | Code Generation, Math | Code Available | 7 |
| Effects of structure on reasoning in instance-level Self-Discover | Jul 4, 2025 | Math | Code Available | 0 |
| Energy-Based Transformers are Scalable Learners and Thinkers | Jul 2, 2025 | Denoising, Image Denoising | Code Available | 4 |
| Do Thinking Tokens Help or Trap? Towards More Efficient Large Reasoning Model | Jun 30, 2025 | Math | Unverified | 0 |
| SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning | Jun 30, 2025 | Math, Multi-agent Reinforcement Learning | Code Available | 2 |
| Bridging Offline and Online Reinforcement Learning for LLMs | Jun 26, 2025 | Instruction Following, Math | Unverified | 0 |
| Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test | Jun 26, 2025 | Code Generation, Large Language Model | Unverified | 0 |