| Activation Steering for Chain-of-Thought Compression | Jul 7, 2025 | GSM8KMath | CodeCode Available | 0 |
| Effects of structure on reasoning in instance-level Self-Discover | Jul 4, 2025 | Math | CodeCode Available | 0 |
| Do Thinking Tokens Help or Trap? Towards More Efficient Large Reasoning Model | Jun 30, 2025 | Math | —Unverified | 0 |
| Bridging Offline and Online Reinforcement Learning for LLMs | Jun 26, 2025 | Instruction FollowingMath | —Unverified | 0 |
| Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test | Jun 26, 2025 | Code GenerationLarge Language Model | —Unverified | 0 |
| AALC: Large Language Model Efficient Reasoning via Adaptive Accuracy-Length Control | Jun 25, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 0 |
| Multi-lingual Functional Evaluation for Large Language Models | Jun 25, 2025 | BelebeleInstruction Following | —Unverified | 0 |
| When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs | Jun 25, 2025 | Math | —Unverified | 0 |
| ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs | Jun 23, 2025 | Math | —Unverified | 0 |
| Plan for Speed -- Dilated Scheduling for Masked Diffusion Language Models | Jun 23, 2025 | Code CompletionGSM8K | —Unverified | 0 |