| SPIN-Bench: How Well Do LLMs Plan Strategically and Reason Socially? | Mar 16, 2025 | Board Games, Card Games | Unverified | 0 | 0 |
| SplitReason: Learning To Offload Reasoning | Apr 23, 2025 | Language Modeling, Language Modelling | Unverified | 0 | 0 |
| Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model | Jul 9, 2025 | Language Modeling, Language Modelling | Unverified | 0 | 0 |
| SSR: Speculative Parallel Scaling Reasoning in Test-time | May 21, 2025 | Diversity, Math | Unverified | 0 | 0 |
| Stable Code Technical Report | Apr 1, 2024 | Code Completion, Language Modelling | Unverified | 0 | 0 |
| AI4Math: A Native Spanish Benchmark for University-Level Mathematical Reasoning in Large Language Models | May 25, 2025 | Math, Mathematical Reasoning | Unverified | 0 | 0 |
| START: Self-taught Reasoner with Tools | Mar 6, 2025 | Math, Self-Learning | Unverified | 0 | 0 |
| A Graph-Based Synthetic Data Pipeline for Scaling High-Quality Reasoning Instructions | Dec 12, 2024 | GSM8K, Knowledge Graphs | Unverified | 0 | 0 |
| Steering LLM Reasoning Through Bias-Only Adaptation | May 24, 2025 | GSM8K, Math | Unverified | 0 | 0 |
| Accelerated Sampling from Masked Diffusion Models via Entropy Bounded Unmasking | May 30, 2025 | Language Modeling, Language Modelling | Unverified | 0 | 0 |