| Step Guided Reasoning: Improving Mathematical Reasoning using Guidance Generation and Step Reasoning | Oct 18, 2024 | Math, Mathematical Reasoning | —Unverified | 0 | 0 |
| Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback | Jan 18, 2025 | Math, Mathematical Reasoning | —Unverified | 0 | 0 |
| Step-wise Adaptive Integration of Supervised Fine-tuning and Reinforcement Learning for Task-Specific LLMs | May 19, 2025 | Mathematical Reasoning, Reinforcement Learning (RL) | —Unverified | 0 | 0 |
| Subtle Errors Matter: Preference Learning via Error-injected Self-editing | Oct 9, 2024 | GSM8K, Math | —Unverified | 0 | 0 |
| Supervised Optimism Correction: Be Confident When LLMs Are Sure | Apr 10, 2025 | GSM8K, Math | —Unverified | 0 | 0 |
| Sustainability of Collusion and Market Transparency in a Sequential Search Market: a Generalization | May 5, 2021 | Mathematical Reasoning | —Unverified | 0 | 0 |
| Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models | Feb 20, 2024 | Instruction Following, Logical Reasoning | —Unverified | 0 | 0 |
| Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use | Apr 7, 2025 | GSM8K, Math | —Unverified | 0 | 0 |
| System-2 Mathematical Reasoning via Enriched Instruction Tuning | Dec 22, 2024 | ERP, GSM8K | —Unverified | 0 | 0 |
| Table as Thought: Exploring Structured Thoughts in LLM Reasoning | Jan 4, 2025 | Mathematical Reasoning | —Unverified | 0 | 0 |
| Taming Generative Diffusion Prior for Universal Blind Image Restoration | Aug 21, 2024 | Image Restoration, Mathematical Reasoning | —Unverified | 0 | 0 |
| Tangram: Benchmark for Evaluating Geometric Element Recognition in Large Multimodal Models | Aug 25, 2024 | Mathematical Reasoning | —Unverified | 0 | 0 |
| Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving | Feb 17, 2025 | Math, Mathematical Problem-Solving | —Unverified | 0 | 0 |
| TeleMath: A Benchmark for Large Language Models in Telecom Mathematical Problem Solving | Jun 12, 2025 | Logical Reasoning, Mathematical Problem-Solving | —Unverified | 0 | 0 |
| Temporalizing Confidence: Evaluation of Chain-of-Thought Reasoning with Signal Temporal Logic | Jun 9, 2025 | Mathematical Reasoning | —Unverified | 0 | 0 |
| Test-time Scaling Techniques in Theoretical Physics -- A Comparison of Methods on the TPBench Dataset | Jun 25, 2025 | Mathematical Reasoning | —Unverified | 0 | 0 |
| Text Generation Beyond Discrete Token Sampling | May 20, 2025 | Code Generation, Mathematical Reasoning | —Unverified | 0 | 0 |
| The Axiom-Based Atlas: A Structural Mapping of Theorems via Foundational Proof Vectors | Mar 31, 2025 | Mathematical Reasoning | —Unverified | 0 | 0 |
| The Karp Dataset | Jan 24, 2025 | Benchmarking, Mathematical Reasoning | —Unverified | 0 | 0 |
| The Lessons of Developing Process Reward Models in Mathematical Reasoning | Jan 13, 2025 | Mathematical Reasoning | —Unverified | 0 | 0 |
| Theorem Prover as a Judge for Synthetic Data Generation | Feb 18, 2025 | Mathematical Proofs, Mathematical Reasoning | —Unverified | 0 | 0 |
| Theoretical Analysis of an XGBoost Framework for Product Cannibalization | Dec 2, 2021 | Mathematical Reasoning | —Unverified | 0 | 0 |
| The Qiyas Benchmark: Measuring ChatGPT Mathematical and Language Understanding in Arabic | Jun 28, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| The Role of General Intelligence in Mathematical Reasoning | Apr 27, 2021 | Mathematical Reasoning | —Unverified | 0 | 0 |
| The Unreasonable Effectiveness of Model Merging for Cross-Lingual Transfer in LLMs | May 23, 2025 | Cross-Lingual Transfer, Math | —Unverified | 0 | 0 |