| LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters! | Feb 11, 2025 | Large Language Model, Math | Code Available | 7 |
| Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving | Feb 11, 2025 | Automated Theorem Proving, Large Language Model | Code Available | 3 |
| O1 Embedder: Let Retrievers Think Before Action | Feb 11, 2025 | Contrastive Learning, Math | Unverified | 0 |
| CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction | Feb 11, 2025 | Code Generation, Math | Code Available | 4 |
| Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning | Feb 11, 2025 | Code Generation, Math | Code Available | 0 |
| Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling | Feb 10, 2025 | Math | Code Available | 3 |
| MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations | Feb 10, 2025 | Benchmarking, In-Context Learning | Unverified | 0 |
| Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning | Feb 10, 2025 | Math, Mathematical Reasoning | Code Available | 2 |
| ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates | Feb 10, 2025 | Hierarchical Reinforcement Learning, Language Modeling | Code Available | 4 |
| On the Emergence of Thinking in LLMs I: Searching for the Right Intuition | Feb 10, 2025 | Math | Code Available | 2 |