| MegaMath: Pushing the Limits of Open Math Corpora | Apr 3, 2025 | Diversity, Math | Code Available | 2 |
| Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks | Mar 27, 2025 | Imitation Learning, Mathematical Reasoning | Code Available | 2 |
| Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models? | Mar 8, 2025 | Mathematical Reasoning, Multimodal Reasoning | Code Available | 2 |
| RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs | Mar 8, 2025 | Instruction Following, Mathematical Reasoning | Code Available | 2 |
| Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning | Feb 10, 2025 | Math, Mathematical Reasoning | Code Available | 2 |
| ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization | Feb 6, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Reusing Embeddings: Reproducible Reward Model Research in Large Language Model Alignment without GPUs | Feb 4, 2025 | Code Generation, Language Modeling | Code Available | 2 |
| Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate | Jan 29, 2025 | Instruction Following, Math | Code Available | 2 |
| O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning | Jan 22, 2025 | Mathematical Reasoning | Code Available | 2 |
| URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics | Jan 8, 2025 | Math, Mathematical Reasoning | Code Available | 2 |
| B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners | Dec 23, 2024 | Mathematical Reasoning | Code Available | 2 |
| Offline Reinforcement Learning for LLM Multi-Step Reasoning | Dec 20, 2024 | GSM8K, Math | Code Available | 2 |
| ProcessBench: Identifying Process Errors in Mathematical Reasoning | Dec 9, 2024 | GSM8K, Math | Code Available | 2 |
| TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action | Dec 7, 2024 | Depth Estimation, Mathematical Reasoning | Code Available | 2 |
| Initialization using Update Approximation is a Silver Bullet for Extremely Efficient Low-Rank Fine-Tuning | Nov 29, 2024 | Mathematical Reasoning | Code Available | 2 |
| Preference Optimization for Reasoning with Pseudo Feedback | Nov 25, 2024 | GSM8K, Math | Code Available | 2 |
| Enhancing LLM Reasoning with Reward-guided Tree Search | Nov 18, 2024 | Mathematical Reasoning | Code Available | 2 |
| AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning | Nov 18, 2024 | Mathematical Reasoning | Code Available | 2 |
| Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch | Oct 24, 2024 | Math, Mathematical Reasoning | Code Available | 2 |
| MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code | Oct 10, 2024 | Math, Mathematical Reasoning | Code Available | 2 |
| Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models | Oct 10, 2024 | GSM8K, Math | Code Available | 2 |
| LeanAgent: Lifelong Learning for Formal Theorem Proving | Oct 8, 2024 | Abstract Algebra, Automated Theorem Proving | Code Available | 2 |
| Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement | Oct 6, 2024 | Mathematical Reasoning, Meta-Learning | Code Available | 2 |
| CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models | Sep 4, 2024 | GSM8K, Math | Code Available | 2 |
| Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process | Jul 29, 2024 | GSM8K, Math | Code Available | 2 |