| Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate | Jan 29, 2025 | Instruction Following, Math | Code Available | 2 |
| O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning | Jan 22, 2025 | Mathematical Reasoning | Code Available | 2 |
| Efficient Reinforcement Finetuning via Adaptive Curriculum Learning | Apr 7, 2025 | Math, Mathematical Reasoning | Code Available | 2 |
| Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks | Mar 27, 2025 | Imitation Learning, Mathematical Reasoning | Code Available | 2 |
| Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement | Oct 6, 2024 | Mathematical Reasoning, Meta-Learning | Code Available | 2 |
| LeanAgent: Lifelong Learning for Formal Theorem Proving | Oct 8, 2024 | Abstract Algebra, Automated Theorem Proving | Code Available | 2 |
| CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models | Sep 4, 2024 | GSM8K, Math | Code Available | 2 |
| FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios | Jul 25, 2023 | Code Generation, Fact Checking | Code Available | 2 |
| MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning | Jun 5, 2025 | Math, Mathematical Reasoning | Code Available | 2 |
| Multi-View Reasoning: Consistent Contrastive Learning for Math Word Problem | Oct 21, 2022 | Contrastive Learning, Math | Code Available | 2 |
| Optimizing Anytime Reasoning via Budget Relative Policy Optimization | May 19, 2025 | Mathematical Reasoning, Reinforcement Learning (RL) | Code Available | 2 |
| Reformatted Alignment | Feb 19, 2024 | GSM8K, Hallucination | Code Available | 2 |
| Compression Represents Intelligence Linearly | Apr 15, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning | Feb 10, 2025 | Math, Mathematical Reasoning | Code Available | 2 |
| Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning | Jul 25, 2024 | Knowledge Distillation, Mathematical Reasoning | Code Available | 2 |
| Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning | Jun 23, 2025 | GPU, Large Language Model | Code Available | 2 |
| Climbing the Ladder of Reasoning: What LLMs Can-and Still Can't-Solve after SFT? | Apr 16, 2025 | Mathematical Reasoning | Code Available | 1 |
| CLEVR-Math: A Dataset for Compositional Language, Visual and Mathematical Reasoning | Aug 10, 2022 | Math, Mathematical Reasoning | Code Available | 1 |
| Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations | Dec 14, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 1 |
| DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models | May 14, 2025 | Diversity, Mathematical Reasoning | Code Available | 1 |
| Ada-Instruct: Adapting Instruction Generators for Complex Reasoning | Oct 6, 2023 | Code Completion, In-Context Learning | Code Available | 1 |
| Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning | Aug 16, 2024 | Math, Mathematical Reasoning | Code Available | 1 |
| MathFusion: Enhancing Mathematic Problem-solving of LLM through Instruction Fusion | Mar 20, 2025 | Data Augmentation, Mathematical Problem-Solving | Code Available | 1 |
| Mathematical Capabilities of ChatGPT | Jan 31, 2023 | Elementary Mathematics, Math | Code Available | 1 |
| Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models | Mar 4, 2024 | Data Augmentation, GSM8K | Code Available | 1 |