| LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training | Nov 24, 2024 | MathMixture-of-Experts | CodeCode Available | 2 | 5 |
| Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization | Oct 11, 2024 | GSM8KLanguage Modeling | CodeCode Available | 2 | 5 |
| Balancing LoRA Performance and Efficiency with Simple Shard Sharing | Sep 19, 2024 | Computational EfficiencyGSM8K | CodeCode Available | 2 | 5 |
| A Comparative Study on Reasoning Patterns of OpenAI's o1 Model | Oct 17, 2024 | Math | CodeCode Available | 2 | 5 |
| RM-R1: Reward Modeling as Reasoning | May 5, 2025 | MathReinforcement Learning (RL) | CodeCode Available | 2 | 5 |
| Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models | May 15, 2025 | Mathreinforcement-learning | CodeCode Available | 2 | 5 |
| Enhancing Reasoning Capabilities of LLMs via Principled Synthetic Logic Corpus | Nov 19, 2024 | Formal LogicLogical Reasoning | CodeCode Available | 2 | 5 |
| Efficient Reinforcement Finetuning via Adaptive Curriculum Learning | Apr 7, 2025 | MathMathematical Reasoning | CodeCode Available | 2 | 5 |
| LeanDojo: Theorem Proving with Retrieval-Augmented Language Models | Jun 27, 2023 | Automated Theorem ProvingGPU | CodeCode Available | 2 | 5 |
| Learning to Reason for Long-Form Story Generation | Mar 28, 2025 | FormMath | CodeCode Available | 2 | 5 |