| EvoAgentX: An Automated Framework for Evolving Agentic Workflows | Jul 4, 2025 | Code Generation, Math | Code Available | 7 |
| O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? | Nov 25, 2024 | Hallucination, Knowledge Distillation | Code Available | 7 |
| Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent | Nov 4, 2024 | Logical Reasoning, Mathematical Problem-Solving | Code Available | 5 |
| LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning | Oct 3, 2024 | Efficient Exploration, Mathematical Problem-Solving | Code Available | 5 |
| MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine | Jul 11, 2024 | Contrastive Learning, Language Modelling | Code Available | 4 |
| G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model | Dec 18, 2023 | Language Modeling, Language Modelling | Code Available | 4 |
| Efficiently Serving LLM Reasoning Programs with Certaindex | Dec 30, 2024 | Code Generation, Mathematical Problem-Solving | Code Available | 3 |
| PCToolkit: A Unified Plug-and-Play Prompt Compression Toolkit of Large Language Models | Mar 26, 2024 | Code Completion, Few-Shot Learning | Code Available | 3 |
| ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving | Sep 29, 2023 | Arithmetic Reasoning, Computational Efficiency | Code Available | 3 |
| Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving | May 12, 2025 | Math, Mathematical Problem-Solving | Code Available | 2 |
| Nexus: A Lightweight and Scalable Multi-Agent Framework for Complex Tasks Automation | Feb 26, 2025 | Code Generation, HumanEval | Code Available | 2 |
| Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures | Feb 7, 2025 | Mathematical Problem-Solving, reinforcement-learning | Code Available | 2 |
| MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data | Jun 26, 2024 | Benchmarking, Math | Code Available | 2 |
| Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models | Jun 25, 2024 | Diversity, Math | Code Available | 2 |
| DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | Jun 18, 2024 | Arithmetic Reasoning, Math | Code Available | 2 |
| ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline | Apr 3, 2024 | Math, Mathematical Problem-Solving | Code Available | 2 |
| Measuring Mathematical Problem Solving With the MATH Dataset | Mar 5, 2021 | Math, Mathematical Problem-Solving | Code Available | 2 |
| SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning | Jun 10, 2025 | Knowledge Distillation, Math | Code Available | 1 |
| Solving Inequality Proofs with Large Language Models | Jun 9, 2025 | Mathematical Problem-Solving, Relation Prediction | Code Available | 1 |
| MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning | Jun 5, 2025 | Dataset Generation, Mathematical Problem-Solving | Code Available | 1 |
| RaDeR: Reasoning-aware Dense Retrieval Models | May 23, 2025 | Math, Mathematical Problem-Solving | Code Available | 1 |
| Entropy-Based Adaptive Weighting for Self-Training | Mar 31, 2025 | GSM8K, Math | Code Available | 1 |
| MathFusion: Enhancing Mathematic Problem-solving of LLM through Instruction Fusion | Mar 20, 2025 | Data Augmentation, Mathematical Problem-Solving | Code Available | 1 |
| Forgotten Polygons: Multimodal Large Language Models are Shape-Blind | Feb 21, 2025 | Math, Mathematical Problem-Solving | Code Available | 1 |
| Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities | Feb 17, 2025 | Code Generation, HumanEval | Code Available | 1 |