| Nexus: A Lightweight and Scalable Multi-Agent Framework for Complex Tasks Automation | Feb 26, 2025 | Code Generation, HumanEval | Code Available | 2 |
| Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures | Feb 7, 2025 | Mathematical Problem-Solving, reinforcement-learning | Code Available | 2 |
| MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data | Jun 26, 2024 | Benchmarking, Math | Code Available | 2 |
| Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models | Jun 25, 2024 | Diversity, Math | Code Available | 2 |
| DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | Jun 18, 2024 | Arithmetic Reasoning, Math | Code Available | 2 |
| ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline | Apr 3, 2024 | Math, Mathematical Problem-Solving | Code Available | 2 |
| Measuring Mathematical Problem Solving With the MATH Dataset | Mar 5, 2021 | Math, Mathematical Problem-Solving | Code Available | 2 |
| SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning | Jun 10, 2025 | Knowledge Distillation, Math | Code Available | 1 |
| Solving Inequality Proofs with Large Language Models | Jun 9, 2025 | Mathematical Problem-Solving, Relation Prediction | Code Available | 1 |
| MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning | Jun 5, 2025 | Dataset Generation, Mathematical Problem-Solving | Code Available | 1 |