| MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark | Aug 14, 2024 | Math, Mathematical Reasoning | Code Available | 0 | 5 |
| How to Leverage Demonstration Data in Alignment for Large Language Model? A Self-Imitation Learning Perspective | Oct 14, 2024 | Density Ratio Estimation, GSM8K | Code Available | 0 | 5 |
| Mathematical Reasoning for Unmanned Aerial Vehicles: A RAG-Based Approach for Complex Arithmetic Reasoning | Jun 5, 2025 | Arithmetic Reasoning, Math | Code Available | 0 | 5 |
| How Do Humans Write Code? Large Models Do It the Same Way Too | Feb 24, 2024 | Code Generation, Math | Code Available | 0 | 5 |
| Analysing Mathematical Reasoning Abilities of Neural Models | Apr 2, 2019 | Mathematical Question Answering, Mathematical Reasoning | Code Available | 0 | 5 |
| Mathematical Reasoning in Large Language Models: Assessing Logical and Arithmetic Errors across Wide Numerical Ranges | Feb 12, 2025 | GSM8K, Math | Code Available | 0 | 5 |
| MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning | Feb 27, 2024 | 8k, Language Modeling | Code Available | 0 | 5 |
| Hierarchical Attention Generates Better Proofs | Apr 27, 2025 | Automated Theorem Proving, Mathematical Proofs | Code Available | 0 | 5 |
| Adaptive Graph Pruning for Multi-Agent Communication | Jun 3, 2025 | Code Generation, Large Language Model | Code Available | 0 | 5 |
| MARGE: Improving Math Reasoning for LLMs with Guided Exploration | May 18, 2025 | Math, Mathematical Reasoning | Code Available | 0 | 5 |