| Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process | Jul 29, 2024 | GSM8KMath | CodeCode Available | 2 |
| Weak-to-Strong Reasoning | Jul 18, 2024 | GSM8KMath | CodeCode Available | 2 |
| We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? | Jul 1, 2024 | MathMathematical Reasoning | CodeCode Available | 2 |
| MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data | Jun 26, 2024 | BenchmarkingMath | CodeCode Available | 2 |
| Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models | Jun 25, 2024 | DiversityMath | CodeCode Available | 2 |
| Adaptable Logical Control for Large Language Models | Jun 19, 2024 | MathText Generation | CodeCode Available | 2 |
| DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | Jun 18, 2024 | Arithmetic ReasoningMath | CodeCode Available | 2 |
| Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models | Jun 13, 2024 | MathQuantization | CodeCode Available | 2 |
| CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning | Jun 7, 2024 | Instruction FollowingMath | CodeCode Available | 2 |
| Yuan 2.0-M32: Mixture of Experts with Attention Router | May 28, 2024 | ARCMath | CodeCode Available | 2 |
| LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters | May 27, 2024 | BenchmarkingGSM8K | CodeCode Available | 2 |
| Autoformalizing Euclidean Geometry | May 27, 2024 | Math | CodeCode Available | 2 |
| Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models | May 24, 2024 | Common Sense ReasoningLanguage Modelling | CodeCode Available | 2 |
| MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark | May 20, 2024 | College MathematicsGSM8K | CodeCode Available | 2 |
| Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning | May 5, 2024 | GSM8KMath | CodeCode Available | 2 |
| Self-Explore: Enhancing Mathematical Reasoning in Language Models with Fine-grained Rewards | Apr 16, 2024 | GSM8KMath | CodeCode Available | 2 |
| Evaluating Mathematical Reasoning Beyond Accuracy | Apr 8, 2024 | MathMathematical Reasoning | CodeCode Available | 2 |
| MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems | Apr 6, 2024 | Logical ReasoningMath | CodeCode Available | 2 |
| ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline | Apr 3, 2024 | MathMathematical Problem-Solving | CodeCode Available | 2 |
| Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision | Mar 14, 2024 | MathReinforcement Learning (RL) | CodeCode Available | 2 |
| Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap | Feb 29, 2024 | Math | CodeCode Available | 2 |
| GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers | Feb 29, 2024 | GSM8KMath | CodeCode Available | 2 |
| Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset | Feb 22, 2024 | DiversityMath | CodeCode Available | 2 |
| Reformatted Alignment | Feb 19, 2024 | GSM8KHallucination | CodeCode Available | 2 |
| Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic | Feb 19, 2024 | Instruction FollowingMath | CodeCode Available | 2 |