| SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages | Jul 29, 2024 | Diversity, Instruction Following | Code Available | 2 |
| LoRA-Pro: Are Low-Rank Adapters Properly Optimized? | Jul 25, 2024 | Code Generation, Computational Efficiency | Code Available | 2 |
| Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning | Jul 25, 2024 | Knowledge Distillation, Mathematical Reasoning | Code Available | 2 |
| SOLO: A Single Transformer for Scalable Vision-Language Modeling | Jul 8, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? | Jul 1, 2024 | Math, Mathematical Reasoning | Code Available | 2 |
| MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data | Jun 26, 2024 | Benchmarking, Math | Code Available | 2 |
| Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models | Jun 25, 2024 | Diversity, Math | Code Available | 2 |
| Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning | Jun 17, 2024 | Data Augmentation, Mathematical Reasoning | Code Available | 2 |
| Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples | Jun 9, 2024 | ARC, Diversity | Code Available | 2 |
| Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models | May 24, 2024 | Atari Games, Mathematical Reasoning | Code Available | 2 |
| Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning | May 5, 2024 | GSM8K, Math | Code Available | 2 |
| Benchmarking Benchmark Leakage in Large Language Models | Apr 29, 2024 | Benchmarking, Mathematical Reasoning | Code Available | 2 |
| Self-Explore: Enhancing Mathematical Reasoning in Language Models with Fine-grained Rewards | Apr 16, 2024 | GSM8K, Math | Code Available | 2 |
| Compression Represents Intelligence Linearly | Apr 15, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Evaluating Mathematical Reasoning Beyond Accuracy | Apr 8, 2024 | Math, Mathematical Reasoning | Code Available | 2 |
| GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers | Feb 29, 2024 | GSM8K, Math | Code Available | 2 |
| Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset | Feb 22, 2024 | Diversity, Math | Code Available | 2 |
| Reformatted Alignment | Feb 19, 2024 | GSM8K, Hallucination | Code Available | 2 |
| When is Tree Search Useful for LLM Planning? It Depends on the Discriminator | Feb 16, 2024 | Mathematical Reasoning, Re-Ranking | Code Available | 2 |
| Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts | Feb 12, 2024 | Continual Pretraining, GSM8K | Code Available | 2 |
| SuperCLUE-Math6: Graded Multi-Step Math Reasoning Benchmark for LLMs in Chinese | Jan 22, 2024 | Diversity, GSM8K | Code Available | 2 |
| LangBridge: Multilingual Reasoning Without Multilingual Supervision | Jan 19, 2024 | Code Completion, Logical Reasoning | Code Available | 2 |
| SciInstruct: a Self-Reflective Instruction Annotated Dataset for Training Scientific Language Models | Jan 15, 2024 | Math, Mathematical Reasoning | Code Available | 2 |
| MathPile: A Billion-Token-Scale Pretraining Corpus for Math | Dec 28, 2023 | Language Identification, Math | Code Available | 2 |
| An Expression Tree Decoding Strategy for Mathematical Equation Generation | Oct 14, 2023 | Math, Mathematical Reasoning | Code Available | 2 |