| SuperCLUE-Math6: Graded Multi-Step Math Reasoning Benchmark for LLMs in Chinese | Jan 22, 2024 | Diversity, GSM8K | Code Available | 2 |
| LangBridge: Multilingual Reasoning Without Multilingual Supervision | Jan 19, 2024 | Code Completion, Logical Reasoning | Code Available | 2 |
| SciInstruct: a Self-Reflective Instruction Annotated Dataset for Training Scientific Language Models | Jan 15, 2024 | Math, Mathematical Reasoning | Code Available | 2 |
| MathPile: A Billion-Token-Scale Pretraining Corpus for Math | Dec 28, 2023 | Language Identification, Math | Code Available | 2 |
| An Expression Tree Decoding Strategy for Mathematical Equation Generation | Oct 14, 2023 | Math, Mathematical Reasoning | Code Available | 2 |
| MuggleMath: Assessing the Impact of Query and Response Augmentation on Math Reasoning | Oct 9, 2023 | Arithmetic Reasoning, Data Augmentation | Code Available | 2 |
| MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning | Oct 5, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 2 |
| MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts | Oct 3, 2023 | Chatbot, Image Captioning | Code Available | 2 |
| CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets | Sep 29, 2023 | Language Modelling, Mathematical Reasoning | Code Available | 2 |
| MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models | Sep 21, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 2 |