| Arithmetic Reasoning with LLM: Prolog Generation & Permutation | May 28, 2024 | Arithmetic Reasoning, Data Augmentation | Unverified | 0 |
| LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters | May 27, 2024 | Benchmarking, GSM8K | Code Available | 2 |
| Autoformalizing Euclidean Geometry | May 27, 2024 | Math | Code Available | 2 |
| MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time | May 25, 2024 | GSM8K, Math | Unverified | 0 |
| Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models | May 24, 2024 | Common Sense Reasoning, Language Modelling | Code Available | 2 |
| Learning Beyond Pattern Matching? Assaying Mathematical Understanding in LLMs | May 24, 2024 | In-Context Learning, Language Modeling | Unverified | 0 |
| Large Language Models Can Self-Correct with Key Condition Verification | May 23, 2024 | Arithmetic Reasoning, Math | Unverified | 0 |
| Can LLMs Solve longer Math Word Problems Better? | May 23, 2024 | Data Augmentation, Math | Code Available | 0 |
| JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models | May 23, 2024 | Knowledge Distillation, Math | Code Available | 1 |
| "Turing Tests" For An AI Scientist | May 22, 2024 | AI Agent, Data Compression | Unverified | 0 |
| Investigating Symbolic Capabilities of Large Language Models | May 21, 2024 | Math, Navigate | Unverified | 0 |
| MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark | May 20, 2024 | College Mathematics, GSM8K | Code Available | 2 |
| Multiple-Choice Questions are Efficient and Robust LLM Evaluators | May 20, 2024 | GSM8K, HumanEval | Code Available | 1 |
| Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving | May 20, 2024 | GSM8K, Math | Unverified | 0 |
| DOP: Diagnostic-Oriented Prompting for Large Language Models in Mathematical Correction | May 20, 2024 | Diagnostic, Math | Code Available | 0 |
| Continued Pretraining for Domain Adaptation of Wav2vec2.0 in Automatic Speech Recognition for Elementary Math Classroom Settings | May 15, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| A safety realignment framework via subspace-oriented model fusion for large language models | May 15, 2024 | Instruction Following, Math | Code Available | 0 |
| Meaning-Typed Programming: Language Abstraction and Runtime for Model-Integrated Applications | May 14, 2024 | GSM8K, Math | Unverified | 0 |
| TANQ: An open domain dataset of table answered questions | May 13, 2024 | Math, Open-Domain Question Answering | Code Available | 1 |
| MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning | May 13, 2024 | Data Augmentation, GSM8K | Code Available | 3 |
| MathDivide: Improved mathematical reasoning by large language models | May 12, 2024 | GSM8K, Logical Reasoning | Unverified | 0 |
| Can Large Language Models Replicate ITS Feedback on Open-Ended Math Questions? | May 10, 2024 | Math, text similarity | Code Available | 0 |
| Learning to Solve Geometry Problems via Simulating Human Dual-Reasoning Process | May 10, 2024 | Geometry Problem Solving, Machine Translation | Code Available | 0 |
| Aligning Tutor Discourse Supporting Rigorous Thinking with Tutee Content Mastery for Predicting Math Achievement | May 10, 2024 | Math, Mathematical Reasoning | Unverified | 0 |
| LLMs can Find Mathematical Reasoning Mistakes by Pedagogical Chain-of-Thought | May 9, 2024 | Hallucination, Math | Unverified | 0 |