| Step-level Value Preference Optimization for Mathematical Reasoning | Jun 16, 2024 | Learning-To-RankMath | CodeCode Available | 3 |
| CLST: Cold-Start Mitigation in Knowledge Tracing by Aligning a Generative Language Model as a Students' Knowledge Tracer | Jun 13, 2024 | Domain GeneralizationKnowledge Tracing | —Unverified | 0 |
| MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning | Jun 13, 2024 | Instruction FollowingMath | CodeCode Available | 3 |
| Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback | Jun 13, 2024 | Instruction FollowingMath | CodeCode Available | 7 |
| Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models | Jun 13, 2024 | MathQuantization | CodeCode Available | 2 |
| Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models | Jun 13, 2024 | Mathobject-detection | CodeCode Available | 3 |
| ReMI: A Dataset for Reasoning with Multiple Images | Jun 13, 2024 | Chart UnderstandingMath | —Unverified | 0 |
| Collective Constitutional AI: Aligning a Language Model with Public Input | Jun 12, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 1 |
| Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B | Jun 11, 2024 | Decision MakingGSM8K | CodeCode Available | 5 |
| Can I understand what I create? Self-Knowledge Evaluation of Large Language Models | Jun 10, 2024 | Math | —Unverified | 0 |
| Human Learning about AI | Jun 8, 2024 | Math | —Unverified | 0 |
| CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning | Jun 7, 2024 | Instruction FollowingMath | CodeCode Available | 2 |
| A multi-core periphery perspective: Ranking via relative centrality | Jun 6, 2024 | Math | —Unverified | 0 |
| Lean Workbook: A large-scale Lean problem set formalized from natural language math problems | Jun 6, 2024 | Automated Theorem ProvingMath | CodeCode Available | 4 |
| DICE: Detecting In-distribution Contamination in LLM's Fine-tuning Phase for Math Reasoning | Jun 6, 2024 | Math | CodeCode Available | 1 |
| Improve Mathematical Reasoning in Language Models by Automated Process Supervision | Jun 5, 2024 | GSM8KMath | —Unverified | 0 |
| NUMCoT: Numerals and Units of Measurement in Chain-of-Thought Reasoning using Large Language Models | Jun 5, 2024 | MathMathematical Reasoning | CodeCode Available | 0 |
| mCoT: Multilingual Instruction Tuning for Reasoning Consistency in Language Models | Jun 4, 2024 | Math | CodeCode Available | 0 |
| D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models | Jun 3, 2024 | GPUMath | —Unverified | 0 |
| Code Pretraining Improves Entity Tracking Abilities of Language Models | May 31, 2024 | Math | —Unverified | 0 |
| Cutting Through the Noise: Boosting LLM Performance on Math Word Problems | May 30, 2024 | 8kMath | CodeCode Available | 0 |
| Divide-and-Conquer Meets Consensus: Unleashing the Power of Functions in Code Generation | May 30, 2024 | Code GenerationHumanEval | —Unverified | 0 |
| TAIA: Large Language Models are Out-of-Distribution Data Learners | May 30, 2024 | Math | CodeCode Available | 1 |
| MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions | May 29, 2024 | BenchmarkingDialogue Understanding | CodeCode Available | 1 |
| Yuan 2.0-M32: Mixture of Experts with Attention Router | May 28, 2024 | ARCMath | CodeCode Available | 2 |
| Arithmetic Reasoning with LLM: Prolog Generation & Permutation | May 28, 2024 | Arithmetic ReasoningData Augmentation | —Unverified | 0 |
| LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters | May 27, 2024 | BenchmarkingGSM8K | CodeCode Available | 2 |
| Autoformalizing Euclidean Geometry | May 27, 2024 | Math | CodeCode Available | 2 |
| MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time | May 25, 2024 | GSM8KMath | —Unverified | 0 |
| Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models | May 24, 2024 | Common Sense ReasoningLanguage Modelling | CodeCode Available | 2 |
| Learning Beyond Pattern Matching? Assaying Mathematical Understanding in LLMs | May 24, 2024 | In-Context LearningLanguage Modeling | —Unverified | 0 |
| Large Language Models Can Self-Correct with Key Condition Verification | May 23, 2024 | Arithmetic ReasoningMath | —Unverified | 0 |
| Can LLMs Solve longer Math Word Problems Better? | May 23, 2024 | Data AugmentationMath | CodeCode Available | 0 |
| JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models | May 23, 2024 | Knowledge DistillationMath | CodeCode Available | 1 |
| "Turing Tests" For An AI Scientist | May 22, 2024 | AI AgentData Compression | —Unverified | 0 |
| Investigating Symbolic Capabilities of Large Language Models | May 21, 2024 | MathNavigate | —Unverified | 0 |
| MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark | May 20, 2024 | College MathematicsGSM8K | CodeCode Available | 2 |
| Multiple-Choice Questions are Efficient and Robust LLM Evaluators | May 20, 2024 | GSM8KHumanEval | CodeCode Available | 1 |
| Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving | May 20, 2024 | GSM8KMath | —Unverified | 0 |
| DOP: Diagnostic-Oriented Prompting for Large Language Models in Mathematical Correction | May 20, 2024 | DiagnosticMath | CodeCode Available | 0 |
| Continued Pretraining for Domain Adaptation of Wav2vec2.0 in Automatic Speech Recognition for Elementary Math Classroom Settings | May 15, 2024 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | —Unverified | 0 |
| A safety realignment framework via subspace-oriented model fusion for large language models | May 15, 2024 | Instruction FollowingMath | CodeCode Available | 0 |
| Meaning-Typed Programming: Language Abstraction and Runtime for Model-Integrated Applications | May 14, 2024 | GSM8KMath | —Unverified | 0 |
| TANQ: An open domain dataset of table answered questions | May 13, 2024 | MathOpen-Domain Question Answering | CodeCode Available | 1 |
| MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning | May 13, 2024 | Data AugmentationGSM8K | CodeCode Available | 3 |
| MathDivide: Improved mathematical reasoning by large language models | May 12, 2024 | GSM8KLogical Reasoning | —Unverified | 0 |
| Can Large Language Models Replicate ITS Feedback on Open-Ended Math Questions? | May 10, 2024 | Mathtext similarity | CodeCode Available | 0 |
| Learning to Solve Geometry Problems via Simulating Human Dual-Reasoning Process | May 10, 2024 | Geometry Problem SolvingMachine Translation | CodeCode Available | 0 |
| Aligning Tutor Discourse Supporting Rigorous Thinking with Tutee Content Mastery for Predicting Math Achievement | May 10, 2024 | MathMathematical Reasoning | —Unverified | 0 |
| LLMs can Find Mathematical Reasoning Mistakes by Pedagogical Chain-of-Thought | May 9, 2024 | HallucinationMath | —Unverified | 0 |