| Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic | Feb 19, 2024 | Instruction Following, Math | Code Available | 2 |
| Reformatted Alignment | Feb 19, 2024 | GSM8K, Hallucination | Code Available | 2 |
| LoRA-Flow: Dynamic LoRA Fusion for Large Language Models in Generative Tasks | Feb 18, 2024 | Math | Unverified | 0 |
| Orca-Math: Unlocking the potential of SLMs in Grade School Math | Feb 16, 2024 | Arithmetic Reasoning, GSM8K | Unverified | 0 |
| Language Models as Science Tutors | Feb 16, 2024 | GSM8K, Math | Code Available | 1 |
| Language Models with Conformal Factuality Guarantees | Feb 15, 2024 | Conformal Prediction, Language Modeling | Unverified | 0 |
| Mathematical Opportunities in Digital Twins (MATH-DT) | Feb 15, 2024 | Math | Unverified | 0 |
| OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset | Feb 15, 2024 | Arithmetic Reasoning, GSM8K | Code Available | 4 |
| GeoEval: Benchmark for Evaluating LLMs and Multi-Modal Models on Geometry Problem-Solving | Feb 15, 2024 | Geometry Problem Solving, Math | Code Available | 1 |
| AutoTutor meets Large Language Models: A Language Model Tutor with Rich Pedagogy and Guardrails | Feb 14, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| Towards better Human-Agent Alignment: Assessing Task Utility in LLM-Powered Applications | Feb 14, 2024 | Math | Unverified | 0 |
| MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data | Feb 14, 2024 | Automated Theorem Proving, Language Modelling | Code Available | 1 |
| GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements | Feb 13, 2024 | GSM8K, Math | Unverified | 0 |
| EvoGPT-f: An Evolutionary GPT Framework for Benchmarking Formal Math Languages | Feb 12, 2024 | Automated Theorem Proving, Benchmarking | Unverified | 0 |
| Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models | Feb 12, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts | Feb 12, 2024 | Continual Pretraining, GSM8K | Code Available | 2 |
| Understanding the Progression of Educational Topics via Semantic Matching | Feb 10, 2024 | Math | Unverified | 0 |
| InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning | Feb 9, 2024 | Data Augmentation, GSM8K | Code Available | 4 |
| V-STaR: Training Verifiers for Self-Taught Reasoners | Feb 9, 2024 | Code Generation, Math | Unverified | 0 |
| Noise Contrastive Alignment of Language Models with Explicit Rewards | Feb 8, 2024 | Language Modelling, Math | Code Available | 3 |
| In-Context Principle Learning from Mistakes | Feb 8, 2024 | GSM8K, In-Context Learning | Code Available | 0 |
| Self-Discover: Large Language Models Self-Compose Reasoning Structures | Feb 6, 2024 | Math | Code Available | 3 |
| RevOrder: A Novel Method for Enhanced Arithmetic in Language Models | Feb 6, 2024 | GSM8K, Math | Unverified | 0 |
| Understanding Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation | Feb 5, 2024 | Knowledge Graphs, Math | Code Available | 1 |
| DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models | Feb 5, 2024 | Arithmetic Reasoning, Math | Code Available | 9 |
| Multi-step Problem Solving Through a Verifier: An Empirical Analysis on Model-induced Process Supervision | Feb 5, 2024 | GSM8K, Math | Unverified | 0 |
| Improving Assessment of Tutoring Practices using Retrieval-Augmented Generation | Feb 4, 2024 | Hallucination, Math | Unverified | 0 |
| MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models | Feb 2, 2024 | Language Modelling, Large Language Model | Code Available | 1 |
| Salsa Fresca: Angular Embeddings and Pre-Training for ML Attacks on Learning With Errors | Feb 2, 2024 | Math | Unverified | 0 |
| Large Language Models for Mathematical Reasoning: Progresses and Challenges | Jan 31, 2024 | Diversity, Math | Unverified | 0 |
| Efficient Tool Use with Chain-of-Abstraction Reasoning | Jan 30, 2024 | Math, Mathematical Reasoning | Unverified | 0 |
| Taxonomy of Mathematical Plagiarism | Jan 30, 2024 | Math, Question Answering | Code Available | 0 |
| ReGAL: Refactoring Programs to Discover Generalizable Abstractions | Jan 29, 2024 | Date Understanding, Math | Code Available | 1 |
| GAPS: Geometry-Aware Problem Solver | Jan 29, 2024 | Geometry Problem Solving, Math | Unverified | 0 |
| YODA: Teacher-Student Progressive Learning for Language Models | Jan 28, 2024 | GSM8K, Math | Unverified | 0 |
| Exploring Educational Equity: A Machine Learning Approach to Unravel Achievement Disparities in Georgia | Jan 25, 2024 | Math | Unverified | 0 |
| Can AI Assistants Know What They Don't Know? | Jan 24, 2024 | Math, Open-Domain Question Answering | Code Available | 2 |
| TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks | Jan 23, 2024 | Math, Question Answering | Code Available | 1 |
| Using Java Geometry Expert as Guide in the Preparations for Math Contests | Jan 22, 2024 | Math | Unverified | 0 |
| SuperCLUE-Math6: Graded Multi-Step Math Reasoning Benchmark for LLMs in Chinese | Jan 22, 2024 | Diversity, GSM8K | Code Available | 2 |
| Over-Reasoning and Redundant Calculation of Large Language Models | Jan 21, 2024 | GSM8K, Math | Code Available | 1 |
| Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning | Jan 19, 2024 | GSM8K, Math | Code Available | 1 |
| Augmenting Math Word Problems via Iterative Question Composing | Jan 17, 2024 | Math, Mathematical Reasoning | Code Available | 1 |
| Large Language Models Are Neurosymbolic Reasoners | Jan 17, 2024 | Common Sense Reasoning, Math | Code Available | 1 |
| ReFT: Reasoning with Reinforced Fine-Tuning | Jan 17, 2024 | GSM8K, Math | Code Available | 4 |
| Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions | Jan 17, 2024 | Arithmetic Reasoning, Code Generation | Code Available | 1 |
| Tuning Language Models by Proxy | Jan 16, 2024 | Domain Adaptation, Math | Code Available | 2 |
| Self-Imagine: Effective Unimodal Reasoning with Multimodal Models using Self-Imagination | Jan 16, 2024 | GSM8K, Language Modeling | Unverified | 0 |
| MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline | Jan 16, 2024 | GSM8K, Math | Code Available | 3 |
| SciInstruct: a Self-Reflective Instruction Annotated Dataset for Training Scientific Language Models | Jan 15, 2024 | Math, Mathematical Reasoning | Code Available | 2 |