| NUMCoT: Numerals and Units of Measurement in Chain-of-Thought Reasoning using Large Language Models | Jun 5, 2024 | MathMathematical Reasoning | CodeCode Available | 0 |
| Improve Mathematical Reasoning in Language Models by Automated Process Supervision | Jun 5, 2024 | GSM8KMath | —Unverified | 0 |
| mCoT: Multilingual Instruction Tuning for Reasoning Consistency in Language Models | Jun 4, 2024 | Math | CodeCode Available | 0 |
| D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models | Jun 3, 2024 | GPUMath | —Unverified | 0 |
| Code Pretraining Improves Entity Tracking Abilities of Language Models | May 31, 2024 | Math | —Unverified | 0 |
| Divide-and-Conquer Meets Consensus: Unleashing the Power of Functions in Code Generation | May 30, 2024 | Code GenerationHumanEval | —Unverified | 0 |
| Cutting Through the Noise: Boosting LLM Performance on Math Word Problems | May 30, 2024 | 8kMath | CodeCode Available | 0 |
| Arithmetic Reasoning with LLM: Prolog Generation & Permutation | May 28, 2024 | Arithmetic ReasoningData Augmentation | —Unverified | 0 |
| MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time | May 25, 2024 | GSM8KMath | —Unverified | 0 |
| Learning Beyond Pattern Matching? Assaying Mathematical Understanding in LLMs | May 24, 2024 | In-Context LearningLanguage Modeling | —Unverified | 0 |
| Large Language Models Can Self-Correct with Key Condition Verification | May 23, 2024 | Arithmetic ReasoningMath | —Unverified | 0 |
| Can LLMs Solve longer Math Word Problems Better? | May 23, 2024 | Data AugmentationMath | CodeCode Available | 0 |
| "Turing Tests" For An AI Scientist | May 22, 2024 | AI AgentData Compression | —Unverified | 0 |
| Investigating Symbolic Capabilities of Large Language Models | May 21, 2024 | MathNavigate | —Unverified | 0 |
| DOP: Diagnostic-Oriented Prompting for Large Language Models in Mathematical Correction | May 20, 2024 | DiagnosticMath | CodeCode Available | 0 |
| Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving | May 20, 2024 | GSM8KMath | —Unverified | 0 |
| Continued Pretraining for Domain Adaptation of Wav2vec2.0 in Automatic Speech Recognition for Elementary Math Classroom Settings | May 15, 2024 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | —Unverified | 0 |
| A safety realignment framework via subspace-oriented model fusion for large language models | May 15, 2024 | Instruction FollowingMath | CodeCode Available | 0 |
| Meaning-Typed Programming: Language Abstraction and Runtime for Model-Integrated Applications | May 14, 2024 | GSM8KMath | —Unverified | 0 |
| MathDivide: Improved mathematical reasoning by large language models | May 12, 2024 | GSM8KLogical Reasoning | —Unverified | 0 |
| Can Large Language Models Replicate ITS Feedback on Open-Ended Math Questions? | May 10, 2024 | Mathtext similarity | CodeCode Available | 0 |
| Learning to Solve Geometry Problems via Simulating Human Dual-Reasoning Process | May 10, 2024 | Geometry Problem SolvingMachine Translation | CodeCode Available | 0 |
| Aligning Tutor Discourse Supporting Rigorous Thinking with Tutee Content Mastery for Predicting Math Achievement | May 10, 2024 | MathMathematical Reasoning | —Unverified | 0 |
| LLMs can Find Mathematical Reasoning Mistakes by Pedagogical Chain-of-Thought | May 9, 2024 | HallucinationMath | —Unverified | 0 |
| MAmmoTH2: Scaling Instructions from the Web | May 6, 2024 | ChatbotGSM8K | —Unverified | 0 |
| Assessing and Verifying Task Utility in LLM-Powered Applications | May 3, 2024 | Math | —Unverified | 0 |
| Self-Refine Instruction-Tuning for Aligning Reasoning in Language Models | May 1, 2024 | Math | —Unverified | 0 |
| A Careful Examination of Large Language Model Performance on Grade School Arithmetic | May 1, 2024 | GSM8KLanguage Modeling | —Unverified | 0 |
| Math Multiple Choice Question Generation via Human-Large Language Model Collaboration | May 1, 2024 | Language ModelingLanguage Modelling | —Unverified | 0 |
| Iterative Reasoning Preference Optimization | Apr 30, 2024 | ARCGSM8K | —Unverified | 0 |
| Small Language Models Need Strong Verifiers to Self-Correct Reasoning | Apr 26, 2024 | Math | CodeCode Available | 0 |
| Describe-then-Reason: Improving Multimodal Mathematical Reasoning through Visual Comprehension Training | Apr 22, 2024 | MathMathematical Reasoning | —Unverified | 0 |
| Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone | Apr 22, 2024 | Language ModelingLanguage Modelling | —Unverified | 0 |
| PARAMANU-GANITA: Language Model with Mathematical Capabilities | Apr 22, 2024 | Domain AdaptationGSM8K | —Unverified | 0 |
| Improving Automated Distractor Generation for Math Multiple-choice Questions with Overgenerate-and-rank | Apr 19, 2024 | Distractor GenerationMath | —Unverified | 0 |
| On the Empirical Complexity of Reasoning and Planning in LLMs | Apr 17, 2024 | Math | —Unverified | 0 |
| Mental Stress Detection: Development and Evaluation of a Wearable In-Ear Plethysmography | Apr 12, 2024 | MathMental Stress Detection | —Unverified | 0 |
| Personality-aware Student Simulation for Conversational Intelligent Tutoring Systems | Apr 10, 2024 | Math | —Unverified | 0 |
| MathVC: An LLM-Simulated Multi-Character Virtual Classroom for Mathematics Education | Apr 10, 2024 | Math | —Unverified | 0 |
| FRACTAL: Fine-Grained Scoring from Aggregate Text Labels | Apr 7, 2024 | MathMultiple Instance Learning | —Unverified | 0 |
| MM-MATH: Advancing Multimodal Math Evaluation with Process Evaluation and Fine-grained Classification | Apr 7, 2024 | Image ComprehensionMath | CodeCode Available | 0 |
| Data Augmentation with In-Context Learning and Comparative Evaluation in Math Word Problem Solving | Apr 5, 2024 | Data AugmentationIn-Context Learning | —Unverified | 0 |
| HyperCLOVA X Technical Report | Apr 2, 2024 | Instruction FollowingMachine Translation | —Unverified | 0 |
| Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models | Apr 2, 2024 | Distractor GenerationIn-Context Learning | CodeCode Available | 0 |
| LM^2: A Simple Society of Language Models Solves Complex Reasoning | Apr 2, 2024 | MathMedQA | CodeCode Available | 0 |
| IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations | Apr 1, 2024 | BenchmarkingMath | —Unverified | 0 |
| Exploring the Mystery of Influential Data for Mathematical Reasoning | Apr 1, 2024 | MathMathematical Reasoning | —Unverified | 0 |
| Stable Code Technical Report | Apr 1, 2024 | Code CompletionLanguage Modelling | —Unverified | 0 |
| Self-Demos: Eliciting Out-of-Demonstration Generalizability in Large Language Models | Apr 1, 2024 | In-Context LearningMath | CodeCode Available | 0 |
| Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange | Mar 30, 2024 | MathMathematical Problem-Solving | CodeCode Available | 0 |