| MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? | Mar 21, 2024 | Math, Mathematical Reasoning | Unverified | 0 |
| Reinforcement Learning from Reflective Feedback (RLRF): Aligning and Improving LLMs via Fine-Grained Self-Reflection | Mar 21, 2024 | Mathematical Reasoning | Unverified | 0 |
| Instructing Large Language Models to Identify and Ignore Irrelevant Conditions | Mar 19, 2024 | Math, Mathematical Reasoning | Code Available | 0 |
| OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety | Mar 18, 2024 | Benchmarking, Mathematical Reasoning | Unverified | 0 |
| Apriori Knowledge in an Era of Computational Opacity: The Role of AI in Mathematical Discovery | Mar 15, 2024 | Mathematical Reasoning | Unverified | 0 |
| FineMath: A Fine-Grained Mathematical Evaluation Benchmark for Chinese Large Language Models | Mar 12, 2024 | Math, Mathematical Reasoning | Unverified | 0 |
| Prompt Selection and Augmentation for Few Examples Code Generation in Large Language Model and its Application in Robotics Control | Mar 11, 2024 | Code Generation, Diversity | Unverified | 0 |
| RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation | Mar 8, 2024 | Code Generation, Hallucination | Code Available | 3 |
| Machine learning and information theory concepts towards an AI Mathematician | Mar 7, 2024 | Mathematical Reasoning | Unverified | 0 |
| MathScale: Scaling Instruction Tuning for Mathematical Reasoning | Mar 5, 2024 | GSM8K, Math | Code Available | 0 |
| Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models | Mar 4, 2024 | Data Augmentation, GSM8K | Code Available | 1 |
| Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning | Mar 4, 2024 | GSM8K, Math | Unverified | 0 |
| You Need to Pay Better Attention: Rethinking the Mathematics of Attention Mechanism | Mar 3, 2024 | Machine Translation, Mathematical Reasoning | Unverified | 0 |
| Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models | Mar 1, 2024 | Benchmarking, Mathematical Reasoning | Unverified | 0 |
| GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers | Feb 29, 2024 | GSM8K, Math | Code Available | 2 |
| Reasoning in Conversation: Solving Subjective Tasks through Dialogue Simulation for Large Language Models | Feb 27, 2024 | Dark Humor Detection, Dialogue Generation | Unverified | 0 |
| MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning | Feb 27, 2024 | 8k, Language Modeling | Code Available | 0 |
| MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs | Feb 26, 2024 | GSM8K, Math | Unverified | 0 |
| Stepwise Self-Consistent Mathematical Reasoning with Large Language Models | Feb 24, 2024 | Math, Mathematical Reasoning | Code Available | 1 |
| How Do Humans Write Code? Large Models Do It the Same Way Too | Feb 24, 2024 | Code Generation, Math | Code Available | 0 |
| Look Before You Leap: Problem Elaboration Prompting Improves Mathematical Reasoning in Large Language Models | Feb 24, 2024 | GSM8K, Mathematical Reasoning | Unverified | 0 |
| Brain-Inspired Two-Stage Approach: Enhancing Mathematical Reasoning by Imitating Human Thought Processes | Feb 23, 2024 | Math, Mathematical Reasoning | Code Available | 0 |
| Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset | Feb 22, 2024 | Diversity, Math | Code Available | 2 |
| ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models | Feb 22, 2024 | Math, Mathematical Reasoning | Code Available | 1 |
| Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models | Feb 20, 2024 | Instruction Following, Logical Reasoning | Unverified | 0 |
| Learning to Check: Unleashing Potentials for Self-Correction in Large Language Models | Feb 20, 2024 | Mathematical Reasoning | Code Available | 1 |
| Reformatted Alignment | Feb 19, 2024 | GSM8K, Hallucination | Code Available | 2 |
| Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents | Feb 18, 2024 | Mathematical Reasoning, Multi-hop Question Answering | Code Available | 1 |
| Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement | Feb 18, 2024 | Mathematical Reasoning, Text Generation | Code Available | 0 |
| Evaluating LLMs' Mathematical Reasoning in Financial Document Question Answering | Feb 17, 2024 | Arithmetic Reasoning, Mathematical Reasoning | Unverified | 0 |
| When is Tree Search Useful for LLM Planning? It Depends on the Discriminator | Feb 16, 2024 | Mathematical Reasoning, Re-Ranking | Code Available | 2 |
| Reasoning over Uncertain Text by Generative Large Language Models | Feb 14, 2024 | Decision Making, Mathematical Reasoning | Code Available | 0 |
| MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data | Feb 14, 2024 | Automated Theorem Proving, Language Modelling | Code Available | 1 |
| Fourier Circuits in Neural Networks and Transformers: A Case Study of Modular Arithmetic with Multiple Inputs | Feb 12, 2024 | 2k, Mathematical Reasoning | Unverified | 0 |
| Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts | Feb 12, 2024 | Continual Pretraining, GSM8K | Code Available | 2 |
| Can Graph Descriptive Order Affect Solving Graph Problems with LLMs? | Feb 11, 2024 | Descriptive, Language Modelling | Unverified | 0 |
| Beyond Lines and Circles: Unveiling the Geometric Reasoning Gap in Large Language Models | Feb 6, 2024 | Mathematical Reasoning, Variable Selection | Unverified | 0 |
| DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models | Feb 5, 2024 | Arithmetic Reasoning, Math | Code Available | 9 |
| Large Language Models for Mathematical Reasoning: Progresses and Challenges | Jan 31, 2024 | Diversity, Math | Unverified | 0 |
| Large Multi-Modal Models (LMMs) as Universal Foundation Models for AI-Native Wireless Systems | Jan 30, 2024 | Mathematical Reasoning, RAG | Unverified | 0 |
| Efficient Tool Use with Chain-of-Abstraction Reasoning | Jan 30, 2024 | Math, Mathematical Reasoning | Unverified | 0 |
| GAPS: Geometry-Aware Problem Solver | Jan 29, 2024 | Geometry Problem Solving, Math | Unverified | 0 |
| EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty | Jan 26, 2024 | Code Generation, Instruction Following | Code Available | 7 |
| Demystifying Chains, Trees, and Graphs of Thoughts | Jan 25, 2024 | Mathematical Reasoning, Prompt Engineering | Unverified | 0 |
| Distilling Mathematical Reasoning Capabilities into Small Language Models | Jan 22, 2024 | Mathematical Reasoning | Unverified | 0 |
| SuperCLUE-Math6: Graded Multi-Step Math Reasoning Benchmark for LLMs in Chinese | Jan 22, 2024 | Diversity, GSM8K | Code Available | 2 |
| LangBridge: Multilingual Reasoning Without Multilingual Supervision | Jan 19, 2024 | Code Completion, Logical Reasoning | Code Available | 2 |
| Knowledge Fusion of Large Language Models | Jan 19, 2024 | Code Generation, Common Sense Reasoning | Code Available | 4 |
| Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions | Jan 17, 2024 | Arithmetic Reasoning, Code Generation | Code Available | 1 |
| Augmenting Math Word Problems via Iterative Question Composing | Jan 17, 2024 | Math, Mathematical Reasoning | Code Available | 1 |