| MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? | Mar 21, 2024 | Math, Mathematical Reasoning | Unverified | 0 |
| Reinforcement Learning from Reflective Feedback (RLRF): Aligning and Improving LLMs via Fine-Grained Self-Reflection | Mar 21, 2024 | Mathematical Reasoning | Unverified | 0 |
| Instructing Large Language Models to Identify and Ignore Irrelevant Conditions | Mar 19, 2024 | Math, Mathematical Reasoning | Code Available | 0 |
| OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety | Mar 18, 2024 | Benchmarking, Mathematical Reasoning | Unverified | 0 |
| Apriori Knowledge in an Era of Computational Opacity: The Role of AI in Mathematical Discovery | Mar 15, 2024 | Mathematical Reasoning | Unverified | 0 |
| FineMath: A Fine-Grained Mathematical Evaluation Benchmark for Chinese Large Language Models | Mar 12, 2024 | Math, Mathematical Reasoning | Unverified | 0 |
| Prompt Selection and Augmentation for Few Examples Code Generation in Large Language Model and its Application in Robotics Control | Mar 11, 2024 | Code Generation, Diversity | Unverified | 0 |
| RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation | Mar 8, 2024 | Code Generation, Hallucination | Code Available | 3 |
| Machine learning and information theory concepts towards an AI Mathematician | Mar 7, 2024 | Mathematical Reasoning | Unverified | 0 |
| MathScale: Scaling Instruction Tuning for Mathematical Reasoning | Mar 5, 2024 | GSM8K, Math | Code Available | 0 |
| Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models | Mar 4, 2024 | Data Augmentation, GSM8K | Code Available | 1 |
| Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning | Mar 4, 2024 | GSM8K, Math | Unverified | 0 |
| You Need to Pay Better Attention: Rethinking the Mathematics of Attention Mechanism | Mar 3, 2024 | Machine Translation, Mathematical Reasoning | Unverified | 0 |
| Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models | Mar 1, 2024 | Benchmarking, Mathematical Reasoning | Unverified | 0 |
| GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers | Feb 29, 2024 | GSM8K, Math | Code Available | 2 |
| Reasoning in Conversation: Solving Subjective Tasks through Dialogue Simulation for Large Language Models | Feb 27, 2024 | Dark Humor Detection, Dialogue Generation | Unverified | 0 |
| MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning | Feb 27, 2024 | 8k, Language Modeling | Code Available | 0 |
| MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs | Feb 26, 2024 | GSM8K, Math | Unverified | 0 |
| Stepwise Self-Consistent Mathematical Reasoning with Large Language Models | Feb 24, 2024 | Math, Mathematical Reasoning | Code Available | 1 |
| How Do Humans Write Code? Large Models Do It the Same Way Too | Feb 24, 2024 | Code Generation, Math | Code Available | 0 |
| Look Before You Leap: Problem Elaboration Prompting Improves Mathematical Reasoning in Large Language Models | Feb 24, 2024 | GSM8K, Mathematical Reasoning | Unverified | 0 |
| Brain-Inspired Two-Stage Approach: Enhancing Mathematical Reasoning by Imitating Human Thought Processes | Feb 23, 2024 | Math, Mathematical Reasoning | Code Available | 0 |
| Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset | Feb 22, 2024 | Diversity, Math | Code Available | 2 |
| ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models | Feb 22, 2024 | Math, Mathematical Reasoning | Code Available | 1 |
| Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models | Feb 20, 2024 | Instruction Following, Logical Reasoning | Unverified | 0 |