| Enhancing Reasoning Capabilities of LLMs via Principled Synthetic Logic Corpus | Nov 19, 2024 | Formal Logic, Logical Reasoning | Code Available | 2 |
| Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues | Nov 19, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| MM-Eval: A Hierarchical Benchmark for Modern Mongolian Evaluation in LLMs | Nov 14, 2024 | General Knowledge, Math | Code Available | 0 |
| RESOLVE: Relational Reasoning with Symbolic and Object-Level Features Using Vector Symbolic Processing | Nov 13, 2024 | Decoder, Math | Code Available | 0 |
| What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? | Nov 12, 2024 | GSM8K, Math | Code Available | 1 |
| Problem-Oriented Segmentation and Retrieval: Case Study on Tutoring Conversations | Nov 12, 2024 | Math, Retrieval | Code Available | 1 |
| UTMath: Math Evaluation with Unit Test via Reasoning-to-Coding Thoughts | Nov 11, 2024 | Code Generation, GSM8K | Code Available | 1 |
| OpenAI-o1 AB Testing: Does the o1 model really do good reasoning in math problem solving? | Nov 9, 2024 | Logical Reasoning, Math | Unverified | 0 |
| VISTA: Visual Integrated System for Tailored Automation in Math Problem Generation Using LLM | Nov 8, 2024 | Math | Unverified | 0 |
| Aioli: A Unified Optimization Framework for Language Model Data Mixing | Nov 8, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| Evaluating GPT-4 at Grading Handwritten Solutions in Math Exams | Nov 7, 2024 | Math | Unverified | 0 |
| Meta-Reasoning Improves Tool Use in Large Language Models | Nov 7, 2024 | Math | Code Available | 0 |
| Self-Consistency Preference Optimization | Nov 6, 2024 | GSM8K, Math | Unverified | 0 |
| Automatic Generation of Question Hints for Mathematics Problems using Large Language Models in Educational Technology | Nov 5, 2024 | Math, Misconceptions | Unverified | 0 |
| Leveraging Label Semantics and Meta-Label Refinement for Multi-Label Question Classification | Nov 4, 2024 | Math, Reranking | Code Available | 0 |
| Regress, Don't Guess -- A Regression-like Loss on Number Tokens for Language Models | Nov 4, 2024 | Inductive Bias, Language Modeling | Code Available | 1 |
| Dictionary Insertion Prompting for Multilingual Reasoning on Multilingual Large Language Models | Nov 2, 2024 | GSM8K, Math | Unverified | 0 |
| STEM-POM: Evaluating Language Models Math-Symbol Reasoning in Document Parsing | Nov 1, 2024 | 2k, In-Context Learning | Unverified | 0 |
| DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models | Oct 29, 2024 | Math, Mathematical Reasoning | Unverified | 0 |
| Improving Math Problem Solving in Large Language Models Through Categorization and Strategy Tailoring | Oct 29, 2024 | Math | Unverified | 0 |
| Automated Feedback in Math Education: A Comparative Analysis of LLMs for Open-Ended Responses | Oct 29, 2024 | Math, Zero-Shot Learning | Unverified | 0 |
| Autoformalize Mathematical Statements by Symbolic Equivalence and Semantic Consistency | Oct 28, 2024 | Math | Code Available | 1 |
| Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics | Oct 28, 2024 | Arithmetic Reasoning, Math | Code Available | 1 |
| EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation | Oct 28, 2024 | ARC, Math | Unverified | 0 |
| Flaming-hot Initiation with Regular Execution Sampling for Large Language Models | Oct 28, 2024 | Diversity, Math | Code Available | 2 |
| Guiding Through Complexity: What Makes Good Supervision for Hard Reasoning Tasks? | Oct 27, 2024 | Data Augmentation, Math | Code Available | 0 |
| Library Learning Doesn't: The Curious Case of the Single-Use "Library" | Oct 26, 2024 | Math, Mathematical Reasoning | Code Available | 0 |
| Can Stories Help LLMs Reason? Curating Information Space Through Narrative | Oct 25, 2024 | Math | Unverified | 0 |
| Mixture of Parrots: Experts improve memorization more than reasoning | Oct 24, 2024 | Math, Memorization | Unverified | 0 |
| ReasonAgain: Using Extractable Symbolic Programs to Evaluate Mathematical Reasoning | Oct 24, 2024 | GSM8K, Math | Unverified | 0 |
| Scaling up Masked Diffusion Models on Text | Oct 24, 2024 | GSM8K, Language Modeling | Code Available | 3 |
| Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch | Oct 24, 2024 | Math, Mathematical Reasoning | Code Available | 2 |
| From Blind Solvers to Logical Thinkers: Benchmarking LLMs' Logical Integrity on Faulty Mathematical Problems | Oct 24, 2024 | Benchmarking, Common Sense Reasoning | Unverified | 0 |
| MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning | Oct 23, 2024 | Math, Mixture-of-Experts | Unverified | 0 |
| Optimizing Chain-of-Thought Reasoning: Tackling Arranging Bottleneck via Plan Augmentation | Oct 22, 2024 | GSM8K, Math | Unverified | 0 |
| Non-myopic Generation of Language Models for Reasoning and Planning | Oct 22, 2024 | Computational Efficiency, Language Modelling | Code Available | 1 |
| Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes | Oct 22, 2024 | GSM8K, Language Modeling | Code Available | 1 |
| Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration | Oct 22, 2024 | Math | Unverified | 0 |
| Polyak's Heavy Ball Method Achieves Accelerated Local Rate of Convergence under Polyak-Lojasiewicz Inequality | Oct 22, 2024 | Math | Unverified | 0 |
| JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation | Oct 22, 2024 | Math | Unverified | 0 |
| PromptHive: Bringing Subject Matter Experts Back to the Forefront with Collaborative Prompt Engineering for Educational Content Creation | Oct 21, 2024 | Math, Prompt Engineering | Unverified | 0 |
| No more hard prompts: SoftSRV prompting for synthetic data generation | Oct 21, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| InternLM2.5-StepProver: Advancing Automated Theorem Proving via Expert Iteration on Large-Scale LEAN Problems | Oct 21, 2024 | Automated Theorem Proving, CPU | Code Available | 4 |
| Do Large Language Models Truly Grasp Mathematics? An Empirical Exploration From Cognitive Psychology | Oct 19, 2024 | Logical Reasoning, Math | Unverified | 0 |
| On Designing Effective RL Reward at Training Time for LLM Reasoning | Oct 19, 2024 | GSM8K, Math | Unverified | 0 |
| Step Guided Reasoning: Improving Mathematical Reasoning using Guidance Generation and Step Reasoning | Oct 18, 2024 | Math, Mathematical Reasoning | Unverified | 0 |
| Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens | Oct 18, 2024 | Math, Question Answering | Unverified | 0 |
| LLM The Genius Paradox: A Linguistic and Math Expert's Struggle with Simple Word-based Counting Problems | Oct 18, 2024 | In-Context Learning, Math | Unverified | 0 |
| SBI-RAG: Enhancing Math Word Problem Solving for Students through Schema-Based Instruction and Retrieval-Augmented Generation | Oct 17, 2024 | GSM8K, Language Modeling | Code Available | 0 |
| A Comparative Study on Reasoning Patterns of OpenAI's o1 Model | Oct 17, 2024 | Math | Code Available | 2 |