| Lynx: Enabling Efficient MoE Inference through Dynamic Batch-Aware Expert Selection | Nov 13, 2024 | Code GenerationMathematical Reasoning | —Unverified | 0 |
| Gap-Filling Prompting Enhances Code-Assisted Mathematical Reasoning | Nov 8, 2024 | Mathematical Reasoning | CodeCode Available | 0 |
| Benchmarking Large Language Models with Integer Sequence Generation Tasks | Nov 7, 2024 | BenchmarkingComputational Efficiency | —Unverified | 0 |
| Kwai-STaR: Transform LLMs into State-Transition Reasoners | Nov 7, 2024 | GSM8KMathematical Problem-Solving | —Unverified | 0 |
| FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI | Nov 7, 2024 | Mathematical Reasoning | —Unverified | 0 |
| MoD: A Distribution-Based Approach for Merging Large Language Models | Nov 1, 2024 | Mathematical Reasoning | CodeCode Available | 0 |
| STEM-POM: Evaluating Language Models Math-Symbol Reasoning in Document Parsing | Nov 1, 2024 | 2kIn-Context Learning | —Unverified | 0 |
| VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning | Oct 30, 2024 | BenchmarkingHallucination | —Unverified | 0 |
| Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning | Oct 29, 2024 | Mathematical Reasoning | —Unverified | 0 |
| DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models | Oct 29, 2024 | MathMathematical Reasoning | —Unverified | 0 |
| Library Learning Doesn't: The Curious Case of the Single-Use "Library" | Oct 26, 2024 | MathMathematical Reasoning | CodeCode Available | 0 |
| GFlowNet Fine-tuning for Diverse Correct Solutions in Mathematical Reasoning Tasks | Oct 26, 2024 | DiversityMathematical Reasoning | —Unverified | 0 |
| ReasonAgain: Using Extractable Symbolic Programs to Evaluate Mathematical Reasoning | Oct 24, 2024 | GSM8KMath | —Unverified | 0 |
| SIKeD: Self-guided Iterative Knowledge Distillation for mathematical reasoning | Oct 24, 2024 | Knowledge DistillationMathematical Reasoning | CodeCode Available | 0 |
| Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks | Oct 24, 2024 | Logical ReasoningMathematical Problem-Solving | —Unverified | 0 |
| Markov Chain of Thought for Efficient Mathematical Reasoning | Oct 23, 2024 | Mathematical Reasoning | —Unverified | 0 |
| Can Large Language Models Invent Algorithms to Improve Themselves? | Oct 21, 2024 | Mathematical Reasoning | —Unverified | 0 |
| Keep Guessing? When Considering Inference Scaling, Mind the Baselines | Oct 20, 2024 | Mathematical Reasoning | —Unverified | 0 |
| Do Large Language Models Truly Grasp Mathematics? An Empirical Exploration From Cognitive Psychology | Oct 19, 2024 | Logical ReasoningMath | —Unverified | 0 |
| Step Guided Reasoning: Improving Mathematical Reasoning using Guidance Generation and Step Reasoning | Oct 18, 2024 | MathMathematical Reasoning | —Unverified | 0 |
| How Numerical Precision Affects Mathematical Reasoning Capabilities of LLMs | Oct 17, 2024 | Mathematical Reasoning | —Unverified | 0 |
| AdaSwitch: Adaptive Switching between Small and Large Agents for Effective Cloud-Local Collaborative Learning | Oct 17, 2024 | Mathematical ReasoningQuestion Answering | —Unverified | 0 |
| Enhancing Mathematical Reasoning in LLMs by Stepwise Correction | Oct 16, 2024 | Mathematical Reasoning | —Unverified | 0 |
| Not All Votes Count! Programs as Verifiers Improve Self-Consistency of Language Models for Math Reasoning | Oct 16, 2024 | AllGSM8K | CodeCode Available | 0 |
| MIND: Math Informed syNthetic Dialogues for Pretraining LLMs | Oct 15, 2024 | GSM8KMath | —Unverified | 0 |
| Augmenting In-Context-Learning in LLMs via Automatic Data Labeling and Refinement | Oct 14, 2024 | In-Context LearningMathematical Reasoning | —Unverified | 0 |
| Embedding Self-Correction as an Inherent Ability in Large Language Models for Enhanced Mathematical Reasoning | Oct 14, 2024 | MathMathematical Reasoning | —Unverified | 0 |
| How to Leverage Demonstration Data in Alignment for Large Language Model? A Self-Imitation Learning Perspective | Oct 14, 2024 | Density Ratio EstimationGSM8K | CodeCode Available | 0 |
| Expanding Search Space with Diverse Prompting Agents: An Efficient Sampling Approach for LLM Mathematical Reasoning | Oct 13, 2024 | MathMathematical Reasoning | —Unverified | 0 |
| A Systematic Survey on Large Language Models for Algorithm Design | Oct 11, 2024 | Mathematical Reasoningscientific discovery | —Unverified | 0 |
| Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models | Oct 10, 2024 | Arithmetic ReasoningMath | CodeCode Available | 0 |
| Diversity of Thought Elicits Stronger Reasoning Capabilities in Multi-Agent Debate Frameworks | Oct 10, 2024 | 8kDiversity | —Unverified | 0 |
| TPO: Aligning Large Language Models with Multi-branch & Multi-step Preference Trees | Oct 10, 2024 | Mathematical Reasoning | —Unverified | 0 |
| VerifierQ: Enhancing LLM Test Time Compute with Q-Learning-based Verifiers | Oct 10, 2024 | Mathematical ReasoningQ-Learning | —Unverified | 0 |
| Herald: A Natural Language Annotated Lean 4 Dataset | Oct 9, 2024 | MathMathematical Reasoning | —Unverified | 0 |
| Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning | Oct 9, 2024 | Mathematical Reasoning | —Unverified | 0 |
| PositionID: LLMs can Control Lengths, Copy and Paste with Explicit Positional Awareness | Oct 9, 2024 | Mathematical Reasoning | —Unverified | 0 |
| Subtle Errors Matter: Preference Learning via Error-injected Self-editing | Oct 9, 2024 | GSM8KMath | —Unverified | 0 |
| Beyond Captioning: Task-Specific Prompting for Improved VLM Performance in Mathematical Reasoning | Oct 8, 2024 | Image RetrievalMath | —Unverified | 0 |
| FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning | Oct 8, 2024 | GSM8KHallucination | —Unverified | 0 |
| Give me a hint: Can LLMs take a hint to solve math problems? | Oct 8, 2024 | Adversarial RobustnessMath | CodeCode Available | 0 |
| MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs | Oct 7, 2024 | Information RetrievalMathematical Reasoning | —Unverified | 0 |
| ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection | Oct 6, 2024 | BenchmarkingMathematical Reasoning | —Unverified | 0 |
| Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark | Oct 6, 2024 | Mathematical ReasoningSpatial Reasoning | CodeCode Available | 0 |
| TUBench: Benchmarking Large Vision-Language Models on Trustworthiness with Unanswerable Questions | Oct 5, 2024 | BenchmarkingHallucination | CodeCode Available | 0 |
| Table Question Answering for Low-resourced Indic Languages | Oct 4, 2024 | Cross-Lingual TransferMathematical Reasoning | CodeCode Available | 0 |
| CodePMP: Scalable Preference Model Pretraining for Large Language Model Reasoning | Oct 3, 2024 | GSM8KLanguage Modeling | —Unverified | 0 |
| GraphIC: A Graph-Based In-Context Example Retrieval Model for Multi-Step Reasoning | Oct 3, 2024 | Code GenerationIn-Context Learning | —Unverified | 0 |
| Guided Stream of Search: Learning to Better Search with Language Models via Optimal Path Guidance | Oct 3, 2024 | Mathematical Reasoning | CodeCode Available | 0 |
| Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models | Oct 2, 2024 | Cross-Lingual TransferMath | —Unverified | 0 |