| Guided Stream of Search: Learning to Better Search with Language Models via Optimal Path Guidance | Oct 3, 2024 | Mathematical Reasoning | Code Available | 0 |
| LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning | Oct 3, 2024 | Efficient Exploration, Mathematical Problem-Solving | Code Available | 5 |
| GraphIC: A Graph-Based In-Context Example Retrieval Model for Multi-Step Reasoning | Oct 3, 2024 | Code Generation, In-Context Learning | Unverified | 0 |
| CodePMP: Scalable Preference Model Pretraining for Large Language Model Reasoning | Oct 3, 2024 | GSM8K, Language Modeling | Unverified | 0 |
| Evaluating Robustness of Reward Models for Mathematical Reasoning | Oct 2, 2024 | Math, Mathematical Reasoning | Unverified | 0 |
| OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data | Oct 2, 2024 | Arithmetic Reasoning, Large Language Model | Code Available | 4 |
| Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models | Oct 2, 2024 | Cross-Lingual Transfer, Math | Unverified | 0 |
| Scheherazade: Evaluating Chain-of-Thought Math Reasoning in LLMs with Chain-of-Problems | Sep 30, 2024 | GSM8K, Math | Code Available | 0 |
| INC-Math: Integrating Natural Language and Code for Enhanced Mathematical Reasoning in Large Language Models | Sep 28, 2024 | Math, Mathematical Reasoning | Unverified | 0 |
| Revisiting the Superficial Alignment Hypothesis | Sep 27, 2024 | Instruction Following, Math | Unverified | 0 |
| HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models | Sep 27, 2024 | Code Generation, Mathematical Reasoning | Unverified | 0 |
| Evaluation of OpenAI o1: Opportunities and Challenges of AGI | Sep 27, 2024 | Emotion Recognition, Large Language Model | Unverified | 0 |
| PACE: Marrying generalization in PArameter-efficient fine-tuning with Consistency rEgularization | Sep 25, 2024 | 8k, Domain Adaptation | Code Available | 1 |
| LLaMa-SciQ: An Educational Chatbot for Answering Science MCQ | Sep 25, 2024 | Chatbot, GSM8K | Unverified | 0 |
| ControlMath: Controllable Data Generation Promotes Math Generalist Models | Sep 20, 2024 | Data Augmentation, Diversity | Unverified | 0 |
| Unlocking Reasoning Potential in Large Langauge Models by Scaling Code-form Planning | Sep 19, 2024 | Form, Instruction Following | Code Available | 1 |
| InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning | Sep 19, 2024 | Math, Mathematical Reasoning | Unverified | 0 |
| Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement | Sep 18, 2024 | GSM8K, Math | Unverified | 0 |
| RoMath: A Mathematical Reasoning Benchmark in Romanian | Sep 17, 2024 | Mathematical Reasoning | Code Available | 0 |
| Causal Inference with Large Language Model: A Survey | Sep 15, 2024 | Causal Inference, Language Modeling | Unverified | 0 |
| CPL: Critical Plan Step Learning Boosts LLM Generalization in Reasoning Tasks | Sep 13, 2024 | ARC, Code Generation | Unverified | 0 |
| Expediting and Elevating Large Language Model Reasoning via Hidden Chain-of-Thought Decoding | Sep 13, 2024 | Contrastive Learning, Language Modeling | Unverified | 0 |
| MathGLM-Vision: Solving Mathematical Problems with Multi-Modal Large Language Model | Sep 10, 2024 | Diversity, Language Modeling | Unverified | 0 |
| Mathematical Formalized Problem Solving and Theorem Proving in Different Fields in Lean 4 | Sep 9, 2024 | Abstract Algebra, Automated Theorem Proving | Code Available | 0 |
| Diagram Formalization Enhanced Multi-Modal Geometry Problem Solver | Sep 6, 2024 | Geometry Problem Solving, Mathematical Reasoning | Code Available | 1 |
| From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Tasks | Sep 6, 2024 | Machine Translation, Mathematical Reasoning | Unverified | 0 |
| CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models | Sep 4, 2024 | GSM8K, Math | Code Available | 2 |
| Building Math Agents with Multi-Turn Iterative Preference Learning | Sep 4, 2024 | GSM8K, Math | Unverified | 0 |
| S^3c-Math: Spontaneous Step-level Self-correction Makes Large Language Models Better Mathematical Reasoners | Sep 3, 2024 | GSM8K, Math | Unverified | 0 |
| MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models | Aug 30, 2024 | Image Captioning, Language Modeling | Code Available | 1 |
| Logic Contrastive Reasoning with Lightweight Large Language Model for Math Word Problems | Aug 29, 2024 | GSM8K, Language Modeling | Unverified | 0 |
| AutoGeo: Automating Geometric Image Dataset Creation for Enhanced Geometry Understanding | Aug 28, 2024 | Mathematical Reasoning | Unverified | 0 |
| SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models | Aug 28, 2024 | Data Augmentation, GSM8K | Unverified | 0 |
| Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation | Aug 28, 2024 | Knowledge Distillation, Language Modelling | Unverified | 0 |
| Path-Consistency: Prefix Enhancement for Efficient Inference in LLM | Aug 25, 2024 | Code Generation, Common Sense Reasoning | Unverified | 0 |
| Tangram: Benchmark for Evaluating Geometric Element Recognition in Large Multimodal Models | Aug 25, 2024 | Mathematical Reasoning | Unverified | 0 |
| Multi-tool Integration Application for Math Reasoning Using Large Language Model | Aug 22, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| SarcasmBench: Towards Evaluating Large Language Models on Sarcasm Understanding | Aug 21, 2024 | Logical Reasoning, Mathematical Reasoning | Unverified | 0 |
| Taming Generative Diffusion Prior for Universal Blind Image Restoration | Aug 21, 2024 | Image Restoration, Mathematical Reasoning | Unverified | 0 |
| Benchmarking Large Language Models for Math Reasoning Tasks | Aug 20, 2024 | Benchmarking, In-Context Learning | Code Available | 0 |
| Concept Distillation from Strong to Weak Models via Hypotheses-to-Theories Prompting | Aug 18, 2024 | HumanEval, Mathematical Reasoning | Unverified | 0 |
| Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning | Aug 16, 2024 | Math, Mathematical Reasoning | Code Available | 1 |
| MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark | Aug 14, 2024 | Math, Mathematical Reasoning | Code Available | 0 |
| MAQA: Evaluating Uncertainty Quantification in LLMs Regarding Data Uncertainty | Aug 13, 2024 | Mathematical Reasoning, Question Answering | Code Available | 0 |
| Extend Model Merging from Fine-Tuned to Pre-Trained Large Language Models via Weight Disentanglement | Aug 6, 2024 | Code Generation, Disentanglement | Code Available | 1 |
| MathLearner: A Large Language Model Agent Framework for Learning to Solve Mathematical Problems | Aug 3, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| AI-Assisted Generation of Difficult Math Questions | Jul 30, 2024 | Math, Mathematical Reasoning | Code Available | 0 |
| Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process | Jul 29, 2024 | GSM8K, Math | Code Available | 2 |
| SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages | Jul 29, 2024 | Diversity, Instruction Following | Code Available | 2 |
| Optimizing Numerical Estimation and Operational Efficiency in the Legal Domain through Large Language Models | Jul 26, 2024 | Mathematical Reasoning | Unverified | 0 |