| From Correctness to Comprehension: AI Agents for Personalized Error Diagnosis in Education | Feb 19, 2025 | Diagnostic, GSM8K | Unverified | 0 |
| Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning | Feb 19, 2025 | Mathematical Reasoning | Code Available | 1 |
| AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence | Feb 19, 2025 | Code Generation, Decision Making | Code Available | 1 |
| Integrating Arithmetic Learning Improves Mathematical Reasoning in Smaller Models | Feb 18, 2025 | Data Augmentation, GSM8K | Unverified | 0 |
| Theorem Prover as a Judge for Synthetic Data Generation | Feb 18, 2025 | Mathematical Proofs, Mathematical Reasoning | Unverified | 0 |
| Sens-Merging: Sensitivity-Guided Parameter Balancing for Merging Large Language Models | Feb 18, 2025 | Code Generation, General Knowledge | Unverified | 0 |
| Large Language Models and Mathematical Reasoning Failures | Feb 17, 2025 | Mathematical Reasoning, Physical Intuition | Unverified | 0 |
| Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving | Feb 17, 2025 | Math, Mathematical Problem-Solving | Unverified | 0 |
| MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task | Feb 17, 2025 | Code Completion, GSM8K | Unverified | 0 |
| Uncertainty-Aware Step-wise Verification with Generative Reward Models | Feb 16, 2025 | Mathematical Reasoning, Uncertainty Quantification | Unverified | 0 |
| Leveraging Constrained Monte Carlo Tree Search to Generate Reliable Long Chain-of-Thought for Mathematical Reasoning | Feb 16, 2025 | Mathematical Reasoning | Unverified | 0 |
| 1bit-Merging: Dynamic Quantized Merging for Large Language Models | Feb 15, 2025 | Code Generation, Math | Unverified | 0 |
| Evaluating the Meta- and Object-Level Reasoning of Large Language Models for Question Answering | Feb 14, 2025 | Mathematical Reasoning, Object | Unverified | 0 |
| GoRA: Gradient-driven Adaptive Low Rank Adaptation | Feb 13, 2025 | Computational Efficiency, Mathematical Reasoning | Unverified | 0 |
| Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models | Feb 12, 2025 | Mathematical Reasoning, MMLU | Unverified | 0 |
| One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs | Feb 12, 2025 | Mathematical Reasoning | Unverified | 0 |
| Mathematical Reasoning in Large Language Models: Assessing Logical and Arithmetic Errors across Wide Numerical Ranges | Feb 12, 2025 | GSM8K, Math | Code Available | 0 |
| LLMs can implicitly learn from mistakes in-context | Feb 12, 2025 | Mathematical Reasoning | Unverified | 0 |
| Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning | Feb 11, 2025 | Code Generation, Math | Code Available | 0 |
| MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations | Feb 10, 2025 | Benchmarking, In-Context Learning | Unverified | 0 |
| Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning | Feb 10, 2025 | Math, Mathematical Reasoning | Code Available | 2 |
| ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates | Feb 10, 2025 | Hierarchical Reinforcement Learning, Language Modeling | Code Available | 4 |
| Self-Training Large Language Models for Tool-Use Without Demonstrations | Feb 9, 2025 | GSM8K, Mathematical Reasoning | Unverified | 0 |
| Evolving LLMs' Self-Refinement Capability via Iterative Preference Optimization | Feb 8, 2025 | GSM8K, Math | Unverified | 0 |
| ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization | Feb 6, 2025 | Language Modeling, Language Modelling | Code Available | 2 |