| Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs | Jan 11, 2025 | MathMathematical Problem-Solving | CodeCode Available | 1 |
| BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning | Jan 6, 2025 | In-Context LearningMath | CodeCode Available | 1 |
| CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis | Jan 3, 2025 | Math | CodeCode Available | 1 |
| Toward Adaptive Reasoning in Large Language Models with Thought Rollback | Dec 27, 2024 | Math | CodeCode Available | 1 |
| CARL-GT: Evaluating Causal Reasoning Capabilities of Large Language Models | Dec 23, 2024 | Decision MakingMath | CodeCode Available | 1 |
| Entropy-Regularized Process Reward Model | Dec 15, 2024 | GSM8KMath | CodeCode Available | 1 |
| HARP: A challenging human-annotated math reasoning benchmark | Dec 11, 2024 | Math | CodeCode Available | 1 |
| U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs | Dec 4, 2024 | DiversityMath | CodeCode Available | 1 |
| Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability | Nov 29, 2024 | GSM8KMath | CodeCode Available | 1 |
| Training and Evaluating Language Models with Template-based Data Generation | Nov 27, 2024 | Data AugmentationMath | CodeCode Available | 1 |