| The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer | Feb 21, 2025 | Math, Mathematical Reasoning | Code Available | 0 |
| Forgotten Polygons: Multimodal Large Language Models are Shape-Blind | Feb 21, 2025 | Math, Mathematical Problem-Solving | Code Available | 1 |
| CER: Confidence Enhanced Reasoning in LLMs | Feb 20, 2025 | Math, Mathematical Reasoning | Code Available | 0 |
| Retrieval-Augmented Process Reward Model for Generalizable Mathematical Reasoning | Feb 20, 2025 | Mathematical Reasoning, Retrieval | Unverified | 0 |
| Full-Step-DPO: Self-Supervised Preference Optimization with Step-wise Rewards for Mathematical Reasoning | Feb 20, 2025 | Mathematical Reasoning | Unverified | 0 |
| From Correctness to Comprehension: AI Agents for Personalized Error Diagnosis in Education | Feb 19, 2025 | Diagnostic, GSM8K | Unverified | 0 |
| AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence | Feb 19, 2025 | Code Generation, Decision Making | Code Available | 1 |
| Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning | Feb 19, 2025 | Mathematical Reasoning | Code Available | 1 |
| Integrating Arithmetic Learning Improves Mathematical Reasoning in Smaller Models | Feb 18, 2025 | Data Augmentation, GSM8K | Unverified | 0 |
| Theorem Prover as a Judge for Synthetic Data Generation | Feb 18, 2025 | Mathematical Proofs, Mathematical Reasoning | Unverified | 0 |