| Evaluating Grounded Reasoning by Code-Assisted Large Language Models for Mathematics | Apr 24, 2025 | Code Generation, Math | Unverified | 0 |
| DeepDistill: Enhancing LLM Reasoning Capabilities via Large-Scale Difficulty-Graded Data Training | Apr 24, 2025 | Mathematical Reasoning | Unverified | 0 |
| Parameter-Efficient Checkpoint Merging via Metrics-Weighted Averaging | Apr 23, 2025 | Mathematical Reasoning, parameter-efficient fine-tuning | Unverified | 0 |
| Improving RL Exploration for LLM Reasoning through Retrospective Replay | Apr 19, 2025 | Code Generation, Mathematical Reasoning | Unverified | 0 |
| BitNet b1.58 2B4T Technical Report | Apr 16, 2025 | Computational Efficiency, CPU | Unverified | 0 |
| ReTool: Reinforcement Learning for Strategic Tool Use in LLMs | Apr 15, 2025 | Math, Mathematical Reasoning | Unverified | 0 |
| Assessment of Evolving Large Language Models in Upper Secondary Mathematics | Apr 15, 2025 | Mathematical Reasoning | Unverified | 0 |
| Enhancing Mathematical Reasoning in Large Language Models with Self-Consistency-Based Hallucination Detection | Apr 13, 2025 | Answer Selection, Automated Theorem Proving | Unverified | 0 |
| Supervised Optimism Correction: Be Confident When LLMs Are Sure | Apr 10, 2025 | GSM8K, Math | Unverified | 0 |
| Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use | Apr 7, 2025 | GSM8K, Math | Unverified | 0 |
| Explain with Visual Keypoints Like a Real Mentor! A Benchmark for Multimodal Solution Explanation | Apr 4, 2025 | Math, Mathematical Reasoning | Unverified | 0 |
| Do LLM Evaluators Prefer Themselves for a Reason? | Apr 4, 2025 | Benchmarking, Code Generation | Code Available | 0 |
| Sample, Don't Search: Rethinking Test-Time Alignment for Language Models | Apr 4, 2025 | GSM8K, Mathematical Reasoning | Unverified | 0 |
| LexPam: Legal Procedure Awareness-Guided Mathematical Reasoning | Apr 3, 2025 | Mathematical Reasoning, Question Answering | Unverified | 0 |
| LLM Library Learning Fails: A LEGO-Prover Case Study | Apr 3, 2025 | Mathematical Reasoning, Misconceptions | Unverified | 0 |
| LLM for Complex Reasoning Task: An Exploratory Study in Fermi Problems | Apr 3, 2025 | Mathematical Reasoning | Unverified | 0 |
| How Difficulty-Aware Staged Reinforcement Learning Enhances LLMs' Reasoning Capabilities: A Preliminary Experimental Study | Apr 1, 2025 | Code Generation, Math | Unverified | 0 |
| Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics | Apr 1, 2025 | Math, Mathematical Problem-Solving | Unverified | 0 |
| VerifiAgent: a Unified Verification Agent in Language Model Reasoning | Apr 1, 2025 | Language Modeling, Language Modelling | Code Available | 0 |
| GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning | Apr 1, 2025 | Math, Mathematical Reasoning | Unverified | 0 |
| The Axiom-Based Atlas: A Structural Mapping of Theorems via Foundational Proof Vectors | Mar 31, 2025 | Mathematical Reasoning | Unverified | 0 |
| Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains | Mar 31, 2025 | Mathematical Reasoning, reinforcement-learning | Unverified | 0 |
| SWI: Speaking with Intent in Large Language Models | Mar 27, 2025 | Mathematical Reasoning, Question Answering | Code Available | 0 |
| Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad | Mar 27, 2025 | Math, Mathematical Reasoning | Unverified | 0 |
| Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models | Mar 27, 2025 | Data Visualization, Math | Code Available | 0 |