| Generative Discovery of Partial Differential Equations by Learning from Math Handbooks | May 9, 2025 | Computational Efficiency, Math | —Unverified | 0 |
| Scalable LLM Math Reasoning Acceleration with Low-rank Distillation | May 8, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers | May 7, 2025 | Math, Reinforcement Learning (RL) | —Unverified | 0 |
| SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning | May 5, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| A Survey of Slow Thinking-based Reasoning LLMs using Reinforced Learning and Inference-time Scaling Law | May 5, 2025 | Math, Medical Diagnosis | —Unverified | 0 |
| Generating Narrated Lecture Videos from Slides with Synchronized Highlights | May 5, 2025 | Math, text-to-speech | —Unverified | 0 |
| LookAlike: Consistent Distractor Generation in Math MCQs | May 3, 2025 | Distractor Generation, Math | —Unverified | 0 |
| TutorGym: A Testbed for Evaluating AI Agents as Tutors and Students | May 2, 2025 | GSM8K, In-Context Learning | Code Available | 0 |
| AdaptMI: Adaptive Skill-based In-context Math Instruction for Small Language Models | Apr 30, 2025 | In-Context Learning, Math | —Unverified | 0 |
| Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math | Apr 30, 2025 | Math, Reinforcement Learning (RL) | —Unverified | 0 |
| Phi-4-reasoning Technical Report | Apr 30, 2025 | Math, Reinforcement Learning (RL) | —Unverified | 0 |
| LLMs Do Not Have Human-Like Working Memory | Apr 30, 2025 | Math | —Unverified | 0 |
| Local Prompt Optimization | Apr 29, 2025 | GSM8K, Math | —Unverified | 0 |
| Trace-of-Thought Prompting: Investigating Prompt-Based Knowledge Distillation Through Question Decomposition | Apr 29, 2025 | GSM8K, Knowledge Distillation | —Unverified | 0 |
| Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets | Apr 28, 2025 | Data Augmentation, Diversity | —Unverified | 0 |
| APE-Bench I: Towards File-level Automated Proof Engineering of Formal Math Libraries | Apr 27, 2025 | Automated Theorem Proving, Bug fixing | —Unverified | 0 |
| Evaluating Grounded Reasoning by Code-Assisted Large Language Models for Mathematics | Apr 24, 2025 | Code Generation, Math | —Unverified | 0 |
| Training Large Language Models to Reason via EM Policy Gradient | Apr 24, 2025 | GSM8K, Math | —Unverified | 0 |
| SplitReason: Learning To Offload Reasoning | Apr 23, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| DianJin-R1: Evaluating and Enhancing Financial Reasoning in Large Language Models | Apr 22, 2025 | Math | —Unverified | 0 |
| Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators | Apr 21, 2025 | Code Generation, Instruction Following | Code Available | 0 |
| LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception | Apr 21, 2025 | Math, MMLU | —Unverified | 0 |
| OTC: Optimal Tool Calls via Reinforcement Learning | Apr 21, 2025 | Math, reinforcement-learning | —Unverified | 0 |
| Enhancing Math Learning in an LMS Using AI-Driven Question Recommendations | Apr 18, 2025 | Management, Math | —Unverified | 0 |
| Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? | Apr 18, 2025 | Math, Visual Reasoning | —Unverified | 0 |