| Accelerating Chain-of-Thought Reasoning: When Goal-Gradient Importance Meets Dynamic Skipping | May 13, 2025 | Domain Generalization, GSM8K | Unverified | 0 |
| Multimodal Assessment of Classroom Discourse Quality: A Text-Centered Attention-Based Multi-Task Learning Approach | May 12, 2025 | Math, Multi-Task Learning | Unverified | 0 |
| Learning from Peers in Reasoning Models | May 12, 2025 | Math | Unverified | 0 |
| Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving | May 12, 2025 | Math, Mathematical Problem-Solving | Code Available | 2 |
| Kalman Filter Enhanced GRPO for Reinforcement Learning-Based Language Model Reasoning | May 12, 2025 | Language Modeling, Language Modelling | Code Available | 1 |
| S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models | May 12, 2025 | GSM8K, Large Language Model | Unverified | 0 |
| DialogueReason: Rule-Based RL Sparks Dialogue Reasoning in LLMs | May 11, 2025 | Diversity, Math | Unverified | 0 |
| xGen-small Technical Report | May 10, 2025 | Decoder, Math | Unverified | 0 |
| Generative Discovery of Partial Differential Equations by Learning from Math Handbooks | May 9, 2025 | Computational Efficiency, Math | Unverified | 0 |
| Scalable LLM Math Reasoning Acceleration with Low-rank Distillation | May 8, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers | May 7, 2025 | Math, Reinforcement Learning (RL) | Unverified | 0 |
| SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning | May 5, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| RM-R1: Reward Modeling as Reasoning | May 5, 2025 | Math, Reinforcement Learning (RL) | Code Available | 2 |
| Generating Narrated Lecture Videos from Slides with Synchronized Highlights | May 5, 2025 | Math, text-to-speech | Unverified | 0 |
| Rewriting Pre-Training Data Boosts LLM Performance in Math and Code | May 5, 2025 | Code Generation, GSM8K | Code Available | 1 |
| A Survey of Slow Thinking-based Reasoning LLMs using Reinforced Learning and Inference-time Scaling Law | May 5, 2025 | Math, Medical Diagnosis | Unverified | 0 |
| LookAlike: Consistent Distractor Generation in Math MCQs | May 3, 2025 | Distractor Generation, Math | Unverified | 0 |
| TutorGym: A Testbed for Evaluating AI Agents as Tutors and Students | May 2, 2025 | GSM8K, In-Context Learning | Code Available | 0 |
| NeMo-Inspector: A Visualization Tool for LLM Generation Analysis | May 1, 2025 | GSM8K, Math | Code Available | 1 |
| DeepCritic: Deliberate Critique with Large Language Models | May 1, 2025 | Math | Code Available | 1 |
| LLMs Do Not Have Human-Like Working Memory | Apr 30, 2025 | Math | Unverified | 0 |
| Phi-4-reasoning Technical Report | Apr 30, 2025 | Math, Reinforcement Learning (RL) | Unverified | 0 |
| Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math | Apr 30, 2025 | Math, Reinforcement Learning (RL) | Unverified | 0 |
| AdaptMI: Adaptive Skill-based In-context Math Instruction for Small Language Models | Apr 30, 2025 | In-Context Learning, Math | Unverified | 0 |
| Reinforcement Learning for Reasoning in Large Language Models with One Training Example | Apr 29, 2025 | Domain Generalization, Math | Code Available | 3 |
| Trace-of-Thought Prompting: Investigating Prompt-Based Knowledge Distillation Through Question Decomposition | Apr 29, 2025 | GSM8K, Knowledge Distillation | Unverified | 0 |
| Local Prompt Optimization | Apr 29, 2025 | GSM8K, Math | Unverified | 0 |
| Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets | Apr 28, 2025 | Data Augmentation, Diversity | Unverified | 0 |
| APE-Bench I: Towards File-level Automated Proof Engineering of Formal Math Libraries | Apr 27, 2025 | Automated Theorem Proving, Bug fixing | Unverified | 0 |
| Efficient Reasoning for LLMs through Speculative Chain-of-Thought | Apr 27, 2025 | GSM8K, Math | Code Available | 1 |
| Evaluating Grounded Reasoning by Code-Assisted Large Language Models for Mathematics | Apr 24, 2025 | Code Generation, Math | Unverified | 0 |
| An Empirical Study on Prompt Compression for Large Language Models | Apr 24, 2025 | Articles, Math | Code Available | 3 |
| Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency | Apr 24, 2025 | Benchmarking, Math | Code Available | 1 |
| Training Large Language Models to Reason via EM Policy Gradient | Apr 24, 2025 | GSM8K, Math | Unverified | 0 |
| SplitReason: Learning To Offload Reasoning | Apr 23, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Process Reward Models That Think | Apr 23, 2025 | Math | Code Available | 2 |
| AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset | Apr 23, 2025 | Math, Mathematical Reasoning | Code Available | 4 |
| DianJin-R1: Evaluating and Enhancing Financial Reasoning in Large Language Models | Apr 22, 2025 | Math | Code Available | 0 |
| Dynamic Early Exit in Reasoning Models | Apr 22, 2025 | GSM8K, Math | Code Available | 2 |
| TTRL: Test-Time Reinforcement Learning | Apr 22, 2025 | Math, reinforcement-learning | Code Available | 7 |
| LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception | Apr 21, 2025 | Math, MMLU | Unverified | 0 |
| Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators | Apr 21, 2025 | Code Generation, Instruction Following | Code Available | 0 |
| OTC: Optimal Tool Calls via Reinforcement Learning | Apr 21, 2025 | Math, reinforcement-learning | Unverified | 0 |
| Learning to Reason under Off-Policy Guidance | Apr 21, 2025 | Math, Reinforcement Learning (RL) | Code Available | 3 |
| Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction | Apr 21, 2025 | Math | Code Available | 2 |
| Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning | Apr 21, 2025 | All, Form | Code Available | 2 |
| Enhancing Math Learning in an LMS Using AI-Driven Question Recommendations | Apr 18, 2025 | Management, Math | Unverified | 0 |
| Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? | Apr 18, 2025 | Math, Visual Reasoning | Unverified | 0 |
| THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models | Apr 17, 2025 | Benchmarking, Math | Unverified | 0 |
| MathPhys-Guided Coarse-to-Fine Anomaly Synthesis with SQE-Driven Bi-Level Optimization for Anomaly Detection | Apr 17, 2025 | Anomaly Detection, Data Augmentation | Unverified | 0 |