| Accelerating Chain-of-Thought Reasoning: When Goal-Gradient Importance Meets Dynamic Skipping | May 13, 2025 | Domain GeneralizationGSM8K | —Unverified | 0 |
| Multimodal Assessment of Classroom Discourse Quality: A Text-Centered Attention-Based Multi-Task Learning Approach | May 12, 2025 | MathMulti-Task Learning | —Unverified | 0 |
| Learning from Peers in Reasoning Models | May 12, 2025 | Math | —Unverified | 0 |
| Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving | May 12, 2025 | MathMathematical Problem-Solving | CodeCode Available | 2 |
| S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models | May 12, 2025 | GSM8KLarge Language Model | —Unverified | 0 |
| Kalman Filter Enhanced GRPO for Reinforcement Learning-Based Language Model Reasoning | May 12, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 1 |
| DialogueReason: Rule-Based RL Sparks Dialogue Reasoning in LLMs | May 11, 2025 | DiversityMath | —Unverified | 0 |
| xGen-small Technical Report | May 10, 2025 | DecoderMath | —Unverified | 0 |
| Generative Discovery of Partial Differential Equations by Learning from Math Handbooks | May 9, 2025 | Computational EfficiencyMath | —Unverified | 0 |
| Scalable LLM Math Reasoning Acceleration with Low-rank Distillation | May 8, 2025 | Language ModelingLanguage Modelling | —Unverified | 0 |
| Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers | May 7, 2025 | MathReinforcement Learning (RL) | —Unverified | 0 |
| SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning | May 5, 2025 | Language ModelingLanguage Modelling | —Unverified | 0 |
| RM-R1: Reward Modeling as Reasoning | May 5, 2025 | MathReinforcement Learning (RL) | CodeCode Available | 2 |
| Generating Narrated Lecture Videos from Slides with Synchronized Highlights | May 5, 2025 | Mathtext-to-speech | —Unverified | 0 |
| Rewriting Pre-Training Data Boosts LLM Performance in Math and Code | May 5, 2025 | Code GenerationGSM8K | CodeCode Available | 1 |
| A Survey of Slow Thinking-based Reasoning LLMs using Reinforced Learning and Inference-time Scaling Law | May 5, 2025 | MathMedical Diagnosis | —Unverified | 0 |
| LookAlike: Consistent Distractor Generation in Math MCQs | May 3, 2025 | Distractor GenerationMath | —Unverified | 0 |
| TutorGym: A Testbed for Evaluating AI Agents as Tutors and Students | May 2, 2025 | GSM8KIn-Context Learning | CodeCode Available | 0 |
| DeepCritic: Deliberate Critique with Large Language Models | May 1, 2025 | Math | CodeCode Available | 1 |
| NeMo-Inspector: A Visualization Tool for LLM Generation Analysis | May 1, 2025 | GSM8KMath | CodeCode Available | 1 |
| LLMs Do Not Have Human-Like Working Memory | Apr 30, 2025 | Math | —Unverified | 0 |
| Phi-4-reasoning Technical Report | Apr 30, 2025 | MathReinforcement Learning (RL) | —Unverified | 0 |
| AdaptMI: Adaptive Skill-based In-context Math Instruction for Small Language Models | Apr 30, 2025 | In-Context LearningMath | —Unverified | 0 |
| Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math | Apr 30, 2025 | MathReinforcement Learning (RL) | —Unverified | 0 |
| Local Prompt Optimization | Apr 29, 2025 | GSM8KMath | —Unverified | 0 |