| S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models | May 12, 2025 | GSM8K, Large Language Model | Unverified | 0 |
| Elastic Weight Consolidation for Full-Parameter Continual Pre-Training of Gemma2 | May 9, 2025 | ARC, Belebele | Unverified | 0 |
| Memory-Efficient LLM Training by Various-Grained Low-Rank Projection of Gradients | May 3, 2025 | GSM8K, MMLU | Unverified | 0 |
| TutorGym: A Testbed for Evaluating AI Agents as Tutors and Students | May 2, 2025 | GSM8K, In-Context Learning | Code Available | 0 |
| Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth | May 2, 2025 | GSM8K, Quantization | Unverified | 0 |
| Local Prompt Optimization | Apr 29, 2025 | GSM8K, Math | Unverified | 0 |
| Trace-of-Thought Prompting: Investigating Prompt-Based Knowledge Distillation Through Question Decomposition | Apr 29, 2025 | GSM8K, Knowledge Distillation | Unverified | 0 |
| AutoJudge: Judge Decoding Without Manual Annotation | Apr 28, 2025 | GSM8K, Large Language Model | Unverified | 0 |
| Training Large Language Models to Reason via EM Policy Gradient | Apr 24, 2025 | GSM8K, Math | Unverified | 0 |
| Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning | Apr 18, 2025 | All, GSM8K | Unverified | 0 |