| VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks | Jul 17, 2025 | MathMathematical Reasoning | —Unverified | 0 |
| QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation | Jul 17, 2025 | MathReinforcement Learning (RL) | —Unverified | 0 |
| Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training | Jul 16, 2025 | Code GenerationMath | —Unverified | 0 |
| Temperature and Persona Shape LLM Agent Consensus With Minimal Accuracy Gains in Qualitative Coding | Jul 15, 2025 | Math | —Unverified | 0 |
| Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing | Jul 15, 2025 | Knowledge TracingMath | CodeCode Available | 0 |
| Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination | Jul 14, 2025 | MathMathematical Reasoning | CodeCode Available | 1 |
| A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning | Jul 11, 2025 | MathMathematical Reasoning | CodeCode Available | 1 |
| Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs | Jul 10, 2025 | CoLALarge Language Model | —Unverified | 0 |
| Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model | Jul 9, 2025 | Language ModelingLanguage Modelling | —Unverified | 0 |
| CoRE: Enhancing Metacognition with Label-free Self-evaluation in LRMs | Jul 8, 2025 | GSM8KMath | —Unverified | 0 |