| Synthetic Data RL: Task Definition Is All You Need | May 18, 2025 | All, GSM8K | Code Available | 2 |
| Data Whisperer: Efficient Data Selection for Task-Specific LLM Fine-Tuning via Few-Shot In-Context Learning | May 18, 2025 | GSM8K, In-Context Learning | Code Available | 1 |
| SLOT: Sample-specific Language Model Optimization at Test-time | May 18, 2025 | GSM8K, Language Modeling | Code Available | 2 |
| Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models | May 15, 2025 | Code Generation, GSM8K | Unverified | 0 |
| Accelerating Chain-of-Thought Reasoning: When Goal-Gradient Importance Meets Dynamic Skipping | May 13, 2025 | Domain Generalization, GSM8K | Unverified | 0 |
| AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection | May 12, 2025 | GSM8K, HumanEval | Unverified | 0 |
| S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models | May 12, 2025 | GSM8K, Large Language Model | Unverified | 0 |
| Elastic Weight Consolidation for Full-Parameter Continual Pre-Training of Gemma2 | May 9, 2025 | ARC, Belebele | Unverified | 0 |
| Rewriting Pre-Training Data Boosts LLM Performance in Math and Code | May 5, 2025 | Code Generation, GSM8K | Code Available | 1 |
| Memory-Efficient LLM Training by Various-Grained Low-Rank Projection of Gradients | May 3, 2025 | GSM8K, MMLU | Unverified | 0 |
| Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth | May 2, 2025 | GSM8K, Quantization | Unverified | 0 |
| TutorGym: A Testbed for Evaluating AI Agents as Tutors and Students | May 2, 2025 | GSM8K, In-Context Learning | Code Available | 0 |
| NeMo-Inspector: A Visualization Tool for LLM Generation Analysis | May 1, 2025 | GSM8K, Math | Code Available | 1 |
| Local Prompt Optimization | Apr 29, 2025 | GSM8K, Math | Unverified | 0 |
| Trace-of-Thought Prompting: Investigating Prompt-Based Knowledge Distillation Through Question Decomposition | Apr 29, 2025 | GSM8K, Knowledge Distillation | Unverified | 0 |
| AutoJudge: Judge Decoding Without Manual Annotation | Apr 28, 2025 | GSM8K, Large Language Model | Unverified | 0 |
| Efficient Reasoning for LLMs through Speculative Chain-of-Thought | Apr 27, 2025 | GSM8K, Math | Code Available | 1 |
| Training Large Language Models to Reason via EM Policy Gradient | Apr 24, 2025 | GSM8K, Math | Unverified | 0 |
| Dynamic Early Exit in Reasoning Models | Apr 22, 2025 | GSM8K, Math | Code Available | 2 |
| Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning | Apr 18, 2025 | All, GSM8K | Unverified | 0 |
| Entropy-Guided Watermarking for LLMs: A Test-Time Framework for Robust and Traceable Text Generation | Apr 16, 2025 | GSM8K, Math | Unverified | 0 |
| Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution | Apr 13, 2025 | GSM8K, Math | Code Available | 3 |
| Question Tokens Deserve More Attention: Enhancing Large Language Models without Training through Step-by-Step Reading and Question Attention Recalibration | Apr 13, 2025 | GSM8K | Unverified | 0 |
| Supervised Optimism Correction: Be Confident When LLMs Are Sure | Apr 10, 2025 | GSM8K, Math | Unverified | 0 |
| Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use | Apr 7, 2025 | GSM8K, Math | Unverified | 0 |
| SEAL: Steerable Reasoning Calibration of Large Language Models for Free | Apr 7, 2025 | GSM8K | Code Available | 2 |
| Sample, Don't Search: Rethinking Test-Time Alignment for Language Models | Apr 4, 2025 | GSM8K, Mathematical Reasoning | Unverified | 0 |
| Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency | Apr 4, 2025 | Benchmarking, GSM8K | Unverified | 0 |
| Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models | Apr 3, 2025 | GSM8K, Reinforcement Learning (RL) | Code Available | 0 |
| Large (Vision) Language Models are Unsupervised In-Context Learners | Apr 3, 2025 | GSM8K, In-Context Learning | Code Available | 1 |
| Exploring LLM Reasoning Through Controlled Prompt Variations | Apr 2, 2025 | GSM8K, Mathematical Problem-Solving | Code Available | 0 |
| Adaptive Rectification Sampling for Test-Time Compute Scaling | Apr 2, 2025 | GSM8K, Logical Reasoning | Code Available | 0 |
| Entropy-Based Adaptive Weighting for Self-Training | Mar 31, 2025 | GSM8K, Math | Code Available | 1 |
| CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models | Mar 28, 2025 | GPU, GSM8K | Code Available | 2 |
| Qwen2.5-Omni Technical Report | Mar 26, 2025 | Automatic Speech Recognition (ASR), GSM8K | Code Available | 7 |
| Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts? | Mar 23, 2025 | GSM8K, Math | Code Available | 0 |
| D^2LoRA: Data-Driven LoRA Initialization for Low Resource Tasks | Mar 23, 2025 | GSM8K | Unverified | 0 |
| Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models | Mar 21, 2025 | GSM8K, Question Answering | Code Available | 2 |
| SafeMERGE: Preserving Safety Alignment in Fine-Tuned Large Language Models via Selective Layer-Wise Model Merging | Mar 21, 2025 | GSM8K, Safety Alignment | Code Available | 1 |
| Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs | Mar 18, 2025 | GSM8K, Math | Unverified | 0 |
| Improving Complex Reasoning with Dynamic Prompt Corruption: A soft prompt Optimization Approach | Mar 17, 2025 | GSM8K, Math | Unverified | 0 |
| Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models | Mar 16, 2025 | Data Augmentation, GSM8K | Code Available | 1 |
| Rule-Guided Feedback: Enhancing Reasoning by Enforcing Rule Adherence in Large Language Models | Mar 14, 2025 | Checkmate In One, GSM8K | Unverified | 0 |
| Position-Aware Depth Decay Decoding (D^3): Boosting Large Language Model Inference Efficiency | Mar 11, 2025 | GSM8K, Language Modeling | Unverified | 0 |
| SOLAR: Scalable Optimization of Large-scale Architecture for Reasoning | Mar 6, 2025 | GSM8K, Math | Unverified | 0 |
| Self-Evolved Preference Optimization for Enhancing Mathematical Reasoning in Small Language Models | Mar 4, 2025 | GSM8K, Math | Unverified | 0 |
| PromptCoT: Synthesizing Olympiad-level Problems for Mathematical Reasoning in Large Language Models | Mar 4, 2025 | GSM8K, Math | Code Available | 1 |
| DeLTa: A Decoding Strategy based on Logit Trajectory Prediction Improves Factuality and Reasoning Ability | Mar 4, 2025 | GSM8K, Logical Reasoning | Code Available | 0 |
| CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation | Feb 28, 2025 | GSM8K | Code Available | 0 |
| Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge | Feb 27, 2025 | GSM8K, HumanEval | Unverified | 0 |