| ConvSearch-R1: Enhancing Query Reformulation for Conversational Search with Reasoning via Reinforcement Learning | May 21, 2025 | Conversational Search, reinforcement-learning | Code Available | 2 |
| Learn to Reason Efficiently with Adaptive Length-based Reward Shaping | May 21, 2025 | Reinforcement Learning (RL) | Code Available | 2 |
| Exploring the Limits of Vision-Language-Action Manipulations in Cross-task Generalization | May 21, 2025 | Vision-Language-Action, Zero-shot Generalization | Code Available | 2 |
| Moonbeam: A MIDI Foundation Model Using Both Absolute and Relative Music Attributes | May 21, 2025 | Music Classification, Music Generation | Code Available | 2 |
| PhyX: Does Your Model Have the "Wits" for Physical Reasoning? | May 21, 2025 | | Code Available | 2 |
| iPad: Iterative Proposal-centric End-to-End Autonomous Driving | May 21, 2025 | Autonomous Driving, Bench2Drive | Code Available | 2 |
| dKV-Cache: The Cache for Diffusion Language Models | May 21, 2025 | Code Generation, Denoising | Code Available | 2 |
| MonoSplat: Generalizable 3D Gaussian Splatting from Monocular Depth Foundation Models | May 21, 2025 | Computational Efficiency | Code Available | 2 |
| Scaling Diffusion Transformers Efficiently via μP | May 21, 2025 | Image Generation, Text to Image Generation | Code Available | 2 |
| Learning Spatio-Temporal Dynamics for Trajectory Recovery via Time-Aware Transformer | May 20, 2025 | Trajectory Recovery | Code Available | 2 |
| Grouping First, Attending Smartly: Training-Free Acceleration for Diffusion Transformers | May 20, 2025 | GPU, Video Generation | Code Available | 2 |
| UltraEdit: Training-, Subject-, and Memory-Free Lifelong Editing in Large Language Models | May 20, 2025 | GPU, Lifelong learning | Code Available | 2 |
| Quartet: Native FP4 Training Can Be Optimal for Large Language Models | May 20, 2025 | | Code Available | 2 |
| CAD-Coder: An Open-Source Vision-Language Model for Computer-Aided Design Code Generation | May 20, 2025 | Code Generation, Language Modeling | Code Available | 2 |
| UniCTokens: Boosting Personalized Understanding and Generation via Unified Concept Tokens | May 20, 2025 | | Code Available | 2 |
| Let LLMs Break Free from Overthinking via Self-Braking Tuning | May 20, 2025 | GSM8K | Code Available | 2 |
| Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models | May 20, 2025 | Video Compression, Video Understanding | Code Available | 2 |
| TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis | May 20, 2025 | Contrastive Learning, Singing Voice Synthesis | Code Available | 2 |
| PandaGuard: Systematic Evaluation of LLM Safety against Jailbreaking Attacks | May 20, 2025 | LLM Jailbreak, Safety Alignment | Code Available | 2 |
| VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank | May 20, 2025 | Image Generation, Image Quality Assessment | Code Available | 2 |
| Code2Logic: Game-Code-Driven Data Synthesis for Enhancing VLMs General Reasoning | May 20, 2025 | Domain Generalization, Multimodal Reasoning | Code Available | 2 |
| KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation | May 20, 2025 | reinforcement-learning, Reinforcement Learning | Code Available | 2 |
| Place Recognition: A Comprehensive Review, Current Challenges and Future Directions | May 20, 2025 | 3D Place Recognition, Cross-modal place recognition | Code Available | 2 |
| Rethinking Features-Fused-Pyramid-Neck for Object Detection | May 19, 2025 | object-detection, Object Detection | Code Available | 2 |
| Temporal Query Network for Efficient Multivariate Time Series Forecasting | May 19, 2025 | Correlated Time Series Forecasting, Multivariate Time Series Forecasting | Code Available | 2 |