| VeriThinker: Learning to Verify Makes Reasoning Model Efficient | May 23, 2025 | model | Code Available | 2 |
| ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay | May 22, 2025 | reinforcement-learning, Reinforcement Learning | Code Available | 2 |
| SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development | May 22, 2025 | Bug fixing, Chatbot | Code Available | 2 |
| SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding | May 22, 2025 | Motion Estimation, Question Answering | Code Available | 2 |
| Structure-Aligned Protein Language Model | May 22, 2025 | Contrastive Learning, Language Modeling | Code Available | 2 |
| DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution | May 22, 2025 | Super-Resolution, Video Super-Resolution | Code Available | 2 |
| Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding | May 22, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Extremely Simple Multimodal Outlier Synthesis for Out-of-Distribution Detection and Segmentation | May 22, 2025 | Autonomous Driving, Out-of-Distribution Detection | Code Available | 2 |
| Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models | May 22, 2025 | Reinforcement Learning (RL) | Code Available | 2 |
| QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design | May 22, 2025 | CPU, GPU | Code Available | 2 |
| SEED: Speaker Embedding Enhancement Diffusion Model | May 22, 2025 | model, Speaker Recognition | Code Available | 2 |
| Training Long-Context LLMs Efficiently via Chunk-wise Optimization | May 22, 2025 | 16k, GPU | Code Available | 2 |
| SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward | May 22, 2025 | Reinforcement Learning (RL) | Code Available | 2 |
| Ranked Entropy Minimization for Continual Test-Time Adaptation | May 22, 2025 | Test-time Adaptation | Code Available | 2 |
| WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning | May 22, 2025 | Math, Reinforcement Learning (RL) | Code Available | 2 |
| Seeing through Satellite Images at Street Views | May 22, 2025 | | Code Available | 2 |
| GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning | May 22, 2025 | Attribute, Image Generation | Code Available | 2 |
| GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent | May 22, 2025 | | Code Available | 2 |
| P2P: Automated Paper-to-Poster Generation and Fine-Grained Benchmark | May 21, 2025 | | Code Available | 2 |
| iPad: Iterative Proposal-centric End-to-End Autonomous Driving | May 21, 2025 | Autonomous Driving, Bench2Drive | Code Available | 2 |
| Meta-Design Matters: A Self-Design Multi-Agent System | May 21, 2025 | Math, Problem Decomposition | Code Available | 2 |
| dKV-Cache: The Cache for Diffusion Language Models | May 21, 2025 | Code Generation, Denoising | Code Available | 2 |
| Scaling Diffusion Transformers Efficiently via μP | May 21, 2025 | Image Generation, Text to Image Generation | Code Available | 2 |
| PhyX: Does Your Model Have the "Wits" for Physical Reasoning? | May 21, 2025 | | Code Available | 2 |
| MonoSplat: Generalizable 3D Gaussian Splatting from Monocular Depth Foundation Models | May 21, 2025 | Computational Efficiency | Code Available | 2 |
| Web-Shepherd: Advancing PRMs for Reinforcing Web Agents | May 21, 2025 | Large Language Model, Multimodal Large Language Model | Code Available | 2 |
| Graph Foundation Models: A Comprehensive Survey | May 21, 2025 | Graph Learning, Knowledge Graphs | Code Available | 2 |
| Learn to Reason Efficiently with Adaptive Length-based Reward Shaping | May 21, 2025 | Reinforcement Learning (RL) | Code Available | 2 |
| InstructSAM: A Training-Free Framework for Instruction-Oriented Remote Sensing Object Recognition | May 21, 2025 | Earth Observation, Object | Code Available | 2 |
| The P^3 dataset: Pixels, Points and Polygons for Multimodal Building Vectorization | May 21, 2025 | | Code Available | 2 |
| ConvSearch-R1: Enhancing Query Reformulation for Conversational Search with Reasoning via Reinforcement Learning | May 21, 2025 | Conversational Search, reinforcement-learning | Code Available | 2 |
| RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning | May 21, 2025 | Math, Mathematical Reasoning | Code Available | 2 |
| Exploring the Limits of Vision-Language-Action Manipulations in Cross-task Generalization | May 21, 2025 | Vision-Language-Action, Zero-shot Generalization | Code Available | 2 |
| Moonbeam: A MIDI Foundation Model Using Both Absolute and Relative Music Attributes | May 21, 2025 | Music Classification, Music Generation | Code Available | 2 |
| UltraEdit: Training-, Subject-, and Memory-Free Lifelong Editing in Large Language Models | May 20, 2025 | GPU, Lifelong learning | Code Available | 2 |
| Place Recognition: A Comprehensive Review, Current Challenges and Future Directions | May 20, 2025 | 3D Place Recognition, Cross-modal place recognition | Code Available | 2 |
| TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis | May 20, 2025 | Contrastive Learning, Singing Voice Synthesis | Code Available | 2 |
| Let LLMs Break Free from Overthinking via Self-Braking Tuning | May 20, 2025 | GSM8K | Code Available | 2 |
| VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank | May 20, 2025 | Image Generation, Image Quality Assessment | Code Available | 2 |
| Grouping First, Attending Smartly: Training-Free Acceleration for Diffusion Transformers | May 20, 2025 | GPU, Video Generation | Code Available | 2 |
| Quartet: Native FP4 Training Can Be Optimal for Large Language Models | May 20, 2025 | | Code Available | 2 |
| Code2Logic: Game-Code-Driven Data Synthesis for Enhancing VLMs General Reasoning | May 20, 2025 | Domain Generalization, Multimodal Reasoning | Code Available | 2 |
| Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models | May 20, 2025 | Video Compression, Video Understanding | Code Available | 2 |
| Learning Spatio-Temporal Dynamics for Trajectory Recovery via Time-Aware Transformer | May 20, 2025 | Trajectory Recovery | Code Available | 2 |
| CAD-Coder: An Open-Source Vision-Language Model for Computer-Aided Design Code Generation | May 20, 2025 | Code Generation, Language Modeling | Code Available | 2 |
| KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation | May 20, 2025 | reinforcement-learning, Reinforcement Learning | Code Available | 2 |
| UniCTokens: Boosting Personalized Understanding and Generation via Unified Concept Tokens | May 20, 2025 | | Code Available | 2 |
| PandaGuard: Systematic Evaluation of LLM Safety against Jailbreaking Attacks | May 20, 2025 | LLM Jailbreak, Safety Alignment | Code Available | 2 |
| Temporal Query Network for Efficient Multivariate Time Series Forecasting | May 19, 2025 | Correlated Time Series Forecasting, Multivariate Time Series Forecasting | Code Available | 2 |
| Efficient Speech Language Modeling via Energy Distance in Continuous Latent Space | May 19, 2025 | Language Modeling, Language Modelling | Code Available | 2 |