| s3: You Don't Need That Much Data to Train a Search Agent via RL | May 20, 2025 | RAG, Reinforcement Learning (RL) | Code Available | 4 |
| Scaling Law for Quantization-Aware Training | May 20, 2025 | Quantization | Code Available | 4 |
| VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation | May 20, 2025 | MME, Multiple-choice | Code Available | 4 |
| DreamGen: Unlocking Generalization in Robot Learning through Video World Models | May 19, 2025 | Video Generation | Code Available | 4 |
| Mean Flows for One-step Generative Modeling | May 19, 2025 | | Code Available | 4 |
| MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision | May 19, 2025 | Math, Mathematical Reasoning | Code Available | 4 |
| Multi-head Temporal Latent Attention | May 19, 2025 | GPU, speech-recognition | Code Available | 4 |
| CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models | May 18, 2025 | Reinforcement Learning (RL) | Code Available | 4 |
| Kornia-rs: A Low-Level 3D Computer Vision Library In Rust | May 18, 2025 | | Code Available | 4 |
| VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning | May 17, 2025 | 2D Object Detection, Object Counting | Code Available | 4 |
| Attention on the Sphere | May 16, 2025 | Depth Estimation, Image Segmentation | Code Available | 4 |
| Accelerating Visual-Policy Learning through Parallel Differentiable Simulation | May 15, 2025 | GPU | Code Available | 4 |
| OnPrem.LLM: A Privacy-Conscious Document Intelligence Toolkit | May 12, 2025 | GPU, Privacy Preserving | Code Available | 4 |
| Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free | May 10, 2025 | Attribute, Mixture-of-Experts | Code Available | 4 |
| Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models | May 8, 2025 | Multimodal Reasoning | Code Available | 4 |
| FG-CLIP: Fine-Grained Visual and Textual Alignment | May 8, 2025 | Image-text Retrieval, object-detection | Code Available | 4 |
| 3D Scene Generation: A Survey | May 8, 2025 | Autonomous Driving, Diversity | Code Available | 4 |
| VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model | May 6, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 4 |
| Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning | May 6, 2025 | Image Generation | Code Available | 4 |
| Towards One-shot Federated Learning: Advances, Challenges, and Future Directions | May 5, 2025 | Federated Learning, Survey | Code Available | 4 |
| Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction | May 5, 2025 | Image Generation, multimodal interaction | Code Available | 4 |
| Tevatron 2.0: Unified Document Retrieval Toolkit across Scale, Language, and Modality | May 5, 2025 | Retrieval | Code Available | 4 |
| T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT | May 1, 2025 | Image Generation, Reinforcement Learning (RL) | Code Available | 4 |
| Generalized Neighborhood Attention: Multi-dimensional Sparse Attention at the Speed of Light | Apr 23, 2025 | | Code Available | 4 |
| AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset | Apr 23, 2025 | Math, Mathematical Reasoning | Code Available | 4 |
| High-performance training and inference for deep equivariant interatomic potentials | Apr 22, 2025 | Computational Efficiency | Code Available | 4 |
| Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation | Apr 21, 2025 | Video Generation | Code Available | 4 |
| Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models | Apr 21, 2025 | MME, Video MME | Code Available | 4 |
| RealisDance-DiT: Simple yet Strong Baseline towards Controllable Character Animation in the Wild | Apr 21, 2025 | | Code Available | 4 |
| Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints | Apr 15, 2025 | GPU, Inference Optimization | Code Available | 4 |
| 70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float | Apr 15, 2025 | CPU, GPU | Code Available | 4 |
| UniAnimate-DiT: Human Image Animation with Large-Scale Video Diffusion Transformer | Apr 15, 2025 | Image Animation | Code Available | 4 |
| Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models | Apr 15, 2025 | Humanoid Control, Reinforcement Learning (RL) | Code Available | 4 |
| Revisiting Self-Attentive Sequential Recommendation | Apr 13, 2025 | Decoder, Recommendation Systems | Code Available | 4 |
| LLMMapReduce-V2: Entropy-Driven Convolutional Test-Time Scaling for Generating Long-Form Articles from Extremely Long Resources | Apr 8, 2025 | Articles, Form | Code Available | 4 |
| APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay | Apr 4, 2025 | | Code Available | 4 |
| MedSAM2: Segment Anything in 3D Medical Images and Videos | Apr 4, 2025 | Segmentation, Video Segmentation | Code Available | 4 |
| DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments | Apr 4, 2025 | Navigate, Prompt Engineering | Code Available | 4 |
| SkyReels-A2: Compose Anything in Video Diffusion Transformers | Apr 3, 2025 | Human-Domain Subject-to-Video, Open-Domain Subject-to-Video | Code Available | 4 |
| Easi3R: Estimating Disentangled Motion from DUSt3R Without Training | Mar 31, 2025 | 4D reconstruction, Camera Pose Estimation | Code Available | 4 |
| OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model | Mar 30, 2025 | Autonomous Driving, Decision Making | Code Available | 4 |
| ActionStudio: A Lightweight Framework for Data and Training of Large Action Models | Mar 28, 2025 | Diversity | Code Available | 4 |
| Lumina-Image 2.0: A Unified and Efficient Image Generative Framework | Mar 27, 2025 | Image Generation, Text to Image Generation | Code Available | 4 |
| Video-R1: Reinforcing Video Reasoning in MLLMs | Mar 27, 2025 | MVBench, Reinforcement Learning (RL) | Code Available | 4 |
| X^2-Gaussian: 4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction | Mar 27, 2025 | CT Reconstruction, Decoder | Code Available | 4 |
| Dolphin: A Large-Scale Automatic Speech Recognition Model for Eastern Languages | Mar 26, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 4 |
| TerraTorch: The Geospatial Foundation Models Toolkit | Mar 26, 2025 | Benchmarking, Decoder | Code Available | 4 |
| Your ViT is Secretly an Image Segmentation Model | Mar 24, 2025 | Decoder, Image Segmentation | Code Available | 4 |
| CFG-Zero*: Improved Classifier-Free Guidance for Flow Matching Models | Mar 24, 2025 | | Code Available | 4 |
| OvercookedV2: Rethinking Overcooked for Zero-Shot Coordination | Mar 22, 2025 | | Code Available | 4 |