| DanceGRPO: Unleashing GRPO on Visual Generation | May 12, 2025 | Denoising, reinforcement-learning | Code Available | 5 |
| UniVLA: Learning to Act Anywhere with Task-centric Latent Actions | May 9, 2025 | Robot Manipulation, Vision-Language-Action | Code Available | 5 |
| Generating Physically Stable and Buildable LEGO Designs from Text | May 8, 2025 | 3D Generation, Large Language Model | Code Available | 5 |
| Continuous Thought Machines | May 8, 2025 | Computational Efficiency, Question Answering | Code Available | 5 |
| ZeroSearch: Incentivize the Search Capability of LLMs without Searching | May 7, 2025 | Reinforcement Learning (RL), Retrieval | Code Available | 5 |
| HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation | May 7, 2025 | Human-Domain Subject-to-Video, Single-Domain Subject-to-Video | Code Available | 5 |
| Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities | May 5, 2025 | Image Generation, Survey | Code Available | 5 |
| DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition | Apr 30, 2025 | Automated Theorem Proving, Large Language Model | Code Available | 5 |
| WebThinker: Empowering Large Reasoning Models with Deep Research Capability | Apr 30, 2025 | Navigate | Code Available | 5 |
| Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers | Apr 27, 2025 | Hallucination, Question Answering | Code Available | 5 |
| Reservoir-enhanced Segment Anything Model for Subsurface Diagnosis | Apr 26, 2025 | Anomaly Detection, GPR | Code Available | 5 |
| MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention | Apr 22, 2025 | GPU | Code Available | 5 |
| InstantCharacter: Personalize Any Characters with a Scalable Diffusion Transformer Framework | Apr 16, 2025 | Image Generation | Code Available | 5 |
| Reinforcement Learning from Human Feedback | Apr 16, 2025 | Math, Philosophy | Code Available | 5 |
| Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding | Apr 14, 2025 | Question Answering | Code Available | 5 |
| Kimi-VL Technical Report | Apr 10, 2025 | Long-Context Understanding, Mathematical Reasoning | Code Available | 5 |
| M-Prometheus: A Suite of Open Multilingual LLM Judges | Apr 7, 2025 | Machine Translation, Model Selection | Code Available | 5 |
| The 1st Solution for 4th PVUW MeViS Challenge: Unleashing the Potential of Large Multimodal Models for Referring Video Segmentation | Apr 7, 2025 | Inference Optimization, Referring Video Object Segmentation | Code Available | 5 |
| PaperBench: Evaluating AI's Ability to Replicate AI Research | Apr 2, 2025 | | Code Available | 5 |
| Less-to-More Generalization: Unlocking More Controllability by In-Context Generation | Apr 2, 2025 | Conditional Image Generation, Image Generation | Code Available | 5 |
| HDVIO2.0: Wind and Disturbance Estimation with Hybrid Dynamics VIO | Apr 1, 2025 | State Estimation | Code Available | 5 |
| 4th PVUW MeViS 3rd Place Report: Sa2VA | Apr 1, 2025 | Language Modeling, Language Modelling | Code Available | 5 |
| VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness | Mar 27, 2025 | Anomaly Detection, Video Generation | Code Available | 5 |
| Understanding R1-Zero-Like Training: A Critical Perspective | Mar 26, 2025 | Reinforcement Learning (RL) | Code Available | 5 |
| ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning | Mar 25, 2025 | reinforcement-learning, Reinforcement Learning | Code Available | 5 |
| TxAgent: An AI Agent for Therapeutic Reasoning Across a Universe of Tools | Mar 14, 2025 | AI Agent, Decision Making | Code Available | 5 |
| TikZero: Zero-Shot Text-Guided Graphics Program Synthesis | Mar 14, 2025 | Program Synthesis | Code Available | 5 |
| Transformers without Normalization | Mar 13, 2025 | Self-Supervised Learning | Code Available | 5 |
| FlowTok: Flowing Seamlessly Across Text and Image Tokens | Mar 13, 2025 | Denoising, Image to text | Code Available | 5 |
| OminiControl2: Efficient Conditioning for Diffusion Transformers | Mar 11, 2025 | Conditional Image Generation, Denoising | Code Available | 5 |
| Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models | Mar 9, 2025 | Math, Multimodal Reasoning | Code Available | 5 |
| R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcement Learning | Mar 7, 2025 | Emotion Recognition, Language Modeling | Code Available | 5 |
| GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control | Mar 5, 2025 | Novel View Synthesis, Video Generation | Code Available | 5 |
| InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation | Feb 28, 2025 | Audio Generation, Form | Code Available | 5 |
| Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success | Feb 27, 2025 | Action Generation, Chunking | Code Available | 5 |
| Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts | Feb 27, 2025 | Computational Efficiency, GPU | Code Available | 5 |
| UniDepthV2: Universal Monocular Metric Depth Estimation Made Simpler | Feb 27, 2025 | Depth Estimation, Monocular Depth Estimation | Code Available | 5 |
| NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms | Feb 25, 2025 | Language Modeling, Language Modelling | Code Available | 5 |
| Fractal Generative Models | Feb 24, 2025 | Image Generation | Code Available | 5 |
| From System 1 to System 2: A Survey of Reasoning Large Language Models | Feb 24, 2025 | Logical Reasoning | Code Available | 5 |
| Getting SMARTER for Motion Planning in Autonomous Driving Systems | Feb 20, 2025 | Autonomous Driving, Motion Planning | Code Available | 5 |
| TrustRAG: An Information Assistant with Retrieval Augmented Generation | Feb 19, 2025 | Answer Generation, Chunking | Code Available | 5 |
| Magma: A Foundation Model for Multimodal AI Agents | Feb 18, 2025 | Autonomous Web Navigation, Image to text | Code Available | 5 |
| AIDE: AI-Driven Exploration in the Space of Code | Feb 18, 2025 | | Code Available | 5 |
| SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering? | Feb 17, 2025 | | Code Available | 5 |
| On the Computation of the Fisher Information in Continual Learning | Feb 17, 2025 | Continual Learning | Code Available | 5 |
| Time-series attribution maps with regularized contrastive learning | Feb 17, 2025 | Contrastive Learning, Time Series | Code Available | 5 |
| Phantom: Subject-consistent video generation via cross-modal alignment | Feb 16, 2025 | cross-modal alignment, Human-Domain Subject-to-Video | Code Available | 5 |
| The Role of World Models in Shaping Autonomous Driving: A Comprehensive Survey | Feb 14, 2025 | Autonomous Driving, Survey | Code Available | 5 |
| HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation | Feb 14, 2025 | Language Modeling, Language Modelling | Code Available | 5 |