| Diffusion Models: A Comprehensive Survey of Methods and Applications | Sep 2, 2022 | Image Generation, Image Super-Resolution | Code Available | 4 |
| NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis | Jul 20, 2022 | Image Outpainting, Text-to-Image Generation | Code Available | 4 |
| Epona: Autoregressive Diffusion World Model for Autonomous Driving | Jun 30, 2025 | Autonomous Driving, model | Code Available | 3 |
| AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation | Jun 12, 2025 | Video Generation | Code Available | 3 |
| MagCache: Fast Video Generation with Magnitude-Aware Cache | Jun 10, 2025 | SSIM, Video Generation | Code Available | 3 |
| Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models | Jun 10, 2025 | 3D Lane Detection, 3D Object Detection | Code Available | 3 |
| MAGREF: Masked Guidance for Any-Reference Video Generation | May 29, 2025 | Human-Domain Subject-to-Video, Open-Domain Subject-to-Video | Code Available | 3 |
| Training-Free Efficient Video Generation via Dynamic Token Carving | May 22, 2025 | Denoising, Video Generation | Code Available | 3 |
| MTVCrafter: 4D Motion Tokenization for Open-World Human Image Animation | May 15, 2025 | Image Animation, Video Generation | Code Available | 3 |
| Generative AI for Autonomous Driving: Frontiers and Opportunities | May 13, 2025 | Autonomous Driving, Video Generation | Code Available | 3 |
| Video4DGen: Enhancing Video and 4D Generation through Mutual Optimization | Apr 5, 2025 | 3D Generation, Video Alignment | Code Available | 3 |
| Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation | Apr 3, 2025 | Mamba, Talking Head Generation | Code Available | 3 |
| VideoGen-Eval: Agent-based System for Video Generation Evaluation | Mar 30, 2025 | Diversity, Video Generation | Code Available | 3 |
| Exploring the Evolution of Physics Cognition in Video Generation: A Survey | Mar 27, 2025 | Video Generation | Code Available | 3 |
| Long-Context Autoregressive Video Modeling with Next-Frame Prediction | Mar 25, 2025 | Text Generation, Video Generation | Code Available | 3 |
| XAttention: Block Sparse Attention with Antidiagonal Scoring | Mar 20, 2025 | Video Generation, Video Understanding | Code Available | 3 |
| Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model | Mar 14, 2025 | Image to Video Generation, Video Generation | Code Available | 3 |
| Automated Movie Generation via Multi-Agent CoT Planning | Mar 10, 2025 | Video Generation | Code Available | 3 |
| MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio | Mar 7, 2025 | Video Generation | Code Available | 3 |
| History-Guided Video Diffusion | Feb 10, 2025 | Video Generation | Code Available | 3 |
| FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation | Feb 7, 2025 | Computational Efficiency, Text-to-Video Generation | Code Available | 3 |
| CatV2TON: Taming Diffusion Transformers for Vision-Based Virtual Try-On with Temporal Concatenation | Jan 20, 2025 | Video Generation, Virtual Try-on | Code Available | 3 |
| FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors | Jan 14, 2025 | Image to Video Generation, Video Generation | Code Available | 3 |
| Do generative video models understand physical principles? | Jan 14, 2025 | Video Generation | Code Available | 3 |
| JoyGen: Audio-Driven 3D Depth-Aware Talking-Face Video Editing | Jan 3, 2025 | 3D Reconstruction, Face Generation | Code Available | 3 |
| VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation | Dec 30, 2024 | Video Generation, Video Quality Assessment | Code Available | 3 |
| DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT | Dec 27, 2024 | Autonomous Driving, Video Generation | Code Available | 3 |
| Accelerating Diffusion Transformers with Dual Feature Caching | Dec 25, 2024 | Video Generation | Code Available | 3 |
| DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation | Dec 24, 2024 | Video Editing, Video Generation | Code Available | 3 |
| VidTwin: Video VAE with Decoupled Structure and Dynamics | Dec 23, 2024 | Decoder, Video Generation | Code Available | 3 |
| Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations | Dec 19, 2024 | Contrastive Learning, Image Reconstruction | Code Available | 3 |
| VidTok: A Versatile and Open-Source Video Tokenizer | Dec 17, 2024 | Quantization, SSIM | Code Available | 3 |
| UniMLVG: Unified Framework for Multi-view Long Video Generation with Comprehensive Control Capabilities for Autonomous Driving | Dec 6, 2024 | Autonomous Driving, Diversity | Code Available | 3 |
| REDUCIO! Generating 1024×1024 Video within 16 Seconds using Extremely Compressed Motion Latents | Nov 20, 2024 | GPU, Video Generation | Code Available | 3 |
| MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views | Nov 7, 2024 | 3DGS, 3D Reconstruction | Code Available | 3 |
| GameGen-X: Interactive Open-world Game Video Generation | Nov 1, 2024 | Text-to-Video Generation, Video Generation | Code Available | 3 |
| Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances | Oct 24, 2024 | Benchmarking, Image to Video Generation | Code Available | 3 |
| DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation | Oct 17, 2024 | Talking Head Generation, Video Generation | Code Available | 3 |
| Movie Gen: A Cast of Media Foundation Models | Oct 17, 2024 | Audio Generation, Video Editing | Code Available | 3 |
| T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design | Oct 8, 2024 | Video Alignment, Video Generation | Code Available | 3 |
| Accelerating Diffusion Transformers with Token-wise Feature Caching | Oct 5, 2024 | Video Generation | Code Available | 3 |
| PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation | Sep 27, 2024 | Image to Video Generation, Video Generation | Code Available | 3 |
| Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models | Sep 11, 2024 | 3D Generation, 3D Reconstruction | Code Available | 3 |
| DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes | Sep 6, 2024 | Video Generation | Code Available | 3 |
| Qihoo-T2X: An Efficient Proxy-Tokenized Diffusion Transformer for Text-to-Any-Task | Sep 6, 2024 | Video Generation | Code Available | 3 |
| Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation | Aug 19, 2024 | Image Generation, Video Generation | Code Available | 3 |
| Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving | Aug 14, 2024 | 3D Object Detection, 3D Object Tracking | Code Available | 3 |
| HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation | Jul 24, 2024 | Benchmarking, Human Animation | Code Available | 3 |
| A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights | Jul 11, 2024 | Motion Generation, Survey | Code Available | 3 |
| Evaluation of Text-to-Video Generation Models: A Dynamics Perspective | Jul 1, 2024 | Text-to-Video Generation, Video Generation | Code Available | 3 |