| Navigation World Models | Dec 4, 2024 | Robot Navigation, Video Generation | Code Available | 4 |
| Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models | Feb 27, 2024 | Marketing, Video Generation | Code Available | 4 |
| MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion | Nov 18, 2023 | Video Generation | Code Available | 3 |
| Exploring the Evolution of Physics Cognition in Video Generation: A Survey | Mar 27, 2025 | Video Generation | Code Available | 3 |
| MagCache: Fast Video Generation with Magnitude-Aware Cache | Jun 10, 2025 | SSIM, Video Generation | Code Available | 3 |
| Magic-Me: Identity-Specific Video Customized Diffusion | Feb 14, 2024 | Image Generation, Text to Image Generation | Code Available | 3 |
| Epona: Autoregressive Diffusion World Model for Autonomous Driving | Jun 30, 2025 | Autonomous Driving, model | Code Available | 3 |
| Lumiere: A Space-Time Diffusion Model for Video Generation | Jan 23, 2024 | Super-Resolution, Text-to-Video Generation | Code Available | 3 |
| Automated Movie Generation via Multi-Agent CoT Planning | Mar 10, 2025 | Video Generation | Code Available | 3 |
| AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation | Jun 12, 2025 | Video Generation | Code Available | 3 |
| Long-Context Autoregressive Video Modeling with Next-Frame Prediction | Mar 25, 2025 | Text Generation, Video Generation | Code Available | 3 |
| A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights | Jul 11, 2024 | Motion Generation, Survey | Code Available | 3 |
| Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation | Apr 3, 2025 | Mamba, Talking Head Generation | Code Available | 3 |
| Training-Free Efficient Video Generation via Dynamic Token Carving | May 22, 2025 | Denoising, Video Generation | Code Available | 3 |
| T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback | May 29, 2024 | Video Generation | Code Available | 3 |
| JoyGen: Audio-Driven 3D Depth-Aware Talking-Face Video Editing | Jan 3, 2025 | 3D Reconstruction, Face Generation | Code Available | 3 |
| Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model | Mar 14, 2025 | Image to Video Generation, Video Generation | Code Available | 3 |
| T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design | Oct 8, 2024 | Video Alignment, Video Generation | Code Available | 3 |
| ID-Animator: Zero-Shot Identity-Preserving Human Video Generation | Apr 23, 2024 | Attribute, Video Generation | Code Available | 3 |
| DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT | Dec 27, 2024 | Autonomous Driving, Video Generation | Code Available | 3 |
| Evaluating Text-to-Visual Generation with Image-to-Text Generation | Apr 1, 2024 | Image to text, Question Answering | Code Available | 3 |
| Evaluation of Text-to-Video Generation Models: A Dynamics Perspective | Jul 1, 2024 | Text-to-Video Generation, Video Generation | Code Available | 3 |
| Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation | Sep 27, 2023 | GPU, Text-to-Video Generation | Code Available | 3 |
| CatV2TON: Taming Diffusion Transformers for Vision-Based Virtual Try-On with Temporal Concatenation | Jan 20, 2025 | Video Generation, Virtual Try-on | Code Available | 3 |
| Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models | Sep 11, 2024 | 3D Generation, 3D Reconstruction | Code Available | 3 |
| DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes | Sep 6, 2024 | Video Generation | Code Available | 3 |
| History-Guided Video Diffusion | Feb 10, 2025 | Video Generation | Code Available | 3 |
| Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances | Oct 24, 2024 | Benchmarking, Image to Video Generation | Code Available | 3 |
| Do generative video models understand physical principles? | Jan 14, 2025 | Video Generation | Code Available | 3 |
| REDUCIO! Generating 1024×1024 Video within 16 Seconds using Extremely Compressed Motion Latents | Nov 20, 2024 | GPU, Video Generation | Code Available | 3 |
| HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation | Jul 24, 2024 | Benchmarking, Human Animation | Code Available | 3 |
| PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation | Sep 27, 2024 | Image to Video Generation, Video Generation | Code Available | 3 |
| DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation | Dec 24, 2024 | Video Editing, Video Generation | Code Available | 3 |
| Accelerating Diffusion Transformers with Dual Feature Caching | Dec 25, 2024 | Video Generation | Code Available | 3 |
| Generative AI for Autonomous Driving: Frontiers and Opportunities | May 13, 2025 | Autonomous Driving, Video Generation | Code Available | 3 |
| Accelerating Diffusion Transformers with Token-wise Feature Caching | Oct 5, 2024 | Video Generation | Code Available | 3 |
| Qihoo-T2X: An Efficient Proxy-Tokenized Diffusion Transformer for Text-to-Any-Task | Sep 6, 2024 | Video Generation | Code Available | 3 |
| On the Content Bias in Fréchet Video Distance | Apr 18, 2024 | Video Generation | Code Available | 3 |
| FreeU: Free Lunch in Diffusion U-Net | Sep 20, 2023 | Decoder, Denoising | Code Available | 3 |
| From Sora What We Can See: A Survey of Text-to-Video Generation | May 17, 2024 | Text-to-Video Generation, Video Generation | Code Available | 3 |
| OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation | Jun 13, 2024 | Video Generation, Video Prediction | Code Available | 3 |
| FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors | Jan 14, 2025 | Image to Video Generation, Video Generation | Code Available | 3 |
| MTVCrafter: 4D Motion Tokenization for Open-World Human Image Animation | May 15, 2025 | Image Animation, Video Generation | Code Available | 3 |
| GameGen-X: Interactive Open-world Game Video Generation | Nov 1, 2024 | Text-to-Video Generation, Video Generation | Code Available | 3 |
| MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views | Nov 7, 2024 | 3DGS, 3D Reconstruction | Code Available | 3 |
| Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation | Aug 19, 2024 | Image Generation, Video Generation | Code Available | 3 |
| Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models | Jun 10, 2025 | 3D Lane Detection, 3D Object Detection | Code Available | 3 |
| ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation | Feb 6, 2024 | Image to Video Generation, Video Generation | Code Available | 3 |
| DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation | Oct 17, 2024 | Talking Head Generation, Video Generation | Code Available | 3 |
| FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation | Feb 7, 2025 | Computational Efficiency, Text-to-Video Generation | Code Available | 3 |