| VidToMe: Video Token Merging for Zero-Shot Video Editing | Dec 17, 2023 | Video Editing, Video Generation | Code Available | 2 |
| FreeInit: Bridging Initialization Gap in Video Diffusion Models | Dec 12, 2023 | Denoising, Text-to-Video Generation | Code Available | 2 |
| Kandinsky 3.0 Technical Report | Dec 6, 2023 | Image Generation, Image to Video Generation | Code Available | 2 |
| AnimateZero: Video Diffusion Models are Zero-Shot Image Animators | Dec 6, 2023 | Image Animation, Video Generation | Code Available | 2 |
| StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter | Dec 1, 2023 | Disentanglement, Text-to-Video Generation | Code Available | 2 |
| TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models | Dec 1, 2023 | Image Classification, Multi-Object Tracking | Code Available | 2 |
| Panacea: Panoramic and Controllable Video Generation for Autonomous Driving | Nov 28, 2023 | Autonomous Driving, Video Generation | Code Available | 2 |
| AnimateAnything: Fine-Grained Open Domain Image Animation with Motion Guidance | Nov 21, 2023 | Image Animation, Image to Video Generation | Code Available | 2 |
| SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction | Oct 31, 2023 | Prediction, Semantic Similarity | Code Available | 2 |
| LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation | Oct 16, 2023 | GPU, Image Animation | Code Available | 2 |
| DrivingDiffusion: Layout-Guided multi-view driving scene video generation with latent diffusion model | Oct 11, 2023 | Autonomous Driving, Image Generation | Code Available | 2 |
| DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving | Sep 18, 2023 | Autonomous Driving, Video Generation | Code Available | 2 |
| DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory | Aug 16, 2023 | Trajectory Modeling, Video Generation | Code Available | 2 |
| Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head video Generation | Jul 19, 2023 | Talking Head Generation, Video Generation | Code Available | 2 |
| Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation | Jul 13, 2023 | Retrieval, Video Generation | Code Available | 2 |
| Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising | May 29, 2023 | Denoising, Image Generation | Code Available | 2 |
| Control-A-Video: Controllable Text-to-Video Diffusion Models with Motion Prior and Reward Feedback Learning | May 23, 2023 | Image Generation, Optical Flow Estimation | Code Available | 2 |
| ControlVideo: Training-free Controllable Text-to-Video Generation | May 22, 2023 | Image Generation, Text-to-Video Generation | Code Available | 2 |
| VDT: General-purpose Video Diffusion Transformers via Mask Modeling | May 22, 2023 | Autonomous Driving, Video Generation | Code Available | 2 |
| DaGAN++: Depth-Aware Generative Adversarial Network for Talking Head Video Generation | May 10, 2023 | 3D geometry, Generative Adversarial Network | Code Available | 2 |
| StyleAvatar: Real-time Photo-realistic Portrait Avatar from a Single Video | May 1, 2023 | Face Reenactment, Translation | Code Available | 2 |
| Text2Performer: Text-Driven Human Video Generation | Apr 17, 2023 | Video Generation | Code Available | 2 |
| CelebV-Text: A Large-Scale Facial Text-Video Dataset | Mar 26, 2023 | Text Generation, Text-to-Video Generation | Code Available | 2 |
| Conditional Image-to-Video Generation with Latent Flow Diffusion Models | Mar 24, 2023 | Image to Video Generation, Motion Generation | Code Available | 2 |
| Blind Video Deflickering by Neural Filtering with a Flawed Atlas | Mar 14, 2023 | Video Generation, Video Temporal Consistency | Code Available | 2 |
| Video-P2P: Video Editing with Cross-attention Control | Mar 8, 2023 | Image Generation, Video Editing | Code Available | 2 |
| Video Probabilistic Diffusion Models in Projected Latent Space | Feb 15, 2023 | Video Generation | Code Available | 2 |
| Generative Diffusion Models on Graphs: Methods and Applications | Feb 6, 2023 | Denoising, Graph Generation | Code Available | 2 |
| Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models | Jan 30, 2023 | Audio Generation, Text-to-Video Generation | Code Available | 2 |
| MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation | Dec 19, 2022 | cross-modal alignment, Denoising | Code Available | 2 |
| MAGVIT: Masked Generative Video Transformer | Dec 10, 2022 | Multi-Task Learning, Text-to-Video Generation | Code Available | 2 |
| Latent Video Diffusion Models for High-Fidelity Long Video Generation | Nov 23, 2022 | Denoising, Image Generation | Code Available | 2 |
| Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives | Nov 9, 2022 | Disentanglement, Video Generation | Code Available | 2 |
| Temporally Consistent Transformers for Video Generation | Oct 5, 2022 | Minecraft, Video Generation | Code Available | 2 |
| Phenaki: Variable Length Video Generation From Open Domain Textual Description | Oct 5, 2022 | Decoder, Video Generation | Code Available | 2 |
| CelebV-HQ: A Large-Scale Video Facial Attributes Dataset | Jul 25, 2022 | Attribute, Diversity | Code Available | 2 |
| Collaborative Neural Rendering using Anime Character Sheets | Jul 12, 2022 | Image Generation, Image to 3D | Code Available | 2 |
| Generating Long Videos of Dynamic Scenes | Jun 7, 2022 | MORPH, Video Generation | Code Available | 2 |
| MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation | May 19, 2022 | Denoising, Prediction | Code Available | 2 |
| Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer | Apr 7, 2022 | Video Generation | Code Available | 2 |
| Video Diffusion Models | Apr 7, 2022 | Unconditional Video Generation, Video Generation | Code Available | 2 |
| Depth-Aware Generative Adversarial Network for Talking Head Video Generation | Mar 13, 2022 | 3D geometry, Generative Adversarial Network | Code Available | 2 |
| StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN | Mar 8, 2022 | Face Generation, Facial Editing | Code Available | 2 |
| Generative Modeling of Weights: Generalization or Memorization? | Jun 9, 2025 | Memorization, Video Generation | Code Available | 1 |
| FEAT: Full-Dimensional Efficient Attention Transformer for Medical Video Generation | Jun 5, 2025 | Denoising, Video Generation | Code Available | 1 |
| SViMo: Synchronized Diffusion for Video and Motion Generation in Hand-object Interaction Scenarios | Jun 3, 2025 | Motion Generation, Video Generation | Code Available | 1 |
| STORK: Improving the Fidelity of Mid-NFE Sampling for Diffusion and Flow Matching Models | May 30, 2025 | Video Generation | Code Available | 1 |
| VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation | May 29, 2025 | Caption Generation, Language Modeling | Code Available | 1 |
| MMGT: Motion Mask Guided Two-Stage Network for Co-Speech Gesture Video Generation | May 29, 2025 | Motion Generation, Video Generation | Code Available | 1 |