| MagicAvatar: Multimodal Avatar Generation and Animation | Aug 28, 2023 | Video Generation | Unverified | 0 |
| Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs | Aug 26, 2023 | In-Context Learning, Video Generation | Unverified | 0 |
| APLA: Additional Perturbation for Latent Noise with Adversarial Training Enables Consistency | Aug 24, 2023 | Video Generation | Unverified | 0 |
| StoryBench: A Multifaceted Benchmark for Continuous Story Visualization | Aug 22, 2023 | Story Continuation, Story Generation | Code Available | 1 |
| Hamiltonian GAN | Aug 22, 2023 | Inductive Bias, Video Generation | Unverified | 0 |
| SimDA: Simple Diffusion Adapter for Efficient Video Generation | Aug 18, 2023 | Super-Resolution, Transfer Learning | Unverified | 0 |
| DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory | Aug 16, 2023 | Trajectory Modeling, Video Generation | Code Available | 2 |
| Dual-Stream Diffusion Net for Text-to-Video Generation | Aug 16, 2023 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| MobileVidFactory: Automatic Diffusion-Based Social Media Video Generation for Mobile Devices from Text | Jul 31, 2023 | Video Generation | Unverified | 0 |
| Hierarchical Semantic Perceptual Listener Head Video Generation: A High-performance Pipeline | Jul 19, 2023 | Decoder, Talking Head Generation | Unverified | 0 |
| Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head video Generation | Jul 19, 2023 | Talking Head Generation, Video Generation | Code Available | 2 |
| Bidirectionally Deformable Motion Modulation For Video-based Human Pose Transfer | Jul 15, 2023 | motion prediction, Pose Transfer | Code Available | 1 |
| InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation | Jul 13, 2023 | Action Recognition, Contrastive Learning | Unverified | 0 |
| Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation | Jul 13, 2023 | Retrieval, Video Generation | Code Available | 2 |
| GD-VDM: Generated Depth for better Diffusion-based Video Generation | Jun 19, 2023 | Image Generation, Video Generation | Code Available | 0 |
| DDLP: Unsupervised Object-Centric Video Prediction with Deep Dynamic Latent Particles | Jun 9, 2023 | Object, Position | Code Available | 1 |
| Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis | Jun 6, 2023 | Neural Rendering, text-to-speech | Unverified | 0 |
| Learn the Force We Can: Enabling Sparse Motion Control in Multi-Object Video Generation | Jun 6, 2023 | Object, Video Generation | Code Available | 1 |
| Video Diffusion Models with Local-Global Context Guidance | Jun 5, 2023 | Future prediction, Prediction | Code Available | 1 |
| Quantifying Sample Anonymity in Score-Based Generative Models with Adversarial Fingerprinting | Jun 2, 2023 | Anomaly Detection, Data Augmentation | Unverified | 0 |
| Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance | Jun 1, 2023 | Image Generation, Video Generation | Unverified | 0 |
| Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising | May 29, 2023 | Denoising, Image Generation | Code Available | 2 |
| Control-A-Video: Controllable Text-to-Video Diffusion Models with Motion Prior and Reward Feedback Learning | May 23, 2023 | Image Generation, Optical Flow Estimation | Code Available | 2 |
| DirecT2V: Large Language Models are Frame-Level Directors for Zero-Shot Text-to-Video Generation | May 23, 2023 | Text-to-Video Generation, Video Generation | Code Available | 1 |
| VDT: General-purpose Video Diffusion Transformers via Mask Modeling | May 22, 2023 | Autonomous Driving, Video Generation | Code Available | 2 |
| ControlVideo: Training-free Controllable Text-to-Video Generation | May 22, 2023 | Image Generation, Text-to-Video Generation | Code Available | 2 |
| Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation | May 18, 2023 | Image Generation, Text to Image Generation | Code Available | 1 |
| Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models | May 17, 2023 | Image Generation, Text-to-Video Generation | Unverified | 0 |
| Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation | May 16, 2023 | Motion Generation, Motion Synthesis | Unverified | 0 |
| Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts | May 15, 2023 | Denoising, Video Editing | Unverified | 0 |
| Sketching the Future (STF): Applying Conditional Control Techniques to Text-to-Video Models | May 10, 2023 | Text-to-Video Generation, Video Generation | Code Available | 1 |
| DaGAN++: Depth-Aware Generative Adversarial Network for Talking Head Video Generation | May 10, 2023 | 3D geometry, Generative Adversarial Network | Code Available | 2 |
| Multi-object Video Generation from Single Frame Layouts | May 6, 2023 | Image Generation, Object | Unverified | 0 |
| StyleAvatar: Real-time Photo-realistic Portrait Avatar from a Single Video | May 1, 2023 | Face Reenactment, Translation | Code Available | 2 |
| StyleLipSync: Style-based Personalized Lip-sync Video Generation | Apr 30, 2023 | Video Generation | Unverified | 0 |
| LaMD: Latent Motion Diffusion for Image-Conditional Video Generation | Apr 23, 2023 | Motion Generation, Video Generation | Unverified | 0 |
| High-Fidelity and Freely Controllable Talking Head Video Generation | Apr 20, 2023 | Face Model, Talking Head Generation | Unverified | 0 |
| Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models | Apr 18, 2023 | Image Generation, Super-Resolution | Code Available | 1 |
| Latent-Shift: Latent Diffusion with Temporal Shift for Efficient Text-to-Video Generation | Apr 17, 2023 | Image Generation, Super-Resolution | Unverified | 0 |
| Generative Disco: Text-to-Video Generation for Music Visualization | Apr 17, 2023 | Text-to-Video Generation, Video Generation | Code Available | 1 |
| Text2Performer: Text-Driven Human Video Generation | Apr 17, 2023 | Video Generation | Code Available | 2 |
| Video Generation Beyond a Single Clip | Apr 15, 2023 | Video Generation | Unverified | 0 |
| VidStyleODE: Disentangled Video Editing via StyleGAN and NeuralODEs | Apr 12, 2023 | Image Animation, Video Editing | Unverified | 0 |
| Mask-conditioned latent diffusion for generating gastrointestinal polyp images | Apr 11, 2023 | Image Generation, Image Segmentation | Code Available | 1 |
| Generative Recommendation: Towards Next-generation Recommender Paradigm | Apr 7, 2023 | Recommendation Systems, Retrieval | Code Available | 1 |
| MoStGAN-V: Video Generation with Temporal Motion Styles | Apr 5, 2023 | Video Generation | Code Available | 1 |
| Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos | Apr 3, 2023 | Image Generation, Text to Image Generation | Code Available | 3 |
| Qualitative Failures of Image Generation Models and Their Application in Detecting Deepfakes | Mar 29, 2023 | Image Generation, Video Generation | Unverified | 0 |
| Sounding Video Generator: A Unified Framework for Text-guided Sounding Video Generation | Mar 29, 2023 | Audio Generation, Contrastive Learning | Code Available | 0 |
| Fine-grained Audible Video Description | Mar 27, 2023 | Language Modeling, Language Modelling | Code Available | 1 |