| GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation | Nov 25, 2023 | Instruction Following, Language Modeling | Unverified | 0 |
| Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets | Nov 25, 2023 | Image Generation, Image to Video Generation | Code Available | 0 |
| AdaDiff: Adaptive Step Selection for Fast Diffusion Models | Nov 24, 2023 | Denoising, Image Generation | Unverified | 0 |
| Decouple Content and Motion for Conditional Image-to-Video Generation | Nov 24, 2023 | Image to Video Generation, Video Generation | Unverified | 0 |
| GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning | Nov 21, 2023 | Image Generation, Text-to-Video Generation | Unverified | 0 |
| MoVideo: Motion-Aware Video Generation with Diffusion Models | Nov 19, 2023 | Image Generation, Image to Video Generation | Unverified | 0 |
| Make Pixels Dance: High-Dynamic Video Generation | Nov 18, 2023 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning | Nov 17, 2023 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| LLM as an Art Director (LaDi): Using LLMs to improve Text-to-Media Generators | Nov 7, 2023 | Image Generation, Retrieval | Unverified | 0 |
| MeVGAN: GAN-based Plugin Model for Video Generation with Applications in Colonoscopy | Nov 7, 2023 | Generative Adversarial Network, Medical Procedure | Code Available | 0 |
| REGIS: Refining Generated Videos via Iterative Stylistic Redesigning | Nov 3, 2023 | Text-to-Video Generation, Video Generation | Code Available | 0 |
| POS: A Prompts Optimization Suite for Augmenting Text-to-Video Generation | Nov 2, 2023 | Denoising, POS | Unverified | 0 |
| VideoDreamer: Customized Multi-Subject Text-to-Video Generation with Disen-Mix Finetuning | Nov 2, 2023 | Attribute, Text-to-Video Generation | Unverified | 0 |
| Exploring the Hyperparameter Space of Image Diffusion Models for Echocardiogram Generation | Nov 2, 2023 | Video Generation | Unverified | 0 |
| The Missing U for Efficient Diffusion Models | Oct 31, 2023 | Denoising, Image Generation | Unverified | 0 |
| Echocardiography video synthesis from end diastolic semantic map via diffusion model | Oct 11, 2023 | Denoising, Video Generation | Unverified | 0 |
| RT-GAN: Recurrent Temporal GAN for Adding Lightweight Temporal Consistency to Frame-Based Domain Translation Approaches | Oct 2, 2023 | Deep Learning, Video Generation | Unverified | 0 |
| LLM-grounded Video Diffusion Models | Sep 29, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| OSM-Net: One-to-Many One-shot Talking Head Generation with Spontaneous Head Motions | Sep 28, 2023 | Talking Head Generation, Video Generation | Unverified | 0 |
| VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning | Sep 26, 2023 | Image Generation, Video Generation | Unverified | 0 |
| GLOBER: Coherent Non-autoregressive Video Generation via GLOBal Guided Video DecodER | Sep 23, 2023 | Decoder, Video Generation | Code Available | 0 |
| The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion | Sep 8, 2023 | Video Generation | Unverified | 0 |
| RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model | Sep 2, 2023 | 3D Generation, Image Generation | Unverified | 0 |
| VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation | Sep 1, 2023 | Decoder, Image Generation | Unverified | 0 |
| Explaining Vision and Language through Graphs of Events in Space and Time | Aug 29, 2023 | Graph Matching, Video Generation | Unverified | 0 |
| MagicAvatar: Multimodal Avatar Generation and Animation | Aug 28, 2023 | Video Generation | Unverified | 0 |
| Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs | Aug 26, 2023 | In-Context Learning, Video Generation | Unverified | 0 |
| APLA: Additional Perturbation for Latent Noise with Adversarial Training Enables Consistency | Aug 24, 2023 | Video Generation | Unverified | 0 |
| Hamiltonian GAN | Aug 22, 2023 | Inductive Bias, Video Generation | Unverified | 0 |
| SimDA: Simple Diffusion Adapter for Efficient Video Generation | Aug 18, 2023 | Super-Resolution, Transfer Learning | Unverified | 0 |
| Dual-Stream Diffusion Net for Text-to-Video Generation | Aug 16, 2023 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| MobileVidFactory: Automatic Diffusion-Based Social Media Video Generation for Mobile Devices from Text | Jul 31, 2023 | Video Generation | Unverified | 0 |
| Hierarchical Semantic Perceptual Listener Head Video Generation: A High-performance Pipeline | Jul 19, 2023 | Decoder, Talking Head Generation | Unverified | 0 |
| InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation | Jul 13, 2023 | Action Recognition, Contrastive Learning | Unverified | 0 |
| GD-VDM: Generated Depth for better Diffusion-based Video Generation | Jun 19, 2023 | Image Generation, Video Generation | Code Available | 0 |
| Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis | Jun 6, 2023 | Neural Rendering, text-to-speech | Unverified | 0 |
| Quantifying Sample Anonymity in Score-Based Generative Models with Adversarial Fingerprinting | Jun 2, 2023 | Anomaly Detection, Data Augmentation | Unverified | 0 |
| Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance | Jun 1, 2023 | Image Generation, Video Generation | Unverified | 0 |
| Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models | May 17, 2023 | Image Generation, Text-to-Video Generation | Unverified | 0 |
| Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation | May 16, 2023 | Motion Generation, Motion Synthesis | Unverified | 0 |
| Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts | May 15, 2023 | Denoising, Video Editing | Unverified | 0 |
| Multi-object Video Generation from Single Frame Layouts | May 6, 2023 | Image Generation, Object | Unverified | 0 |
| StyleLipSync: Style-based Personalized Lip-sync Video Generation | Apr 30, 2023 | Video Generation | Unverified | 0 |
| LaMD: Latent Motion Diffusion for Image-Conditional Video Generation | Apr 23, 2023 | Motion Generation, Video Generation | Unverified | 0 |
| High-Fidelity and Freely Controllable Talking Head Video Generation | Apr 20, 2023 | Face Model, Talking Head Generation | Unverified | 0 |
| Latent-Shift: Latent Diffusion with Temporal Shift for Efficient Text-to-Video Generation | Apr 17, 2023 | Image Generation, Super-Resolution | Unverified | 0 |
| Video Generation Beyond a Single Clip | Apr 15, 2023 | Video Generation | Unverified | 0 |
| VidStyleODE: Disentangled Video Editing via StyleGAN and NeuralODEs | Apr 12, 2023 | Image Animation, Video Editing | Unverified | 0 |
| Sounding Video Generator: A Unified Framework for Text-guided Sounding Video Generation | Mar 29, 2023 | Audio Generation, Contrastive Learning | Code Available | 0 |
| Qualitative Failures of Image Generation Models and Their Application in Detecting Deepfakes | Mar 29, 2023 | Image Generation, Video Generation | Unverified | 0 |