| SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models | Nov 28, 2023 | Video Generation | Code Available | 6 |
| MotionZero: Exploiting Motion Priors for Zero-shot Text-to-Video Generation | Nov 28, 2023 | Disentanglement, Text-to-Video Generation | Unverified | 0 |
| FlowZero: Zero-Shot Text-to-Video Synthesis with LLM-Driven Dynamic Scene Syntax | Nov 27, 2023 | Video Generation | Unverified | 0 |
| GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation | Nov 25, 2023 | Instruction Following, Language Modeling | Unverified | 0 |
| Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets | Nov 25, 2023 | Image Generation, Image to Video Generation | Code Available | 0 |
| AdaDiff: Adaptive Step Selection for Fast Diffusion Models | Nov 24, 2023 | Denoising, Image Generation | Unverified | 0 |
| Decouple Content and Motion for Conditional Image-to-Video Generation | Nov 24, 2023 | Image to Video Generation, Video Generation | Unverified | 0 |
| FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline | Nov 22, 2023 | SSIM, Text-to-Video Generation | Code Available | 1 |
| AnimateAnything: Fine-Grained Open Domain Image Animation with Motion Guidance | Nov 21, 2023 | Image Animation, Image to Video Generation | Code Available | 2 |
| GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning | Nov 21, 2023 | Image Generation, Text-to-Video Generation | Unverified | 0 |
| MoVideo: Motion-Aware Video Generation with Diffusion Models | Nov 19, 2023 | Image Generation, Image to Video Generation | Unverified | 0 |
| MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion | Nov 18, 2023 | Video Generation | Code Available | 3 |
| Make Pixels Dance: High-Dynamic Video Generation | Nov 18, 2023 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning | Nov 17, 2023 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| LLM as an Art Director (LaDi): Using LLMs to improve Text-to-Media Generators | Nov 7, 2023 | Image Generation, Retrieval | Unverified | 0 |
| MeVGAN: GAN-based Plugin Model for Video Generation with Applications in Colonoscopy | Nov 7, 2023 | Generative Adversarial Network, Medical Procedure | Code Available | 0 |
| FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation | Nov 3, 2023 | Text-to-Video Generation, Video Generation | Code Available | 1 |
| REGIS: Refining Generated Videos via Iterative Stylistic Redesigning | Nov 3, 2023 | Text-to-Video Generation, Video Generation | Code Available | 0 |
| Exploring the Hyperparameter Space of Image Diffusion Models for Echocardiogram Generation | Nov 2, 2023 | Video Generation | Unverified | 0 |
| VideoDreamer: Customized Multi-Subject Text-to-Video Generation with Disen-Mix Finetuning | Nov 2, 2023 | Attribute, Text-to-Video Generation | Unverified | 0 |
| POS: A Prompts Optimization Suite for Augmenting Text-to-Video Generation | Nov 2, 2023 | Denoising, POS | Unverified | 0 |
| SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction | Oct 31, 2023 | Prediction, Semantic Similarity | Code Available | 2 |
| The Missing U for Efficient Diffusion Models | Oct 31, 2023 | Denoising, Image Generation | Unverified | 0 |
| VideoCrafter1: Open Diffusion Models for High-Quality Video Generation | Oct 30, 2023 | Text-to-Video Generation, Video Generation | Code Available | 5 |
| CVPR 2023 Text Guided Video Editing Competition | Oct 24, 2023 | Video Editing, Video Generation | Code Available | 1 |
| FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling | Oct 23, 2023 | Video Generation | Code Available | 1 |
| EvalCrafter: Benchmarking and Evaluating Large Video Generation Models | Oct 17, 2023 | Benchmarking, Language Modelling | Code Available | 1 |
| LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation | Oct 16, 2023 | GPU, Image Animation | Code Available | 2 |
| A Survey on Video Diffusion Models | Oct 16, 2023 | Image Generation, Survey | Code Available | 4 |
| DrivingDiffusion: Layout-Guided multi-view driving scene video generation with latent diffusion model | Oct 11, 2023 | Autonomous Driving, Image Generation | Code Available | 2 |
| Echocardiography video synthesis from end diastolic semantic map via diffusion model | Oct 11, 2023 | Denoising, Video Generation | Unverified | 0 |
| ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation | Oct 11, 2023 | Image Generation, Text to Image Generation | Code Available | 1 |
| Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation | Oct 9, 2023 | Action Recognition, Image Generation | Code Available | 4 |
| RT-GAN: Recurrent Temporal GAN for Adding Lightweight Temporal Consistency to Frame-Based Domain Translation Approaches | Oct 2, 2023 | Deep Learning, Video Generation | Unverified | 0 |
| LLM-grounded Video Diffusion Models | Sep 29, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| OSM-Net: One-to-Many One-shot Talking Head Generation with Spontaneous Head Motions | Sep 28, 2023 | Talking Head Generation, Video Generation | Unverified | 0 |
| Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation | Sep 28, 2023 | Text-to-Video Generation, Video Generation | Code Available | 1 |
| Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation | Sep 27, 2023 | GPU, Text-to-Video Generation | Code Available | 3 |
| LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models | Sep 26, 2023 | Super-Resolution, Text-to-Video Generation | Code Available | 1 |
| VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning | Sep 26, 2023 | Image Generation, Video Generation | Unverified | 0 |
| Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator | Sep 25, 2023 | Text-to-Video Generation, Video Generation | Code Available | 1 |
| GLOBER: Coherent Non-autoregressive Video Generation via GLOBal Guided Video DecodER | Sep 23, 2023 | Decoder, Video Generation | Code Available | 0 |
| FreeU: Free Lunch in Diffusion U-Net | Sep 20, 2023 | Decoder, Denoising | Code Available | 3 |
| DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving | Sep 18, 2023 | Autonomous Driving, Video Generation | Code Available | 2 |
| The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion | Sep 8, 2023 | Video Generation | Unverified | 0 |
| Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation | Sep 7, 2023 | Action Recognition, Decoder | Code Available | 1 |
| RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model | Sep 2, 2023 | 3D Generation, Image Generation | Unverified | 0 |
| VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation | Sep 1, 2023 | Decoder, Image Generation | Unverified | 0 |
| StyleInV: A Temporal Style Modulated Inversion Network for Unconditional Video Generation | Aug 31, 2023 | Style Transfer, Unconditional Video Generation | Code Available | 1 |
| Explaining Vision and Language through Graphs of Events in Space and Time | Aug 29, 2023 | Graph Matching, Video Generation | Unverified | 0 |