| SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation | Nov 7, 2024 | Image to Video Generation, Video Generation | Unverified | 0 |
| DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion | Nov 7, 2024 | 3D Generation, Denoising | Unverified | 0 |
| TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation | Nov 5, 2024 | Image to Video Generation, Misinformation | Unverified | 0 |
| Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey | Nov 5, 2024 | 3D Scene Reconstruction, Autonomous Driving | Unverified | 0 |
| Adaptive Caching for Faster Video Generation with Diffusion Transformers | Nov 4, 2024 | Denoising, Video Generation | Unverified | 0 |
| How Far is Video Generation from World Model: A Physical Law Perspective | Nov 4, 2024 | Video Generation | Unverified | 0 |
| Optical Flow Representation Alignment Mamba Diffusion Model for Medical Video Generation | Nov 3, 2024 | Mamba, Optical Flow Estimation | Unverified | 0 |
| Fast and Memory-Efficient Video Diffusion Using Streamlined Inference | Nov 2, 2024 | GPU, Video Generation | Code Available | 1 |
| GameGen-X: Interactive Open-world Game Video Generation | Nov 1, 2024 | Text-to-Video Generation, Video Generation | Code Available | 3 |
| Fashion-VDM: Video Diffusion Model for Virtual Try-On | Oct 31, 2024 | Video Generation, Virtual Try-on | Unverified | 0 |
| Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts | Oct 31, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning | Oct 31, 2024 | Motion Synthesis, Text-to-Video Generation | Code Available | 1 |
| LumiSculpt: A Consistency Lighting Control Network for Video Generation | Oct 30, 2024 | Video Generation | Unverified | 0 |
| SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation | Oct 30, 2024 | Video Generation | Unverified | 0 |
| HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models | Oct 30, 2024 | Video Generation | Code Available | 4 |
| Investigating Memorization in Video Diffusion Models | Oct 29, 2024 | Memorization, Video Generation | Unverified | 0 |
| LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior | Oct 28, 2024 | Video Generation, Video Reconstruction | Code Available | 2 |
| ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation | Oct 27, 2024 | Video Generation | Unverified | 0 |
| GiVE: Guiding Visual Encoder to Perceive Overlooked Information | Oct 26, 2024 | Object, Question Answering | Unverified | 0 |
| MarDini: Masked Autoregressive Diffusion for Video Generation at Scale | Oct 26, 2024 | Image to Video Generation, Video Generation | Unverified | 0 |
| FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality | Oct 25, 2024 | Video Generation | Unverified | 0 |
| Framer: Interactive Frame Interpolation | Oct 24, 2024 | Image Morphing, Video Generation | Unverified | 0 |
| Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances | Oct 24, 2024 | Benchmarking, Image to Video Generation | Code Available | 3 |
| WorldSimBench: Towards Video Generation Models as World Simulators | Oct 23, 2024 | Autonomous Driving, Robot Manipulation | Unverified | 0 |
| VISAGE: Video Synthesis using Action Graphs for Surgery | Oct 23, 2024 | Video Generation | Unverified | 0 |
| 3DGS-Enhancer: Enhancing Unbounded 3D Gaussian Splatting with View-consistent 2D Diffusion Priors | Oct 21, 2024 | 3DGS, Decoder | Unverified | 0 |
| FrameBridge: Improving Image-to-Video Generation with Bridge Models | Oct 20, 2024 | Image Animation, Image to Video Generation | Unverified | 0 |
| Allegro: Open the Black Box of Commercial-Level Video Generation Model | Oct 20, 2024 | Video Generation | Code Available | 5 |
| EVA: An Embodied World Model for Future Video Anticipation | Oct 20, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation | Oct 17, 2024 | Talking Head Generation, Video Generation | Code Available | 3 |
| Fundus to Fluorescein Angiography Video Generation as a Retinal Generative Foundation Model | Oct 17, 2024 | Disease Prediction, Generative Adversarial Network | Unverified | 0 |
| AsymKV: Enabling 1-Bit Quantization of KV Cache with Layer-Wise Asymmetric Quantization Configurations | Oct 17, 2024 | Decoder, Quantization | Unverified | 0 |
| DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation | Oct 17, 2024 | 3DGS, 4D reconstruction | Unverified | 0 |
| DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control | Oct 17, 2024 | Video Generation | Unverified | 0 |
| VidPanos: Generative Panoramic Videos from Casual Panning Videos | Oct 17, 2024 | Image Stitching, Video Generation | Unverified | 0 |
| Movie Gen: A Cast of Media Foundation Models | Oct 17, 2024 | Audio Generation, Video Editing | Code Available | 3 |
| SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation | Oct 16, 2024 | Denoising, Video Generation | Code Available | 2 |
| Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices | Oct 15, 2024 | Image Generation, multimodal generation | Code Available | 1 |
| LVD-2M: A Long-take Video Dataset with Temporally Dense Captions | Oct 14, 2024 | Video Captioning, Video Generation | Code Available | 2 |
| MuseTalk: Real-Time High-Fidelity Video Dubbing via Spatio-Temporal Sampling | Oct 14, 2024 | Audio-Visual Synchronization, GPU | Code Available | 9 |
| Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention | Oct 14, 2024 | Image to Video Generation, Video Generation | Unverified | 0 |
| DragEntity: Trajectory Guided Video Generation using Entity and Positional Relationships | Oct 14, 2024 | Video Generation | Unverified | 0 |
| Boosting Camera Motion Control for Video Diffusion Transformers | Oct 14, 2024 | Video Generation | Unverified | 0 |
| Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models | Oct 14, 2024 | 3D geometry, Denoising | Code Available | 2 |
| VideoAgent: Self-Improving Video Generation | Oct 14, 2024 | Hallucination, Video Generation | Code Available | 2 |
| Quality Prediction of AI Generated Images and Videos: Emerging Trends and Opportunities | Oct 11, 2024 | Denoising, Image Quality Assessment | Unverified | 0 |
| Animating the Past: Reconstruct Trilobite via Video Generation | Oct 10, 2024 | Language Modelling, Large Language Model | Unverified | 0 |
| Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content | Oct 10, 2024 | Video Alignment, Video Generation | Unverified | 0 |
| Progressive Autoregressive Video Diffusion Models | Oct 10, 2024 | Denoising, Video Denoising | Code Available | 2 |
| MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion | Oct 10, 2024 | Denoising, parameter-efficient fine-tuning | Code Available | 0 |