| SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation | Sep 10, 2024 | Video Generation | Code Available | 2 |
| MyGo: Consistent and Controllable Multi-View Driving Video Generation with Camera Control | Sep 10, 2024 | Autonomous Driving, Video Generation | Unverified | 0 |
| G3PT: Unleash the power of Autoregressive Modeling in 3D Generation via Cross-scale Querying Transformer | Sep 10, 2024 | 3D Generation, Video Generation | Unverified | 0 |
| DriveScape: Towards High-Resolution Controllable Multi-View Driving Video Generation | Sep 9, 2024 | Autonomous Driving, Video Generation | Unverified | 0 |
| DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes | Sep 6, 2024 | Video Generation | Code Available | 3 |
| Qihoo-T2X: An Efficient Proxy-Tokenized Diffusion Transformer for Text-to-Any-Task | Sep 6, 2024 | Video Generation | Code Available | 3 |
| Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency | Sep 4, 2024 | Video Generation | Unverified | 0 |
| DiVE: DiT-based Video Generation with Enhanced Control | Sep 3, 2024 | Autonomous Driving, Video Generation | Unverified | 0 |
| DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos | Sep 3, 2024 | Depth Estimation, Diversity | Code Available | 5 |
| CyberHost: Taming Audio-driven Avatar Diffusion Model with Region Codebook Attention | Sep 3, 2024 | Human Animation, Video Generation | Unverified | 0 |
| OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model | Sep 2, 2024 | GPU, Video Generation | Unverified | 0 |
| AMG: Avatar Motion Guided Video Generation | Sep 2, 2024 | Video Generation | Code Available | 1 |
| Compositional 3D-aware Video Generation with LLM Director | Aug 31, 2024 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers | Aug 30, 2024 | GPU, Image Generation | Unverified | 0 |
| One-Shot Learning Meets Depth Diffusion in Multi-Object Videos | Aug 29, 2024 | One-Shot Learning, Video Generation | Unverified | 0 |
| Alignment is All You Need: A Training-free Augmentation Strategy for Pose-guided Video Generation | Aug 29, 2024 | All, Video Generation | Unverified | 0 |
| DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving | Aug 29, 2024 | Autonomous Driving, Denoising | Unverified | 0 |
| GenDDS: Generating Diverse Driving Video Scenarios with Prompt-to-Video Generative Model | Aug 28, 2024 | Autonomous Driving, Data Augmentation | Unverified | 0 |
| GenRec: Unifying Video Generation and Recognition with Diffusion Models | Aug 27, 2024 | Image to Video Generation, Video Generation | Code Available | 0 |
| Fundus2Video: Cross-Modal Angiography Video Generation from Static Fundus Photography with Clinical Knowledge Guidance | Aug 27, 2024 | Clinical Knowledge, Lesion Segmentation | Code Available | 0 |
| SurGen: Text-Guided Diffusion Model for Surgical Video Generation | Aug 26, 2024 | Video Generation | Unverified | 0 |
| Decoupled Video Generation with Chain of Training-free Diffusion Model Experts | Aug 24, 2024 | Denoising, Video Generation | Unverified | 0 |
| TVG: A Training-free Transition Video Generation Method with Diffusion Models | Aug 24, 2024 | GPR, Video Generation | Unverified | 0 |
| EasyControl: Transfer ControlNet to Video Diffusion for Controllable Generation and Interpolation | Aug 23, 2024 | Image Generation, Video Generation | Unverified | 0 |
| CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities | Aug 23, 2024 | Denoising, Motion Generation | Code Available | 2 |
| Real-Time Video Generation with Pyramid Attention Broadcast | Aug 22, 2024 | Video Generation | Code Available | 7 |
| xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations | Aug 22, 2024 | Dense Captioning, Motion Estimation | Unverified | 0 |
| TrackGo: A Flexible and Efficient Method for Controllable Video Generation | Aug 21, 2024 | Video Generation | Unverified | 0 |
| DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework | Aug 21, 2024 | Video Generation | Unverified | 0 |
| Kubrick: Multimodal Agent Collaborations for Synthetic Video Generation | Aug 19, 2024 | Instruction Following, Large Language Model | Unverified | 0 |
| Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation | Aug 19, 2024 | Image Generation, Video Generation | Code Available | 3 |
| Factorized-Dreamer: Training A High-Quality Video Generator with Limited and Low-Quality Data | Aug 19, 2024 | Descriptive, Image to Video Generation | Code Available | 0 |
| JPEG-LM: LLMs as Image Generators with Canonical Codec Representations | Aug 15, 2024 | Image Generation, Quantization | Unverified | 0 |
| FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance | Aug 15, 2024 | TAR, Video Generation | Code Available | 4 |
| When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding | Aug 15, 2024 | Video Compression, Video Generation | Unverified | 0 |
| Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving | Aug 14, 2024 | 3D Object Detection, 3D Object Tracking | Code Available | 3 |
| CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer | Aug 12, 2024 | Text-to-Video Generation, Video Alignment | Code Available | 11 |
| ControlNeXt: Powerful and Efficient Control for Image and Video Generation | Aug 12, 2024 | Video Generation | Code Available | 5 |
| Scene123: One Prompt to 3D Scene Generation via Video-Assisted and Consistency-Enhanced MAE | Aug 10, 2024 | Scene Generation, Video Generation | Unverified | 0 |
| High-fidelity and Lip-synced Talking Face Synthesis via Landmark-based Diffusion Model | Aug 10, 2024 | Face Generation, Talking Face Generation | Unverified | 0 |
| Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics | Aug 8, 2024 | Video Generation | Unverified | 0 |
| VidGen-1M: A Large-Scale Dataset for Text-to-video Generation | Aug 5, 2024 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual Inversion | Aug 1, 2024 | Face Reenactment, Video Generation | Unverified | 0 |
| Segment Anything for Videos: A Systematic Survey | Jul 31, 2024 | Image Segmentation, Robot Manipulation Generalization | Code Available | 5 |
| Tora: Trajectory-oriented Diffusion Transformer for Video Generation | Jul 31, 2024 | Video Compression, Video Generation | Code Available | 5 |
| Benchmarking Multi-dimensional AIGC Video Quality Assessment: A Dataset and Unified Model | Jul 31, 2024 | Benchmarking, Large Language Model | Code Available | 0 |
| Fine-gained Zero-shot Video Sampling | Jul 31, 2024 | Image Generation, Video Editing | Unverified | 0 |
| Explainable and Controllable Motion Curve Guided Cardiac Ultrasound Video Generation | Jul 31, 2024 | Position, Video Generation | Code Available | 0 |
| FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention | Jul 29, 2024 | Denoising, Video Generation | Unverified | 0 |
| FIND: Fine-tuning Initial Noise Distribution with Policy Optimization for Diffusion Models | Jul 28, 2024 | Denoising, Video Generation | Code Available | 0 |