| Minute-Long Videos with Dual Parallelisms | May 27, 2025 | Denoising, GPU | Code Available | 1 |
| DriveCamSim: Generalizable Camera Simulation via Explicit Camera Modeling for Autonomous Driving | May 26, 2025 | Autonomous Driving, Video Generation | Code Available | 1 |
| VORTA: Efficient Video Diffusion via Routing Sparse Attention | May 24, 2025 | Video Generation | Code Available | 1 |
| DVD-Quant: Data-free Video Diffusion Transformers Quantization | May 24, 2025 | Data Free Quantization, Quantization | Code Available | 1 |
| CineTechBench: A Benchmark for Cinematographic Technique Understanding and Generation | May 21, 2025 | Video Generation | Code Available | 1 |
| BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation | May 19, 2025 | Binary Classification, DeepFake Detection | Code Available | 1 |
| Video-GPT via Next Clip Diffusion | May 18, 2025 | Denoising, Image Animation | Code Available | 1 |
| FastCar: Cache Attentive Replay for Fast Auto-Regressive Video Generation on the Edge | May 17, 2025 | Image Generation, Scheduling | Code Available | 1 |
| LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation | May 17, 2025 | Benchmarking, Question Answering | Code Available | 1 |
| Ophora: A Large-Scale Data-Driven Text-Guided Ophthalmic Surgical Video Generation Model | May 12, 2025 | Video Generation | Code Available | 1 |
| VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding | May 2, 2025 | Anomaly Detection, Common Sense Reasoning | Code Available | 1 |
| FreePCA: Integrating Consistency Information across Long-short Frames in Training-free Long Video Generation via Principal Component Analysis | May 2, 2025 | Video Generation | Code Available | 1 |
| Survey of Video Diffusion Models: Foundations, Implementations, and Applications | Apr 22, 2025 | Computational Efficiency, Denoising | Code Available | 1 |
| VGDFR: Diffusion-based Video Generation with Dynamic Latent Frame Rate | Apr 16, 2025 | Video Generation | Code Available | 1 |
| Diffusion Transformers for Tabular Data Time Series Generation | Apr 10, 2025 | Tabular Data Generation, Time Series | Code Available | 1 |
| DyDiT++: Dynamic Diffusion Transformers for Efficient Visual Generation | Apr 9, 2025 | Image Generation, Text to Image Generation | Code Available | 1 |
| CamContextI2V: Context-aware Controllable Video Generation | Apr 8, 2025 | Diversity, Scene Understanding | Code Available | 1 |
| Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models | Apr 4, 2025 | Denoising, Video Generation | Code Available | 1 |
| VPO: Aligning Text-to-Video Generation Models with Prompt Optimization | Mar 26, 2025 | In-Context Learning, Safety Alignment | Code Available | 1 |
| EfficientMT: Efficient Temporal Adaptation for Motion Transfer in Text-to-Video Diffusion Models | Mar 25, 2025 | Video Generation | Code Available | 1 |
| AMD-Hummingbird: Towards an Efficient Text-to-Video Model | Mar 24, 2025 | Computational Efficiency, Video Generation | Code Available | 1 |
| SyncVP: Joint Diffusion for Synchronous Multi-Modal Video Prediction | Mar 24, 2025 | Video Generation, Video Prediction | Code Available | 1 |
| MiLA: Multi-view Intensive-fidelity Long-term Video Generation World Model for Autonomous Driving | Mar 20, 2025 | Autonomous Driving, Denoising | Code Available | 1 |
| AIGVE-Tool: AI-Generated Video Evaluation Toolkit with Multifaceted Benchmark | Mar 18, 2025 | Video Generation | Code Available | 1 |
| ^RFLAV: Rolling Flow matching for infinite Audio Video generation | Mar 11, 2025 | Video Generation | Code Available | 1 |
| VRMDiff: Text-Guided Video Referring Matting Generation of Diffusion | Mar 11, 2025 | Image Matting, Video Alignment | Code Available | 1 |
| A Light and Tuning-free Method for Simulating Camera Motion in Video Generation | Mar 9, 2025 | Denoising, Depth Estimation | Code Available | 1 |
| QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation | Mar 9, 2025 | Quantization, Video Generation | Code Available | 1 |
| DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation | Mar 8, 2025 | Video Generation | Code Available | 1 |
| Toward Lightweight and Fast Decoders for Diffusion Models in Image and Video Generation | Mar 6, 2025 | Decoder, GPU | Code Available | 1 |
| The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation | Mar 6, 2025 | Semantic Compression, Video Generation | Code Available | 1 |
| Rethinking Video Tokenization: A Conditioned Diffusion-based Approach | Mar 5, 2025 | Decoder, Video Compression | Code Available | 1 |
| DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance | Mar 5, 2025 | 3D Object Detection, BEV Segmentation | Code Available | 1 |
| Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think | Mar 2, 2025 | Denoising, Image to Video Generation | Code Available | 1 |
| C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation | Feb 27, 2025 | Object, Video Generation | Code Available | 1 |
| VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation | Feb 18, 2025 | Text-to-Video Generation, Video Captioning | Code Available | 1 |
| Object-Centric Image to Video Generation with Language Guidance | Feb 17, 2025 | Image to Video Generation, Object | Code Available | 1 |
| DLFR-VAE: Dynamic Latent Frame Rate VAE for Video Generation | Feb 17, 2025 | Video Generation | Code Available | 1 |
| Conditional diffusion model with spatial attention and latent embedding for medical image segmentation | Feb 10, 2025 | Hippocampus, Image Segmentation | Code Available | 1 |
| Content-Rich AIGC Video Quality Assessment via Intricate Text Alignment and Motion-Aware Consistency | Feb 6, 2025 | Video Generation, Video Quality Assessment | Code Available | 1 |
| VidSketch: Hand-drawn Sketch-Driven Video Generation with Diffusion Control | Feb 3, 2025 | Video Generation | Code Available | 1 |
| VILP: Imitation Learning with Latent Video Planning | Feb 3, 2025 | Imitation Learning, Video Generation | Code Available | 1 |
| Improved Training Technique for Latent Consistency Models | Feb 3, 2025 | Video Generation | Code Available | 1 |
| Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search | Jan 31, 2025 | Denoising, Video Alignment | Code Available | 1 |
| CascadeV: An Implementation of Wurstchen Architecture for Video Generation | Jan 28, 2025 | 2k, Video Generation | Code Available | 1 |
| EchoVideo: Identity-Preserving Human Video Generation by Multimodal Feature Fusion | Jan 23, 2025 | Video Generation | Code Available | 1 |
| Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM | Dec 19, 2024 | Video Generation | Code Available | 1 |
| Real-time One-Step Diffusion-based Expressive Portrait Videos Generation | Dec 18, 2024 | Video Generation | Code Available | 1 |
| Video Diffusion Transformers are In-Context Learners | Dec 14, 2024 | Video Generation | Code Available | 1 |
| ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer | Dec 10, 2024 | Denoising, Image Generation | Code Available | 1 |