| UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion Models via Training-Free Unified Attention Control | Mar 4, 2024 | Diversity, Video Generation | Code Available | 2 | 5 |
| AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion | Mar 10, 2025 | Video Generation | Code Available | 2 | 5 |
| SF-V: Single Forward Video Generation Model | Jun 6, 2024 | Denoising, model | Code Available | 2 | 5 |
| Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation | Jul 3, 2025 | Diversity, Video Generation | Code Available | 2 | 5 |
| AnimateAnything: Fine-Grained Open Domain Image Animation with Motion Guidance | Nov 21, 2023 | Image Animation, Image to Video Generation | Code Available | 2 | 5 |
| SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction | Oct 31, 2023 | Prediction, Semantic Similarity | Code Available | 2 | 5 |
| Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation | Feb 16, 2024 | Video Generation | Code Available | 2 | 5 |
| Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation | Dec 5, 2024 | Image Comprehension, Representation Learning | Code Available | 2 | 5 |
| Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing | Nov 25, 2024 | Denoising, Video Generation | Code Available | 2 | 5 |
| SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation | Oct 16, 2024 | Denoising, Video Generation | Code Available | 2 | 5 |
| SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation | Sep 10, 2024 | Video Generation | Code Available | 2 | 5 |
| RealCam-Vid: High-resolution Video Dataset with Dynamic Scenes and Metric-scale Camera Movements | Apr 11, 2025 | Video Generation | Code Available | 2 | 5 |
| Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers | Jun 25, 2024 | Image Generation, Model Compression | Code Available | 2 | 5 |
| Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models | Dec 10, 2024 | Video Generation | Code Available | 2 | 5 |
| MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation | May 19, 2022 | Denoising, Prediction | Code Available | 2 | 5 |
| SphereDiff: Tuning-free Omnidirectional Panoramic Image and Video Generation via Spherical Latent Representation | Apr 19, 2025 | ERP, Video Generation | Code Available | 2 | 5 |
| PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop | Mar 12, 2025 | Diagnostic, Video Generation | Code Available | 2 | 5 |
| PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation | Nov 30, 2024 | Text-to-Video Generation, Video Generation | Code Available | 2 | 5 |
| PresentAgent: Multimodal Agent for Presentation Video Generation | Jul 5, 2025 | text-to-speech, Text to Speech | Code Available | 2 | 5 |
| Phenaki: Variable Length Video Generation From Open Domain Textual Description | Oct 5, 2022 | Decoder, Video Generation | Code Available | 2 | 5 |
| Progressive Autoregressive Video Diffusion Models | Oct 10, 2024 | Denoising, Video Denoising | Code Available | 2 | 5 |
| Owl-1: Omni World Model for Consistent Long Video Generation | Dec 12, 2024 | Video Generation | Code Available | 2 | 5 |
| ORV: 4D Occupancy-centric Robot Video Generation | Jun 3, 2025 | Video Generation | Code Available | 2 | 5 |
| Panacea: Panoramic and Controllable Video Generation for Autonomous Driving | Nov 28, 2023 | Autonomous Driving, Video Generation | Code Available | 2 | 5 |
| Diffusion Models for Tabular Data: Challenges, Current Progress, and Future Directions | Feb 24, 2025 | Data Augmentation, Image Generation | Code Available | 2 | 5 |
| DynamiCtrl: Rethinking the Basic Structure and the Role of Text for High-quality Human Image Animation | Mar 27, 2025 | Denoising, Human Animation | Code Available | 2 | 5 |
| Promptus: Can Prompts Streaming Replace Video Streaming with Stable Diffusion | May 30, 2024 | Semantic Communication, Video Compression | Code Available | 2 | 5 |
| AnimateZero: Video Diffusion Models are Zero-Shot Image Animators | Dec 6, 2023 | Image Animation, Video Generation | Code Available | 2 | 5 |
| Blind Video Deflickering by Neural Filtering with a Flawed Atlas | Mar 14, 2023 | Video Generation, Video Temporal Consistency | Code Available | 2 | 5 |
| DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving | Sep 18, 2023 | Autonomous Driving, Video Generation | Code Available | 2 | 5 |
| On-device Sora: Enabling Training-Free Diffusion-based Text-to-Video Generation for Mobile Devices | Feb 5, 2025 | Denoising, Model Optimization | Code Available | 2 | 5 |
| DrivingDiffusion: Layout-Guided multi-view driving scene video generation with latent diffusion model | Oct 11, 2023 | Autonomous Driving, Image Generation | Code Available | 2 | 5 |
| FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation | Jun 10, 2025 | Image-text Retrieval, Question Answering | Code Available | 2 | 5 |
| On-device Sora: Enabling Training-Free Diffusion-based Text-to-Video Generation for Mobile Devices | Mar 31, 2025 | Denoising, Model Optimization | Code Available | 2 | 5 |
| Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives | Nov 9, 2022 | Disentanglement, Video Generation | Code Available | 2 | 5 |
| Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers | Jan 7, 2025 | Diversity, Text-to-Video Generation | Code Available | 2 | 5 |
| DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory | Aug 16, 2023 | Trajectory Modeling, Video Generation | Code Available | 2 | 5 |
| MAGVIT: Masked Generative Video Transformer | Dec 10, 2022 | Multi-Task Learning, Text-to-Video Generation | Code Available | 2 | 5 |
| Depth-Aware Generative Adversarial Network for Talking Head Video Generation | Mar 13, 2022 | 3D geometry, Generative Adversarial Network | Code Available | 2 | 5 |
| DreamGaussian4D: Generative 4D Gaussian Splatting | Dec 28, 2023 | Video Generation | Code Available | 2 | 5 |
| Doe-1: Closed-Loop Autonomous Driving with Large World Model | Dec 12, 2024 | Autonomous Driving, Decision Making | Code Available | 2 | 5 |
| DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark | May 30, 2024 | DeepFake Detection, Mamba | Code Available | 2 | 5 |
| DraftAttention: Fast Video Diffusion via Low-Resolution Attention Guidance | May 17, 2025 | Video Generation | Code Available | 2 | 5 |
| Omni-Video: Democratizing Unified Video Understanding and Generation | Jul 8, 2025 | Video Generation, Video Understanding | Code Available | 2 | 5 |
| Beyond FVD: Enhanced Evaluation Metrics for Video Generation Quality | Oct 7, 2024 | Video Generation | Code Available | 1 | 5 |
| LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation | May 17, 2025 | Benchmarking, Question Answering | Code Available | 1 | 5 |
| Non-linear Motion Estimation for Video Frame Interpolation using Space-time Convolutions | Jan 27, 2022 | Motion Estimation, Video Frame Interpolation | Code Available | 1 | 5 |
| NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion | Nov 24, 2021 | Decoder, Image Generation | Code Available | 1 | 5 |
| Detecting AI-Generated Video via Frame Consistency | Feb 3, 2024 | Video Generation | Code Available | 1 | 5 |
| DDMI: Domain-Agnostic Latent Diffusion Models for Synthesizing High-Quality Implicit Neural Representations | Jan 23, 2024 | 3D Shape Generation, Image Generation | Code Available | 1 | 5 |