| GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation | Jun 19, 2024 | Benchmarking, Image Generation | Code Available | 3 |
| OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation | Jun 13, 2024 | Video Generation, Video Prediction | Code Available | 3 |
| VideoTetris: Towards Compositional Text-to-Video Generation | Jun 6, 2024 | Denoising, Text-to-Video Generation | Code Available | 3 |
| MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion | May 30, 2024 | Denoising, GPU | Code Available | 3 |
| T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback | May 29, 2024 | Video Generation | Code Available | 3 |
| FIFO-Diffusion: Generating Infinite Videos from Text without Training | May 19, 2024 | Text-to-Video Generation, Video Generation | Code Available | 3 |
| From Sora What We Can See: A Survey of Text-to-Video Generation | May 17, 2024 | Text-to-Video Generation, Video Generation | Code Available | 3 |
| ID-Animator: Zero-Shot Identity-Preserving Human Video Generation | Apr 23, 2024 | Attribute, Video Generation | Code Available | 3 |
| On the Content Bias in Fréchet Video Distance | Apr 18, 2024 | Video Generation | Code Available | 3 |
| Evaluating Text-to-Visual Generation with Image-to-Text Generation | Apr 1, 2024 | Image to text, Question Answering | Code Available | 3 |
| CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility | Mar 18, 2024 | Image Inpainting, Video Alignment | Code Available | 3 |
| DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation | Mar 11, 2024 | Autonomous Driving, Language Modeling | Code Available | 3 |
| Magic-Me: Identity-Specific Video Customized Diffusion | Feb 14, 2024 | Image Generation, Text to Image Generation | Code Available | 3 |
| ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation | Feb 6, 2024 | Image to Video Generation, Video Generation | Code Available | 3 |
| Lumiere: A Space-Time Diffusion Model for Video Generation | Jan 23, 2024 | Super-Resolution, Text-to-Video Generation | Code Available | 3 |
| MotionCtrl: A Unified and Flexible Motion Controller for Video Generation | Dec 6, 2023 | Object, Video Generation | Code Available | 3 |
| VBench: Comprehensive Benchmark Suite for Video Generative Models | Nov 29, 2023 | Image Generation, Video Generation | Code Available | 3 |
| MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion | Nov 18, 2023 | Video Generation | Code Available | 3 |
| Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation | Sep 27, 2023 | GPU, Text-to-Video Generation | Code Available | 3 |
| FreeU: Free Lunch in Diffusion U-Net | Sep 20, 2023 | Decoder, Denoising | Code Available | 3 |
| Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos | Apr 3, 2023 | Image Generation, Text to Image Generation | Code Available | 3 |
| I^2-World: Intra-Inter Tokenization for Efficient Dynamic 4D Scene Forecasting | Jul 12, 2025 | Autonomous Driving, Computational Efficiency | Code Available | 2 |
| Omni-Video: Democratizing Unified Video Understanding and Generation | Jul 8, 2025 | Video Generation, Video Understanding | Code Available | 2 |
| PresentAgent: Multimodal Agent for Presentation Video Generation | Jul 5, 2025 | text-to-speech, Text to Speech | Code Available | 2 |
| Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation | Jul 3, 2025 | Diversity, Video Generation | Code Available | 2 |
| FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation | Jun 10, 2025 | Image-text Retrieval, Question Answering | Code Available | 2 |
| ORV: 4D Occupancy-centric Robot Video Generation | Jun 3, 2025 | Video Generation | Code Available | 2 |
| HyperMotion: DiT-Based Pose-Guided Human Image Animation of Complex Motions | May 29, 2025 | Image Animation, Video Generation | Code Available | 2 |
| VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models | May 29, 2025 | Self-Supervised Learning, Video Generation | Code Available | 2 |
| Grouping First, Attending Smartly: Training-Free Acceleration for Diffusion Transformers | May 20, 2025 | GPU, Video Generation | Code Available | 2 |
| DraftAttention: Fast Video Diffusion via Low-Resolution Attention Guidance | May 17, 2025 | Video Generation | Code Available | 2 |
| HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation | Apr 30, 2025 | Depth Estimation, Scene Generation | Code Available | 2 |
| SphereDiff: Tuning-free Omnidirectional Panoramic Image and Video Generation via Spherical Latent Representation | Apr 19, 2025 | ERP, Video Generation | Code Available | 2 |
| RealCam-Vid: High-resolution Video Dataset with Dynamic Scenes and Metric-scale Camera Movements | Apr 11, 2025 | Video Generation | Code Available | 2 |
| On-device Sora: Enabling Training-Free Diffusion-based Text-to-Video Generation for Mobile Devices | Mar 31, 2025 | Denoising, Model Optimization | Code Available | 2 |
| DynamiCtrl: Rethinking the Basic Structure and the Role of Text for High-quality Human Image Animation | Mar 27, 2025 | Denoising, Human Animation | Code Available | 2 |
| Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer | Mar 21, 2025 | Benchmarking, Video Generation | Code Available | 2 |
| LeanVAE: An Ultra-Efficient Reconstruction VAE for Video Diffusion Models | Mar 18, 2025 | compressed sensing, Video Generation | Code Available | 2 |
| Concat-ID: Towards Universal Identity-Preserving Video Synthesis | Mar 18, 2025 | Human-Domain Subject-to-Video, Video Generation | Code Available | 2 |
| SteerX: Creating Any Camera-Free 3D and 4D Scenes with Geometric Steering | Mar 15, 2025 | Scene Generation, Video Generation | Code Available | 2 |
| VMBench: A Benchmark for Perception-Aligned Video Motion Generation | Mar 13, 2025 | Motion Generation, Video Generation | Code Available | 2 |
| Neighboring Autoregressive Modeling for Efficient Visual Generation | Mar 12, 2025 | Image Generation, Text to Image Generation | Code Available | 2 |
| PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop | Mar 12, 2025 | Diagnostic, Video Generation | Code Available | 2 |
| AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion | Mar 10, 2025 | Video Generation | Code Available | 2 |
| Mobius: Text to Seamless Looping Video Generation via Latent Shift | Feb 27, 2025 | Denoising, Video Generation | Code Available | 2 |
| Diffusion Models for Tabular Data: Challenges, Current Progress, and Future Directions | Feb 24, 2025 | Data Augmentation, Image Generation | Code Available | 2 |
| On-device Sora: Enabling Training-Free Diffusion-based Text-to-Video Generation for Mobile Devices | Feb 5, 2025 | Denoising, Model Optimization | Code Available | 2 |
| VideoShield: Regulating Diffusion-based Video Generation Models via Watermarking | Jan 24, 2025 | Denoising, Image Generation | Code Available | 2 |
| Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers | Jan 7, 2025 | Diversity, Text-to-Video Generation | Code Available | 2 |
| Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model | Dec 30, 2024 | Language Modeling, Language Modelling | Code Available | 2 |