| Unified Dense Prediction of Video Diffusion | Mar 12, 2025 | Prediction, Video Generation | Unverified | 0 |
| Reangle-A-Video: 4D Video Generation as Video-to-Video Translation | Mar 12, 2025 | Translation, Video Generation | Unverified | 0 |
| WISA: World Simulator Assistant for Physics-Aware Text-to-Video Generation | Mar 11, 2025 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| VRMDiff: Text-Guided Video Referring Matting Generation of Diffusion | Mar 11, 2025 | Image Matting, Video Alignment | Code Available | 1 |
| ^RFLAV: Rolling Flow matching for infinite Audio Video generation | Mar 11, 2025 | Video Generation | Code Available | 1 |
| ObjectMover: Generative Object Movement with Video Prior | Mar 11, 2025 | Multi-Task Learning, Object | Unverified | 0 |
| Automated Movie Generation via Multi-Agent CoT Planning | Mar 10, 2025 | Video Generation | Code Available | 3 |
| DreamRelation: Relation-Centric Video Customization | Mar 10, 2025 | Relation, Triplet | Unverified | 0 |
| AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion | Mar 10, 2025 | Video Generation | Code Available | 2 |
| VACE: All-in-One Video Creation and Editing | Mar 10, 2025 | All, Human-Domain Subject-to-Video | Code Available | 7 |
| VideoPhy-2: A Challenging Action-Centric Physical Commonsense Evaluation in Video Generation | Mar 9, 2025 | Video Generation | Code Available | 0 |
| A Light and Tuning-free Method for Simulating Camera Motion in Video Generation | Mar 9, 2025 | Denoising, Depth Estimation | Code Available | 1 |
| TR-DQ: Time-Rotation Diffusion Quantization | Mar 9, 2025 | Image Generation, Quantization | Unverified | 0 |
| Generative Video Bi-flow | Mar 9, 2025 | Unconditional Video Generation, Video Generation | Code Available | 0 |
| QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation | Mar 9, 2025 | Quantization, Video Generation | Code Available | 1 |
| Text2Story: Advancing Video Storytelling with Text Guidance | Mar 8, 2025 | Form, Image Generation | Unverified | 0 |
| GSV3D: Gaussian Splatting-based Geometric Distillation with Stable Video Diffusion for Single-Image 3D Object Generation | Mar 8, 2025 | 3D Generation, Decoder | Unverified | 0 |
| DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation | Mar 8, 2025 | Video Generation | Code Available | 1 |
| VACT: A Video Automatic Causal Testing System and a Benchmark | Mar 8, 2025 | Large Language Model, Video Generation | Unverified | 0 |
| Object-Centric World Model for Language-Guided Manipulation | Mar 8, 2025 | Autonomous Driving, model | Unverified | 0 |
| MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio | Mar 7, 2025 | Video Generation | Code Available | 3 |
| Unified Reward Model for Multimodal Understanding and Generation | Mar 7, 2025 | Image Generation, model | Code Available | 4 |
| MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice | Mar 7, 2025 | Denoising, Portrait Animation | Unverified | 0 |
| FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video | Mar 6, 2025 | Future prediction, Novel View Synthesis | Unverified | 0 |
| The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation | Mar 6, 2025 | Semantic Compression, Video Generation | Code Available | 1 |
| Toward Lightweight and Fast Decoders for Diffusion Models in Image and Video Generation | Mar 6, 2025 | Decoder, GPU | Code Available | 1 |
| Rethinking Video Tokenization: A Conditioned Diffusion-based Approach | Mar 5, 2025 | Decoder, Video Compression | Code Available | 1 |
| High-Quality Virtual Single-Viewpoint Surgical Video: Geometric Autocalibration of Multiple Cameras in Surgical Lights | Mar 5, 2025 | Video Generation | Code Available | 0 |
| Rethinking Video Super-Resolution: Towards Diffusion-Based Methods without Motion Alignment | Mar 5, 2025 | All, Super-Resolution | Unverified | 0 |
| GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control | Mar 5, 2025 | Novel View Synthesis, Video Generation | Code Available | 5 |
| DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance | Mar 5, 2025 | 3D Object Detection, BEV Segmentation | Code Available | 1 |
| VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation | Mar 3, 2025 | Text-to-Video Generation, Video Generation | Code Available | 0 |
| Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think | Mar 2, 2025 | Denoising, Image to Video Generation | Code Available | 1 |
| Unified Video Action Model | Feb 28, 2025 | model, Prediction | Unverified | 0 |
| HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models | Feb 28, 2025 | Action Understanding, Text-to-Video Generation | Unverified | 0 |
| Mobius: Text to Seamless Looping Video Generation via Latent Shift | Feb 27, 2025 | Denoising, Video Generation | Code Available | 2 |
| C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation | Feb 27, 2025 | Object, Video Generation | Code Available | 1 |
| FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute | Feb 27, 2025 | Denoising, Image Generation | Unverified | 0 |
| Online Pseudo-average Shifting Attention(PASA) for Robust Low-precision LLM Inference: Algorithms and Numerical Analysis | Feb 26, 2025 | Video Generation | Unverified | 0 |
| A Survey: Spatiotemporal Consistency in Video Generation | Feb 25, 2025 | Image Generation, Video Generation | Unverified | 0 |
| SpargeAttention: Accurate and Training-free Sparse Attention Accelerating Any Model Inference | Feb 25, 2025 | model, Video Generation | Code Available | 4 |
| VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing | Feb 24, 2025 | Video Editing, Video Generation | Unverified | 0 |
| X-Dancer: Expressive Music to Human Dance Video Generation | Feb 24, 2025 | Image Animation, Video Generation | Unverified | 0 |
| Diffusion Models for Tabular Data: Challenges, Current Progress, and Future Directions | Feb 24, 2025 | Data Augmentation, Image Generation | Code Available | 2 |
| RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers | Feb 21, 2025 | Video Generation | Unverified | 0 |
| Hardware-Friendly Static Quantization Method for Video Diffusion Transformers | Feb 20, 2025 | Quantization, Video Generation | Unverified | 0 |
| RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers | Feb 20, 2025 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| Improving the Diffusability of Autoencoders | Feb 20, 2025 | Decoder, Image Generation | Unverified | 0 |
| Designing Parameter and Compute Efficient Diffusion Transformers using Distillation | Feb 20, 2025 | Knowledge Distillation, NVIDIA Jetson Orin Nano | Unverified | 0 |
| VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation | Feb 18, 2025 | Text-to-Video Generation, Video Captioning | Code Available | 1 |