| How I Warped Your Noise: a Temporally-Correlated Noise Prior for Diffusion Models | Apr 3, 2025 | Video Editing, Video Generation | Unverified | 0 |
| OmniCam: Unified Multimodal Video Generation via Camera Control | Apr 3, 2025 | Video Generation | Unverified | 0 |
| Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation | Apr 3, 2025 | Mamba, Talking Head Generation | Code Available | 3 |
| SkyReels-A2: Compose Anything in Video Diffusion Transformers | Apr 3, 2025 | Human-Domain Subject-to-Video, Open-Domain Subject-to-Video | Code Available | 4 |
| ConMo: Controllable Motion Disentanglement and Recomposition for Zero-Shot Motion Transfer | Apr 3, 2025 | Disentanglement, Motion Disentanglement | Code Available | 0 |
| Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model | Apr 3, 2025 | Scene Generation, Video Generation | Unverified | 0 |
| MG-Gen: Single Image to Motion Graphics Generation with Layer Decomposition | Apr 3, 2025 | Code Generation, Image to Video Generation | Unverified | 0 |
| WorldPrompter: Traversable Text-to-Scene Generation | Apr 2, 2025 | 3D Generation, Scene Generation | Unverified | 0 |
| WorldScore: A Unified Evaluation Benchmark for World Generation | Apr 1, 2025 | Scene Generation, Video Generation | Unverified | 0 |
| On-device Sora: Enabling Training-Free Diffusion-based Text-to-Video Generation for Mobile Devices | Mar 31, 2025 | Denoising, Model Optimization | Code Available | 2 |
| HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation | Mar 31, 2025 | Hallucination, Human-Object Interaction Detection | Unverified | 0 |
| JointTuner: Appearance-Motion Adaptive Joint Training for Customized Video Generation | Mar 31, 2025 | Video Generation | Unverified | 0 |
| Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation | Mar 31, 2025 | Video Generation | Unverified | 0 |
| HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation | Mar 31, 2025 | Video Generation | Unverified | 0 |
| MoCha: Towards Movie-Grade Talking Character Synthesis | Mar 30, 2025 | Video Generation | Unverified | 0 |
| SketchVideo: Sketch-based Video Generation and Editing | Mar 30, 2025 | Video Editing, Video Generation | Unverified | 0 |
| VideoGen-Eval: Agent-based System for Video Generation Evaluation | Mar 30, 2025 | Diversity, Video Generation | Code Available | 3 |
| Towards Physically Plausible Video Generation via VLM Planning | Mar 30, 2025 | Image to Video Generation, Video Generation | Unverified | 0 |
| JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization | Mar 30, 2025 | Video Generation | Unverified | 0 |
| EchoFlow: A Foundation Model for Cardiac Ultrasound Image and Video Generation | Mar 28, 2025 | Medical Image Analysis, Privacy Preserving | Unverified | 0 |
| CoGen: 3D Consistent Video Generation via Adaptive Conditioning for Autonomous Driving | Mar 28, 2025 | 3D Generation, Autonomous Driving | Unverified | 0 |
| Zero4D: Training-Free 4D Video Generation From Single Video Using Off-the-Shelf Video Diffusion Model | Mar 28, 2025 | Video Generation | Unverified | 0 |
| VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness | Mar 27, 2025 | Anomaly Detection, Video Generation | Code Available | 5 |
| VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models | Mar 27, 2025 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| Audio-driven Gesture Generation via Deviation Feature in the Latent Space | Mar 27, 2025 | Gesture Generation, Video Generation | Unverified | 0 |
| DynamiCtrl: Rethinking the Basic Structure and the Role of Text for High-quality Human Image Animation | Mar 27, 2025 | Denoising, Human Animation | Code Available | 2 |
| ChatAnyone: Stylized Real-time Portrait Video Generation with Hierarchical Motion Diffusion Model | Mar 27, 2025 | GPU, Video Generation | Unverified | 0 |
| Exploring the Evolution of Physics Cognition in Video Generation: A Survey | Mar 27, 2025 | Video Generation | Code Available | 3 |
| Protecting Your Video Content: Disrupting Automated Video-based LLM Annotations | Mar 26, 2025 | Descriptive, Text-to-Video Generation | Code Available | 0 |
| Wan: Open and Advanced Large-Scale Video Generative Models | Mar 26, 2025 | Video Editing, Video Generation | Code Available | 11 |
| AccidentSim: Generating Physically Realistic Vehicle Collision Videos from Real-World Accident Reports | Mar 26, 2025 | Autonomous Driving, NeRF | Unverified | 0 |
| Synthetic Video Enhances Physical Fidelity in Video Synthesis | Mar 26, 2025 | Video Generation | Unverified | 0 |
| RecTable: Fast Modeling Tabular Data with Rectified Flow | Mar 26, 2025 | Image Generation, Text to Image Generation | Code Available | 0 |
| Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models | Mar 26, 2025 | Video Generation | Unverified | 0 |
| VPO: Aligning Text-to-Video Generation Models with Prompt Optimization | Mar 26, 2025 | In-Context Learning, Safety Alignment | Code Available | 1 |
| Video Motion Graphs | Mar 26, 2025 | Motion Interpolation, Video Frame Interpolation | Unverified | 0 |
| GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving | Mar 26, 2025 | Autonomous Driving, Video Generation | Unverified | 0 |
| Mask^2DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation | Mar 25, 2025 | Text Annotation, Video Generation | Unverified | 0 |
| EfficientMT: Efficient Temporal Adaptation for Motion Transfer in Text-to-Video Diffusion Models | Mar 25, 2025 | Video Generation | Code Available | 1 |
| Long-Context Autoregressive Video Modeling with Next-Frame Prediction | Mar 25, 2025 | Text Generation, Video Generation | Code Available | 3 |
| FullDiT: Multi-Task Video Generative Foundation Model with Full Attention | Mar 25, 2025 | Video Generation | Unverified | 0 |
| Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals | Mar 25, 2025 | Counterfactual, Motion Estimation | Unverified | 0 |
| AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers | Mar 25, 2025 | Video Generation | Unverified | 0 |
| Zero-Shot Human-Object Interaction Synthesis with Multimodal Priors | Mar 25, 2025 | Diversity, Human-Object Interaction Detection | Unverified | 0 |
| FuXi-RTM: A Physics-Guided Prediction Framework with Radiative Transfer Modeling | Mar 25, 2025 | Deep Learning, Video Generation | Unverified | 0 |
| AMD-Hummingbird: Towards an Efficient Text-to-Video Model | Mar 24, 2025 | Computational Efficiency, Video Generation | Code Available | 1 |
| Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation | Mar 24, 2025 | Motion Generation, Portrait Animation | Unverified | 0 |
| Training-free Diffusion Acceleration with Bottleneck Sampling | Mar 24, 2025 | Denoising, Image Generation | Unverified | 0 |
| Resource-Efficient Motion Control for Video Generation via Dynamic Mask Guidance | Mar 24, 2025 | Text-to-Video Generation, Video Editing | Unverified | 0 |
| SyncVP: Joint Diffusion for Synchronous Multi-Modal Video Prediction | Mar 24, 2025 | Video Generation, Video Prediction | Code Available | 1 |