| How I Warped Your Noise: a Temporally-Correlated Noise Prior for Diffusion Models | Apr 3, 2025 | Video Editing, Video Generation | Unverified | 0 |
| OmniCam: Unified Multimodal Video Generation via Camera Control | Apr 3, 2025 | Video Generation | Unverified | 0 |
| WorldPrompter: Traversable Text-to-Scene Generation | Apr 2, 2025 | 3D Generation, Scene Generation | Unverified | 0 |
| WorldScore: A Unified Evaluation Benchmark for World Generation | Apr 1, 2025 | Scene Generation, Video Generation | Unverified | 0 |
| JointTuner: Appearance-Motion Adaptive Joint Training for Customized Video Generation | Mar 31, 2025 | Video Generation | Unverified | 0 |
| HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation | Mar 31, 2025 | Video Generation | Unverified | 0 |
| Any2Caption:Interpreting Any Condition to Caption for Controllable Video Generation | Mar 31, 2025 | Video Generation | Unverified | 0 |
| HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation | Mar 31, 2025 | Hallucination, Human-Object Interaction Detection | Unverified | 0 |
| MoCha: Towards Movie-Grade Talking Character Synthesis | Mar 30, 2025 | Video Generation | Unverified | 0 |
| Towards Physically Plausible Video Generation via VLM Planning | Mar 30, 2025 | Image to Video Generation, Video Generation | Unverified | 0 |
| JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization | Mar 30, 2025 | Video Generation | Unverified | 0 |
| SketchVideo: Sketch-based Video Generation and Editing | Mar 30, 2025 | Video Editing, Video Generation | Unverified | 0 |
| Zero4D: Training-Free 4D Video Generation From Single Video Using Off-the-Shelf Video Diffusion Model | Mar 28, 2025 | Video Generation | Unverified | 0 |
| EchoFlow: A Foundation Model for Cardiac Ultrasound Image and Video Generation | Mar 28, 2025 | Medical Image Analysis, Privacy Preserving | Unverified | 0 |
| CoGen: 3D Consistent Video Generation via Adaptive Conditioning for Autonomous Driving | Mar 28, 2025 | 3D Generation, Autonomous Driving | Unverified | 0 |
| Audio-driven Gesture Generation via Deviation Feature in the Latent Space | Mar 27, 2025 | Gesture Generation, Video Generation | Unverified | 0 |
| VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models | Mar 27, 2025 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| ChatAnyone: Stylized Real-time Portrait Video Generation with Hierarchical Motion Diffusion Model | Mar 27, 2025 | GPU, Video Generation | Unverified | 0 |
| Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models | Mar 26, 2025 | Video Generation | Unverified | 0 |
| GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving | Mar 26, 2025 | Autonomous Driving, Video Generation | Unverified | 0 |
| Protecting Your Video Content: Disrupting Automated Video-based LLM Annotations | Mar 26, 2025 | Descriptive, Text-to-Video Generation | Code Available | 0 |
| Video Motion Graphs | Mar 26, 2025 | Motion Interpolation, Video Frame Interpolation | Unverified | 0 |
| AccidentSim: Generating Physically Realistic Vehicle Collision Videos from Real-World Accident Reports | Mar 26, 2025 | Autonomous Driving, NeRF | Unverified | 0 |
| RecTable: Fast Modeling Tabular Data with Rectified Flow | Mar 26, 2025 | Image Generation, Text to Image Generation | Code Available | 0 |
| Synthetic Video Enhances Physical Fidelity in Video Synthesis | Mar 26, 2025 | Video Generation | Unverified | 0 |
| FuXi-RTM: A Physics-Guided Prediction Framework with Radiative Transfer Modeling | Mar 25, 2025 | Deep Learning, Video Generation | Unverified | 0 |
| FullDiT: Multi-Task Video Generative Foundation Model with Full Attention | Mar 25, 2025 | Video Generation | Unverified | 0 |
| AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers | Mar 25, 2025 | Video Generation | Unverified | 0 |
| Mask^2DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation | Mar 25, 2025 | text annotation, Video Generation | Unverified | 0 |
| Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals | Mar 25, 2025 | counterfactual, Motion Estimation | Unverified | 0 |
| Zero-Shot Human-Object Interaction Synthesis with Multimodal Priors | Mar 25, 2025 | Diversity, Human-Object Interaction Detection | Unverified | 0 |
| Training-free Diffusion Acceleration with Bottleneck Sampling | Mar 24, 2025 | Denoising, Image Generation | Unverified | 0 |
| Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation | Mar 24, 2025 | Motion Generation, Portrait Animation | Unverified | 0 |
| EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation | Mar 24, 2025 | Benchmarking, Data Augmentation | Unverified | 0 |
| Resource-Efficient Motion Control for Video Generation via Dynamic Mask Guidance | Mar 24, 2025 | Text-to-Video Generation, Video Editing | Unverified | 0 |
| Aether: Geometric-Aware Unified World Modeling | Mar 24, 2025 | Dynamic Reconstruction, Prediction | Unverified | 0 |
| Can Text-to-Video Generation help Video-Language Alignment? | Mar 24, 2025 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| Video-T1: Test-Time Scaling for Video Generation | Mar 24, 2025 | Denoising, Video Generation | Unverified | 0 |
| TransAnimate: Taming Layer Diffusion to Generate RGBA Video | Mar 23, 2025 | Image Generation, Video Generation | Unverified | 0 |
| LongDiff: Training-Free Long Video Generation in One Go | Mar 23, 2025 | Position, Video Generation | Unverified | 0 |
| RDTF: Resource-efficient Dual-mask Training Framework for Multi-frame Animated Sticker Generation | Mar 22, 2025 | Video Generation | Unverified | 0 |
| Generating, Fast and Slow: Scalable Parallel Video Generation with Video Interface Networks | Mar 21, 2025 | Denoising, Optical Flow Estimation | Unverified | 0 |
| Enabling Versatile Controls for Video Diffusion Models | Mar 21, 2025 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| Position: Interactive Generative Video as Next-Generation Game Engine | Mar 21, 2025 | Position, Video Generation | Unverified | 0 |
| Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model | Mar 21, 2025 | Disentanglement, Human-Object Interaction Detection | Unverified | 0 |
| ScalingNoise: Scaling Inference-Time Search for Generating Infinite Videos | Mar 20, 2025 | Denoising, Diversity | Unverified | 0 |
| MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance | Mar 20, 2025 | Image to Video Generation, Object | Unverified | 0 |
| PoseTraj: Pose-Aware Trajectory Control in Video Diffusion | Mar 20, 2025 | Disentanglement, Video Generation | Unverified | 0 |
| VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling | Mar 20, 2025 | 3DGS, Text to 3D | Unverified | 0 |
| Temporal Regularization Makes Your Video Generator Stronger | Mar 19, 2025 | Diversity, Video Generation | Unverified | 0 |