| VidStyleODE: Disentangled Video Editing via StyleGAN and NeuralODEs | Apr 12, 2023 | Image Animation, Video Editing | —Unverified | 0 |
| Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models | May 7, 2024 | Video Generation, Video Prediction | —Unverified | 0 |
| VIMI: Grounding Video Generation through Multi-modal Instruction | Jul 8, 2024 | Text-to-Video Generation, Video Generation | —Unverified | 0 |
| VISAGE: Video Synthesis using Action Graphs for Surgery | Oct 23, 2024 | Video Generation | —Unverified | 0 |
| Visual Representation Learning with Stochastic Frame Prediction | Jun 11, 2024 | Decoder, Pose Tracking | —Unverified | 0 |
| VITON-DiT: Learning In-the-Wild Video Try-On from Human Dance Videos via Diffusion Transformers | May 28, 2024 | Denoising, Video Generation | —Unverified | 0 |
| Vivid-ZOO: Multi-View Video Generation with Diffusion Model | Jun 12, 2024 | Video Generation | —Unverified | 0 |
| VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis | Mar 13, 2024 | Face Detection, Video Editing | —Unverified | 0 |
| Adversarial Framework for Unsupervised Learning of Motion Dynamics in Videos | Mar 24, 2018 | Motion Estimation, Prediction | —Unverified | 0 |
| VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers | Aug 30, 2024 | GPU, Image Generation | —Unverified | 0 |
| We'll Fix it in Post: Improving Text-to-Video Generation with Neuro-Symbolic Feedback | Apr 24, 2025 | Text-to-Video Generation, Video Generation | —Unverified | 0 |
| WHALE: Towards Generalizable and Scalable World Models for Embodied Decision-making | Nov 8, 2024 | Decision Making, Video Generation | —Unverified | 0 |
| What Matters in Detecting AI-Generated Videos like Sora? | Jun 27, 2024 | Optical Flow Estimation, Video Generation | —Unverified | 0 |
| What You See Is What Matters: A Novel Visual and Physics-Based Metric for Evaluating Video Generation Quality | Nov 20, 2024 | Video Generation | —Unverified | 0 |
| When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding | Aug 15, 2024 | Video Compression, Video Generation | —Unverified | 0 |
| WISA: World Simulator Assistant for Physics-Aware Text-to-Video Generation | Mar 11, 2025 | Text-to-Video Generation, Video Generation | —Unverified | 0 |
| WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions | May 23, 2025 | Sand, Scene Generation | —Unverified | 0 |
| World-consistent Video Diffusion with Explicit 3D Modeling | Dec 2, 2024 | 3D Generation, Image Generation | —Unverified | 0 |
| WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens | Jan 18, 2024 | Video Editing, Video Generation | —Unverified | 0 |
| WorldEval: World Model as Real-World Robot Policies Evaluator | May 25, 2025 | Robot Manipulation, Video Generation | —Unverified | 0 |
| WorldGPT: A Sora-Inspired Video AI Agent as Rich World Models from Text and Image Inputs | Mar 10, 2024 | AI Agent, Video Generation | —Unverified | 0 |
| World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving | Jul 17, 2025 | Accident Anticipation, Autonomous Driving | —Unverified | 0 |
| WorldPrompter: Traversable Text-to-Scene Generation | Apr 2, 2025 | 3D Generation, Scene Generation | —Unverified | 0 |
| WorldScore: A Unified Evaluation Benchmark for World Generation | Apr 1, 2025 | Scene Generation, Video Generation | —Unverified | 0 |
| WorldSimBench: Towards Video Generation Models as World Simulators | Oct 23, 2024 | Autonomous Driving, Robot Manipulation | —Unverified | 0 |
| X-Dancer: Expressive Music to Human Dance Video Generation | Feb 24, 2025 | Image Animation, Video Generation | —Unverified | 0 |
| xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations | Aug 22, 2024 | Dense Captioning, Motion Estimation | —Unverified | 0 |
| Xp-GAN: Unsupervised Multi-object Controllable Video Generation | Nov 19, 2021 | Object, Video Generation | —Unverified | 0 |
| Zero4D: Training-Free 4D Video Generation From Single Video Using Off-the-Shelf Video Diffusion Model | Mar 28, 2025 | Video Generation | —Unverified | 0 |
| ZeroHSI: Zero-Shot 4D Human-Scene Interaction by Video Generation | Dec 24, 2024 | Human-Object Interaction Detection, Video Generation | —Unverified | 0 |
| Generating Videos of Zero-Shot Compositions of Actions and Objects | Dec 5, 2019 | Human-Object Interaction Detection, Object | —Unverified | 0 |
| Zero-Shot Human-Object Interaction Synthesis with Multimodal Priors | Mar 25, 2025 | Diversity, Human-Object Interaction Detection | —Unverified | 0 |
| Zero-Shot Video Editing through Adaptive Sliding Score Distillation | Jun 7, 2024 | Denoising, Text-to-Video Generation | —Unverified | 0 |
| 0/1 Deep Neural Networks via Block Coordinate Descent | Jun 19, 2022 | 10-shot image generation | —Unverified | 0 |
| Towards A Better Metric for Text-to-Video Generation | Jan 15, 2024 | Mixture-of-Experts, Text-to-Video Generation | —Unverified | 0 |
| Towards Chunk-Wise Generation for Long Videos | Nov 27, 2024 | Denoising, GPU | —Unverified | 0 |
| Towards Generative Latent Variable Models for Speech | Sep 29, 2021 | Image Generation, Video Generation | —Unverified | 0 |
| Towards motion from video diffusion models | Nov 19, 2024 | Video Generation | —Unverified | 0 |
| Towards Multi-Task Multi-Modal Models: A Video Generative Perspective | May 26, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| Towards Physically Plausible Video Generation via VLM Planning | Mar 30, 2025 | Image to Video Generation, Video Generation | —Unverified | 0 |
| Towards Physical Understanding in Video Generation: A 3D Point Regularization Approach | Feb 5, 2025 | Video Generation | —Unverified | 0 |
| Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation | Dec 8, 2024 | Point Tracking, Video Generation | —Unverified | 0 |
| TrackGo: A Flexible and Efficient Method for Controllable Video Generation | Aug 21, 2024 | Video Generation | —Unverified | 0 |
| Training-free Camera Control for Video Generation | Jun 14, 2024 | Data Augmentation, Video Generation | —Unverified | 0 |
| Training-free Diffusion Acceleration with Bottleneck Sampling | Mar 24, 2025 | Denoising, Image Generation | —Unverified | 0 |
| Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization | Apr 11, 2025 | Denoising, Object | —Unverified | 0 |
| Decoupled Video Generation with Chain of Training-free Diffusion Model Experts | Aug 24, 2024 | Denoising, Video Generation | —Unverified | 0 |
| Training-Free Motion-Guided Video Generation with Enhanced Temporal Consistency Using Motion Consistency Loss | Jan 13, 2025 | Feature Correlation, Video Generation | —Unverified | 0 |
| Trajectory Attention for Fine-grained Video Motion Control | Nov 28, 2024 | Inductive Bias, Video Editing | —Unverified | 0 |
| TransAnimate: Taming Layer Diffusion to Generate RGBA Video | Mar 23, 2025 | Image Generation, Video Generation | —Unverified | 0 |