| VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention | Mar 19, 2025 | Video Generation | —Unverified | 0 | 0 |
| VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing | Feb 24, 2025 | Video Editing, Video Generation | —Unverified | 0 | 0 |
| Video-Infinity: Distributed Long Video Generation | Jun 24, 2024 | GPU, Video Generation | —Unverified | 0 | 0 |
| Video Is Worth a Thousand Images: Exploring the Latest Trends in Long Video Generation | Dec 24, 2024 | Video Generation | —Unverified | 0 | 0 |
| VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models | Feb 4, 2025 | Motion Generation, motion prediction | —Unverified | 0 | 0 |
| Video Latent Flow Matching: Optimal Polynomial Projections for Video Interpolation and Extrapolation | Feb 1, 2025 | Image Generation, Video Generation | —Unverified | 0 | 0 |
| VideoLCM: Video Latent Consistency Model | Dec 14, 2023 | Computational Efficiency, Image Generation | —Unverified | 0 | 0 |
| VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models | Mar 27, 2025 | Text-to-Video Generation, Video Generation | —Unverified | 0 | 0 |
| VideoMAR: Autoregressive Video Generation with Continuous Tokens | Jun 17, 2025 | GPU, Image Generation | —Unverified | 0 | 0 |
| VideoMerge: Towards Training-free Long Video Generation | Mar 13, 2025 | Denoising, Video Generation | —Unverified | 0 | 0 |
| Video Motion Graphs | Mar 26, 2025 | Motion Interpolation, Video Frame Interpolation | —Unverified | 0 | 0 |
| VideoPanda: Video Panoramic Diffusion with Multi-view Attention | Apr 15, 2025 | Video Generation | —Unverified | 0 | 0 |
| Video Perception Models for 3D Scene Synthesis | Jun 25, 2025 | 3D Reconstruction, Image Generation | —Unverified | 0 | 0 |
| VideoPhy: Evaluating Physical Commonsense for Video Generation | Jun 5, 2024 | Video Generation | —Unverified | 0 | 0 |
| VideoPoet: A Large Language Model for Zero-Shot Video Generation | Dec 21, 2023 | Decoder, Language Modeling | —Unverified | 0 | 0 |
| VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement | Nov 22, 2024 | Text-to-Video Generation, Video Alignment | —Unverified | 0 | 0 |
| VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling | Mar 20, 2025 | 3DGS, Text to 3D | —Unverified | 0 | 0 |
| Video Signature: In-generation Watermarking for Latent Video Diffusion Models | May 31, 2025 | Decoder, Video Generation | —Unverified | 0 | 0 |
| Rethinking Video Super-Resolution: Towards Diffusion-Based Methods without Motion Alignment | Mar 5, 2025 | All, Super-Resolution | —Unverified | 0 | 0 |
| Video-T1: Test-Time Scaling for Video Generation | Mar 24, 2025 | Denoising, Video Generation | —Unverified | 0 | 0 |
| Video-to-Audio Generation with Fine-grained Temporal Semantics | Sep 23, 2024 | Audio Generation, Video Generation | —Unverified | 0 | 0 |
| Video-to-Audio Generation with Hidden Alignment | Jul 10, 2024 | Audio Generation, Data Augmentation | —Unverified | 0 | 0 |
| Video Virtual Try-on with Conditional Diffusion Transformer Inpainter | Jun 26, 2025 | Video Generation, Video Inpainting | —Unverified | 0 | 0 |
| VideoWorld: Exploring Knowledge Learning from Unlabeled Videos | Jan 16, 2025 | Video Generation | —Unverified | 0 | 0 |
| VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models | Nov 30, 2023 | Semantic Segmentation, Video Editing | —Unverified | 0 | 0 |
| VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation | Nov 14, 2024 | Denoising, Robot Manipulation | —Unverified | 0 | 0 |
| VidPanos: Generative Panoramic Videos from Casual Panning Videos | Oct 17, 2024 | Image Stitching, Video Generation | —Unverified | 0 | 0 |
| VidStyleODE: Disentangled Video Editing via StyleGAN and NeuralODEs | Apr 12, 2023 | Image Animation, Video Editing | —Unverified | 0 | 0 |
| Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models | May 7, 2024 | Video Generation, Video Prediction | —Unverified | 0 | 0 |
| VIMI: Grounding Video Generation through Multi-modal Instruction | Jul 8, 2024 | Text-to-Video Generation, Video Generation | —Unverified | 0 | 0 |
| VISAGE: Video Synthesis using Action Graphs for Surgery | Oct 23, 2024 | Video Generation | —Unverified | 0 | 0 |
| Visual Representation Learning with Stochastic Frame Prediction | Jun 11, 2024 | Decoder, Pose Tracking | —Unverified | 0 | 0 |
| VITON-DiT: Learning In-the-Wild Video Try-On from Human Dance Videos via Diffusion Transformers | May 28, 2024 | Denoising, Video Generation | —Unverified | 0 | 0 |
| Vivid-ZOO: Multi-View Video Generation with Diffusion Model | Jun 12, 2024 | Video Generation | —Unverified | 0 | 0 |
| VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis | Mar 13, 2024 | Face Detection, Video Editing | —Unverified | 0 | 0 |
| Adversarial Framework for Unsupervised Learning of Motion Dynamics in Videos | Mar 24, 2018 | Motion Estimation, Prediction | —Unverified | 0 | 0 |
| VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers | Aug 30, 2024 | GPU, Image Generation | —Unverified | 0 | 0 |
| We'll Fix it in Post: Improving Text-to-Video Generation with Neuro-Symbolic Feedback | Apr 24, 2025 | Text-to-Video Generation, Video Generation | —Unverified | 0 | 0 |
| WHALE: Towards Generalizable and Scalable World Models for Embodied Decision-making | Nov 8, 2024 | Decision Making, Video Generation | —Unverified | 0 | 0 |
| What Matters in Detecting AI-Generated Videos like Sora? | Jun 27, 2024 | Optical Flow Estimation, Video Generation | —Unverified | 0 | 0 |
| What You See Is What Matters: A Novel Visual and Physics-Based Metric for Evaluating Video Generation Quality | Nov 20, 2024 | Video Generation | —Unverified | 0 | 0 |
| When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding | Aug 15, 2024 | Video Compression, Video Generation | —Unverified | 0 | 0 |
| WISA: World Simulator Assistant for Physics-Aware Text-to-Video Generation | Mar 11, 2025 | Text-to-Video Generation, Video Generation | —Unverified | 0 | 0 |
| WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions | May 23, 2025 | Sand, Scene Generation | —Unverified | 0 | 0 |
| World-consistent Video Diffusion with Explicit 3D Modeling | Dec 2, 2024 | 3D Generation, Image Generation | —Unverified | 0 | 0 |
| WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens | Jan 18, 2024 | Video Editing, Video Generation | —Unverified | 0 | 0 |
| WorldEval: World Model as Real-World Robot Policies Evaluator | May 25, 2025 | Robot Manipulation, Video Generation | —Unverified | 0 | 0 |
| WorldGPT: A Sora-Inspired Video AI Agent as Rich World Models from Text and Image Inputs | Mar 10, 2024 | AI Agent, Video Generation | —Unverified | 0 | 0 |
| World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving | Jul 17, 2025 | Accident Anticipation, Autonomous Driving | —Unverified | 0 | 0 |
| WorldPrompter: Traversable Text-to-Scene Generation | Apr 2, 2025 | 3D Generation, Scene Generation | —Unverified | 0 | 0 |