| Seeing Voices: Generating A-Roll Video from Audio with Mirage | Jun 9, 2025 | Speech Synthesis, text-to-speech | Unverified | 0 |
| From Generation to Generalization: Emergent Few-Shot Learning in Video Diffusion Models | Jun 8, 2025 | ARC, Few-Shot Learning | Unverified | 0 |
| FPSAttention: Training-Aware FP8 and Sparsity Co-Design for Fast Video Diffusion | Jun 5, 2025 | Denoising, Quantization | Unverified | 0 |
| Follow-Your-Creation: Empowering 4D Creation through Video Inpainting | Jun 5, 2025 | Video Generation, Video Inpainting | Unverified | 0 |
| ContentV: Efficient Training of Video Generation Models with Limited Compute | Jun 5, 2025 | Image Generation, Video Generation | Unverified | 0 |
| Astraea: A GPU-Oriented Token-wise Acceleration Framework for Video Diffusion Transformers | Jun 5, 2025 | GPU, Text-to-Video Generation | Unverified | 0 |
| DualX-VSR: Dual Axial Spatial×Temporal Transformer for Real-World Video Super-Resolution without Motion Compensation | Jun 5, 2025 | Motion Compensation, Optical Flow Estimation | Unverified | 0 |
| FullDiT2: Efficient In-Context Conditioning for Video Diffusion Transformers | Jun 4, 2025 | Video Editing, Video Generation | Unverified | 0 |
| IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation | Jun 3, 2025 | 3D geometry, Video Generation | Unverified | 0 |
| TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models | Jun 3, 2025 | Decoder, Knowledge Distillation | Unverified | 0 |
| LongDWM: Cross-Granularity Distillation for Building a Long-Term Driving World Model | Jun 2, 2025 | Video Generation | Unverified | 0 |
| DeepVerse: 4D Autoregressive Video Generation as a World Model | Jun 1, 2025 | Video Generation | Unverified | 0 |
| Video Signature: In-generation Watermarking for Latent Video Diffusion Models | May 31, 2025 | Decoder, Video Generation | Unverified | 0 |
| Evaluating Robot Policies in a World Model | May 31, 2025 | model, Video Generation | Unverified | 0 |
| UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation | May 30, 2025 | Video Generation | Unverified | 0 |
| Ctrl-Crash: Controllable Diffusion for Realistic Car Crashes | May 30, 2025 | counterfactual, Video Generation | Unverified | 0 |
| Interactive Video Generation via Domain Adaptation | May 30, 2025 | Attribute, Denoising | Unverified | 0 |
| DreamDance: Animating Character Art via Inpainting Stable Gaussian Worlds | May 30, 2025 | Image Inpainting, Video Generation | Unverified | 0 |
| MiniMax-Remover: Taming Bad Noise Helps Video Object Removal | May 30, 2025 | Video Editing, Video Generation | Unverified | 0 |
| VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos | May 29, 2025 | Question Answering, Video Generation | Code Available | 0 |
| MOVi: Training-free Text-conditioned Multi-Object Video Generation | May 29, 2025 | Object, Video Generation | Unverified | 0 |
| GeoMan: Temporally Consistent Human Geometry Estimation using Image-to-Video Diffusion | May 29, 2025 | Depth Estimation, Image to Video Generation | Unverified | 0 |
| RoboTransfer: Geometry-Consistent Video Diffusion for Robotic Visual Policy Transfer | May 29, 2025 | Imitation Learning, Video Generation | Unverified | 0 |
| PanoWan: Lifting Diffusion Video Generation Models to 360° with Latitude/Longitude-aware Mechanisms | May 28, 2025 | Denoising, Video Generation | Unverified | 0 |
| ATI: Any Trajectory Instruction for Controllable Video Generation | May 28, 2025 | Image to Video Generation, Video Generation | Unverified | 0 |
| Learning World Models for Interactive Video Generation | May 28, 2025 | In-Context Learning, Retrieval | Unverified | 0 |
| Think Before You Diffuse: LLMs-Guided Physics-Aware Video Generation | May 27, 2025 | Large Language Model, Multimodal Large Language Model | Unverified | 0 |
| Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals | May 26, 2025 | Diversity, Video Generation | Unverified | 0 |
| Dynamic-I2V: Exploring Image-to-Video Generation Models via Multimodal LLM | May 26, 2025 | Image to Video Generation, Video Generation | Unverified | 0 |
| MotionPro: A Precise Motion Controller for Image-to-Video Generation | May 26, 2025 | Denoising, Image to Video Generation | Unverified | 0 |
| The Role of Video Generation in Enhancing Data-Limited Action Understanding | May 26, 2025 | Action Recognition, Action Understanding | Unverified | 0 |
| From Single Images to Motion Policies via Video-Generation Environment Representations | May 25, 2025 | Depth Estimation, Monocular Depth Estimation | Unverified | 0 |
| WorldEval: World Model as Real-World Robot Policies Evaluator | May 25, 2025 | Robot Manipulation, Video Generation | Unverified | 0 |
| SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation | May 25, 2025 | Video Editing, Video Generation | Unverified | 0 |
| Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation | May 24, 2025 | Semantic Similarity, Semantic Textual Similarity | Unverified | 0 |
| ProphetDWM: A Driving World Model for Rolling Out Future Actions and Videos | May 24, 2025 | Action Generation, Autonomous Driving | Unverified | 0 |
| InfLVG: Reinforce Inference-Time Consistent Long Video Generation with GRPO | May 23, 2025 | Text-to-Video Generation, Video Generation | Code Available | 0 |
| WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions | May 23, 2025 | Sand, Scene Generation | Unverified | 0 |
| Action2Dialogue: Generating Character-Centric Narratives from Scene-Level Prompts | May 22, 2025 | Dialogue Generation, Large Language Model | Unverified | 0 |
| MAGIC: Motion-Aware Generative Inference via Confidence-Guided LLM | May 22, 2025 | 3D Generation, Video Generation | Unverified | 0 |
| AvatarShield: Visual Reinforcement Learning for Human-Centric Video Forgery Detection | May 21, 2025 | reinforcement-learning, Reinforcement Learning | Unverified | 0 |
| Interspatial Attention for Efficient 4D Human Video Generation | May 21, 2025 | Video Generation | Unverified | 0 |
| Challenger: Affordable Adversarial Driving Video Generation | May 21, 2025 | Autonomous Driving, Video Generation | Unverified | 0 |
| Generative AI for Autonomous Driving: A Review | May 21, 2025 | Autonomous Driving, Image Generation | Unverified | 0 |
| Programmatic Video Prediction Using Large Language Models | May 20, 2025 | Autonomous Driving, Prediction | Code Available | 0 |
| Hunyuan-Game: Industrial-grade Intelligent Game Creation Model | May 20, 2025 | Image Generation, Image to Video Generation | Unverified | 0 |
| LMP: Leveraging Motion Prior in Zero-Shot Video Generation with Diffusion Transformer | May 20, 2025 | Image to Video Generation, Video Generation | Unverified | 0 |
| Safe-Sora: Safe Text-to-Video Generation via Graphical Watermarking | May 19, 2025 | Image Generation, Mamba | Unverified | 0 |
| FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance | May 19, 2025 | Action Generation, Human action generation | Unverified | 0 |
| VFRTok: Variable Frame Rates Video Tokenizer with Duration-Proportional Information Assumption | May 17, 2025 | Decoder, Position | Unverified | 0 |