| Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models | Jun 10, 2025 | 3D Lane Detection, 3D Object Detection | Code Available | 3 |
| MagCache: Fast Video Generation with Magnitude-Aware Cache | Jun 10, 2025 | SSIM, Video Generation | Code Available | 3 |
| Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion | Jun 9, 2025 | GPU, Video Generation | Unverified | 0 |
| PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement | Jun 9, 2025 | Video Generation | Unverified | 0 |
| Audio-Sync Video Generation with Multi-Stream Temporal Control | Jun 9, 2025 | Audio-Visual Synchronization, Video Alignment | Unverified | 0 |
| Generative Modeling of Weights: Generalization or Memorization? | Jun 9, 2025 | Memorization, Video Generation | Code Available | 1 |
| Seeing Voices: Generating A-Roll Video from Audio with Mirage | Jun 9, 2025 | Speech Synthesis, text-to-speech | Unverified | 0 |
| From Generation to Generalization: Emergent Few-Shot Learning in Video Diffusion Models | Jun 8, 2025 | ARC, Few-Shot Learning | Unverified | 0 |
| Astraea: A GPU-Oriented Token-wise Acceleration Framework for Video Diffusion Transformers | Jun 5, 2025 | GPU, Text-to-Video Generation | Unverified | 0 |
| ContentV: Efficient Training of Video Generation Models with Limited Compute | Jun 5, 2025 | Image Generation, Video Generation | Unverified | 0 |
| FEAT: Full-Dimensional Efficient Attention Transformer for Medical Video Generation | Jun 5, 2025 | Denoising, Video Generation | Code Available | 1 |
| DualX-VSR: Dual Axial Spatial×Temporal Transformer for Real-World Video Super-Resolution without Motion Compensation | Jun 5, 2025 | Motion Compensation, Optical Flow Estimation | Unverified | 0 |
| FPSAttention: Training-Aware FP8 and Sparsity Co-Design for Fast Video Diffusion | Jun 5, 2025 | Denoising, Quantization | Unverified | 0 |
| Follow-Your-Creation: Empowering 4D Creation through Video Inpainting | Jun 5, 2025 | Video Generation, Video Inpainting | Unverified | 0 |
| FullDiT2: Efficient In-Context Conditioning for Video Diffusion Transformers | Jun 4, 2025 | Video Editing, Video Generation | Unverified | 0 |
| ORV: 4D Occupancy-centric Robot Video Generation | Jun 3, 2025 | Video Generation | Code Available | 2 |
| IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation | Jun 3, 2025 | 3D geometry, Video Generation | Unverified | 0 |
| TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models | Jun 3, 2025 | Decoder, Knowledge Distillation | Unverified | 0 |
| SViMo: Synchronized Diffusion for Video and Motion Generation in Hand-object Interaction Scenarios | Jun 3, 2025 | Motion Generation, Video Generation | Code Available | 1 |
| OmniV2V: Versatile Video Generation and Editing via Dynamic Content Manipulation | Jun 2, 2025 | Data Augmentation, Human Animation | Code Available | 5 |
| LongDWM: Cross-Granularity Distillation for Building a Long-Term Driving World Model | Jun 2, 2025 | Video Generation | Unverified | 0 |
| DeepVerse: 4D Autoregressive Video Generation as a World Model | Jun 1, 2025 | Video Generation | Unverified | 0 |
| Evaluating Robot Policies in a World Model | May 31, 2025 | model, Video Generation | Unverified | 0 |
| Video Signature: In-generation Watermarking for Latent Video Diffusion Models | May 31, 2025 | Decoder, Video Generation | Unverified | 0 |
| Ctrl-Crash: Controllable Diffusion for Realistic Car Crashes | May 30, 2025 | counterfactual, Video Generation | Unverified | 0 |
| DreamDance: Animating Character Art via Inpainting Stable Gaussian Worlds | May 30, 2025 | Image Inpainting, Video Generation | Unverified | 0 |
| MiniMax-Remover: Taming Bad Noise Helps Video Object Removal | May 30, 2025 | Video Editing, Video Generation | Unverified | 0 |
| UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation | May 30, 2025 | Video Generation | Unverified | 0 |
| Interactive Video Generation via Domain Adaptation | May 30, 2025 | Attribute, Denoising | Unverified | 0 |
| STORK: Improving the Fidelity of Mid-NFE Sampling for Diffusion and Flow Matching Models | May 30, 2025 | Video Generation | Code Available | 1 |
| VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos | May 29, 2025 | Question Answering, Video Generation | Code Available | 0 |
| MAGREF: Masked Guidance for Any-Reference Video Generation | May 29, 2025 | Human-Domain Subject-to-Video, Open-Domain Subject-to-Video | Code Available | 3 |
| VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation | May 29, 2025 | Caption Generation, Language Modeling | Code Available | 1 |
| VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models | May 29, 2025 | Self-Supervised Learning, Video Generation | Code Available | 2 |
| GeoMan: Temporally Consistent Human Geometry Estimation using Image-to-Video Diffusion | May 29, 2025 | Depth Estimation, Image to Video Generation | Unverified | 0 |
| MOVi: Training-free Text-conditioned Multi-Object Video Generation | May 29, 2025 | Object, Video Generation | Unverified | 0 |
| RoboTransfer: Geometry-Consistent Video Diffusion for Robotic Visual Policy Transfer | May 29, 2025 | Imitation Learning, Video Generation | Unverified | 0 |
| MMGT: Motion Mask Guided Two-Stage Network for Co-Speech Gesture Video Generation | May 29, 2025 | Motion Generation, Video Generation | Code Available | 1 |
| HyperMotion: DiT-Based Pose-Guided Human Image Animation of Complex Motions | May 29, 2025 | Image Animation, Video Generation | Code Available | 2 |
| PanoWan: Lifting Diffusion Video Generation Models to 360° with Latitude/Longitude-aware Mechanisms | May 28, 2025 | Denoising, Video Generation | Unverified | 0 |
| Learning World Models for Interactive Video Generation | May 28, 2025 | In-Context Learning, Retrieval | Unverified | 0 |
| Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation | May 28, 2025 | Human Animation, Instruction Following | Code Available | 7 |
| ATI: Any Trajectory Instruction for Controllable Video Generation | May 28, 2025 | Image to Video Generation, Video Generation | Unverified | 0 |
| Think Before You Diffuse: LLMs-Guided Physics-Aware Video Generation | May 27, 2025 | Large Language Model, Multimodal Large Language Model | Unverified | 0 |
| Minute-Long Videos with Dual Parallelisms | May 27, 2025 | Denoising, GPU | Code Available | 1 |
| SageAttention2++: A More Efficient Implementation of SageAttention2 | May 27, 2025 | Quantization, Video Generation | Code Available | 7 |
| The Role of Video Generation in Enhancing Data-Limited Action Understanding | May 26, 2025 | Action Recognition, Action Understanding | Unverified | 0 |
| Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals | May 26, 2025 | Diversity, Video Generation | Unverified | 0 |
| DriveCamSim: Generalizable Camera Simulation via Explicit Camera Modeling for Autonomous Driving | May 26, 2025 | Autonomous Driving, Video Generation | Code Available | 1 |
| MotionPro: A Precise Motion Controller for Image-to-Video Generation | May 26, 2025 | Denoising, Image to Video Generation | Unverified | 0 |