| Temporal Regularization Makes Your Video Generator Stronger | Mar 19, 2025 | Diversity, Video Generation | —Unverified | 0 |
| Fast Autoregressive Video Generation with Diagonal Decoding | Mar 18, 2025 | Video Generation | —Unverified | 0 |
| MusicInfuser: Making Video Diffusion Listen and Dance | Mar 18, 2025 | Video Generation | —Unverified | 0 |
| MagicComp: Training-free Dual-Phase Refinement for Compositional Video Generation | Mar 18, 2025 | Denoising, Video Generation | —Unverified | 0 |
| Impossible Videos | Mar 18, 2025 | counterfactual, Video Generation | —Unverified | 0 |
| Frame-wise Conditioning Adaptation for Fine-Tuning Diffusion Models in Text-to-Video Prediction | Mar 17, 2025 | Video Generation, Video Prediction | Code Available | 0 |
| AUTV: Creating Underwater Video Datasets with Pixel-wise Annotations | Mar 17, 2025 | Semantic Segmentation, Video Generation | —Unverified | 0 |
| EQ-TAA: Equivariant Traffic Accident Anticipation via Diffusion-Based Accident Video Synthesis | Mar 16, 2025 | Accident Anticipation, Video Generation | —Unverified | 0 |
| SPC-GS: Gaussian Splatting with Semantic-Prompt Consistency for Indoor Open-World Free-view Synthesis from Sparse Inputs | Mar 16, 2025 | Semantic Segmentation, Video Generation | —Unverified | 0 |
| Cross-Modal Learning for Music-to-Music-Video Description Generation | Mar 14, 2025 | Video Description, Video Generation | —Unverified | 0 |
| TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation | Mar 14, 2025 | Imitation Learning, Object | —Unverified | 0 |
| HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models | Mar 14, 2025 | Text-to-Video Generation, Video Generation | —Unverified | 0 |
| ReCamMaster: Camera-Controlled Generative Rendering from A Single Video | Mar 14, 2025 | Super-Resolution, Video Generation | —Unverified | 0 |
| Long Context Tuning for Video Generation | Mar 13, 2025 | Video Generation | —Unverified | 0 |
| Semantic Latent Motion for Portrait Video Generation | Mar 13, 2025 | Descriptive, Video Generation | —Unverified | 0 |
| VideoMerge: Towards Training-free Long Video Generation | Mar 13, 2025 | Denoising, Video Generation | —Unverified | 0 |
| CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance | Mar 13, 2025 | Large Language Model, Multimodal Large Language Model | —Unverified | 0 |
| Other Vehicle Trajectories Are Also Needed: A Driving World Model Unifies Ego-Other Vehicle Trajectories in Video Latant Space | Mar 12, 2025 | Autonomous Driving, Video Generation | —Unverified | 0 |
| On the Limitations of Vision-Language Models in Understanding Image Transforms | Mar 12, 2025 | Question Answering, Video Generation | —Unverified | 0 |
| Unified Dense Prediction of Video Diffusion | Mar 12, 2025 | Prediction, Video Generation | —Unverified | 0 |
| Accelerating Diffusion Sampling via Exploiting Local Transition Coherence | Mar 12, 2025 | Denoising, Video Generation | —Unverified | 0 |
| Reangle-A-Video: 4D Video Generation as Video-to-Video Translation | Mar 12, 2025 | Translation, Video Generation | —Unverified | 0 |
| I2V3D: Controllable image-to-video generation with 3D guidance | Mar 12, 2025 | 3D geometry, Image to Video Generation | —Unverified | 0 |
| LuciBot: Automated Robot Policy Learning from Generated Videos | Mar 12, 2025 | Video Generation | —Unverified | 0 |
| ObjectMover: Generative Object Movement with Video Prior | Mar 11, 2025 | Multi-Task Learning, Object | —Unverified | 0 |
| WISA: World Simulator Assistant for Physics-Aware Text-to-Video Generation | Mar 11, 2025 | Text-to-Video Generation, Video Generation | —Unverified | 0 |
| DreamRelation: Relation-Centric Video Customization | Mar 10, 2025 | Relation, Triplet | —Unverified | 0 |
| TR-DQ: Time-Rotation Diffusion Quantization | Mar 9, 2025 | Image Generation, Quantization | —Unverified | 0 |
| Generative Video Bi-flow | Mar 9, 2025 | Unconditional Video Generation, Video Generation | Code Available | 0 |
| VideoPhy-2: A Challenging Action-Centric Physical Commonsense Evaluation in Video Generation | Mar 9, 2025 | Video Generation | Code Available | 0 |
| Text2Story: Advancing Video Storytelling with Text Guidance | Mar 8, 2025 | Form, Image Generation | —Unverified | 0 |
| Object-Centric World Model for Language-Guided Manipulation | Mar 8, 2025 | Autonomous Driving, model | —Unverified | 0 |
| VACT: A Video Automatic Causal Testing System and a Benchmark | Mar 8, 2025 | Large Language Model, Video Generation | —Unverified | 0 |
| GSV3D: Gaussian Splatting-based Geometric Distillation with Stable Video Diffusion for Single-Image 3D Object Generation | Mar 8, 2025 | 3D Generation, Decoder | —Unverified | 0 |
| MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice | Mar 7, 2025 | Denoising, Portrait Animation | —Unverified | 0 |
| FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video | Mar 6, 2025 | Future prediction, Novel View Synthesis | —Unverified | 0 |
| Rethinking Video Super-Resolution: Towards Diffusion-Based Methods without Motion Alignment | Mar 5, 2025 | All, Super-Resolution | —Unverified | 0 |
| High-Quality Virtual Single-Viewpoint Surgical Video: Geometric Autocalibration of Multiple Cameras in Surgical Lights | Mar 5, 2025 | Video Generation | Code Available | 0 |
| VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation | Mar 3, 2025 | Text-to-Video Generation, Video Generation | Code Available | 0 |
| HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models | Feb 28, 2025 | Action Understanding, Text-to-Video Generation | —Unverified | 0 |
| Unified Video Action Model | Feb 28, 2025 | model, Prediction | —Unverified | 0 |
| FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute | Feb 27, 2025 | Denoising, Image Generation | —Unverified | 0 |
| Online Pseudo-average Shifting Attention(PASA) for Robust Low-precision LLM Inference: Algorithms and Numerical Analysis | Feb 26, 2025 | Video Generation | —Unverified | 0 |
| ASurvey: Spatiotemporal Consistency in Video Generation | Feb 25, 2025 | Image Generation, Video Generation | —Unverified | 0 |
| X-Dancer: Expressive Music to Human Dance Video Generation | Feb 24, 2025 | Image Animation, Video Generation | —Unverified | 0 |
| VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing | Feb 24, 2025 | Video Editing, Video Generation | —Unverified | 0 |
| RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers | Feb 21, 2025 | Video Generation | —Unverified | 0 |
| Designing Parameter and Compute Efficient Diffusion Transformers using Distillation | Feb 20, 2025 | Knowledge Distillation, NVIDIA Jetson Orin Nano | —Unverified | 0 |
| Hardware-Friendly Static Quantization Method for Video Diffusion Transformers | Feb 20, 2025 | Quantization, Video Generation | —Unverified | 0 |
| RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers | Feb 20, 2025 | Text-to-Video Generation, Video Generation | —Unverified | 0 |