| Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation | Mar 24, 2025 | Motion Generation, Portrait Animation | Unverified | 0 |
| Can Text-to-Video Generation help Video-Language Alignment? | Mar 24, 2025 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation | Mar 24, 2025 | Benchmarking, Data Augmentation | Unverified | 0 |
| Aether: Geometric-Aware Unified World Modeling | Mar 24, 2025 | Dynamic Reconstruction, Prediction | Unverified | 0 |
| LongDiff: Training-Free Long Video Generation in One Go | Mar 23, 2025 | Position, Video Generation | Unverified | 0 |
| TransAnimate: Taming Layer Diffusion to Generate RGBA Video | Mar 23, 2025 | Image Generation, Video Generation | Unverified | 0 |
| RDTF: Resource-efficient Dual-mask Training Framework for Multi-frame Animated Sticker Generation | Mar 22, 2025 | Video Generation | Unverified | 0 |
| Position: Interactive Generative Video as Next-Generation Game Engine | Mar 21, 2025 | Position, Video Generation | Unverified | 0 |
| Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer | Mar 21, 2025 | Benchmarking, Video Generation | Code Available | 2 |
| Enabling Versatile Controls for Video Diffusion Models | Mar 21, 2025 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| Generating, Fast and Slow: Scalable Parallel Video Generation with Video Interface Networks | Mar 21, 2025 | Denoising, Optical Flow Estimation | Unverified | 0 |
| Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model | Mar 21, 2025 | Disentanglement, Human-Object Interaction Detection | Unverified | 0 |
| MiLA: Multi-view Intensive-fidelity Long-term Video Generation World Model for Autonomous Driving | Mar 20, 2025 | Autonomous Driving, Denoising | Code Available | 1 |
| ScalingNoise: Scaling Inference-Time Search for Generating Infinite Videos | Mar 20, 2025 | Denoising, Diversity | Unverified | 0 |
| PoseTraj: Pose-Aware Trajectory Control in Video Diffusion | Mar 20, 2025 | Disentanglement, Video Generation | Unverified | 0 |
| VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling | Mar 20, 2025 | 3DGS, Text to 3D | Unverified | 0 |
| XAttention: Block Sparse Attention with Antidiagonal Scoring | Mar 20, 2025 | Video Generation, Video Understanding | Code Available | 3 |
| MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance | Mar 20, 2025 | Image to Video Generation, Object | Unverified | 0 |
| VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention | Mar 19, 2025 | Video Generation | Unverified | 0 |
| Temporal Regularization Makes Your Video Generator Stronger | Mar 19, 2025 | Diversity, Video Generation | Unverified | 0 |
| MusicInfuser: Making Video Diffusion Listen and Dance | Mar 18, 2025 | Video Generation | Unverified | 0 |
| Fast Autoregressive Video Generation with Diagonal Decoding | Mar 18, 2025 | Video Generation | Unverified | 0 |
| Concat-ID: Towards Universal Identity-Preserving Video Synthesis | Mar 18, 2025 | Human-Domain Subject-to-Video, Video Generation | Code Available | 2 |
| AIGVE-Tool: AI-Generated Video Evaluation Toolkit with Multifaceted Benchmark | Mar 18, 2025 | Video Generation | Code Available | 1 |
| LeanVAE: An Ultra-Efficient Reconstruction VAE for Video Diffusion Models | Mar 18, 2025 | Compressed Sensing, Video Generation | Code Available | 2 |
| MagicComp: Training-free Dual-Phase Refinement for Compositional Video Generation | Mar 18, 2025 | Denoising, Video Generation | Unverified | 0 |
| Impossible Videos | Mar 18, 2025 | Counterfactual, Video Generation | Unverified | 0 |
| AUTV: Creating Underwater Video Datasets with Pixel-wise Annotations | Mar 17, 2025 | Semantic Segmentation, Video Generation | Unverified | 0 |
| Frame-wise Conditioning Adaptation for Fine-Tuning Diffusion Models in Text-to-Video Prediction | Mar 17, 2025 | Video Generation, Video Prediction | Code Available | 0 |
| EQ-TAA: Equivariant Traffic Accident Anticipation via Diffusion-Based Accident Video Synthesis | Mar 16, 2025 | Accident Anticipation, Video Generation | Unverified | 0 |
| SPC-GS: Gaussian Splatting with Semantic-Prompt Consistency for Indoor Open-World Free-view Synthesis from Sparse Inputs | Mar 16, 2025 | Semantic Segmentation, Video Generation | Unverified | 0 |
| SteerX: Creating Any Camera-Free 3D and 4D Scenes with Geometric Steering | Mar 15, 2025 | Scene Generation, Video Generation | Code Available | 2 |
| Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model | Mar 14, 2025 | Image to Video Generation, Video Generation | Code Available | 3 |
| HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models | Mar 14, 2025 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| ReCamMaster: Camera-Controlled Generative Rendering from A Single Video | Mar 14, 2025 | Super-Resolution, Video Generation | Unverified | 0 |
| TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation | Mar 14, 2025 | Imitation Learning, Object | Unverified | 0 |
| Cross-Modal Learning for Music-to-Music-Video Description Generation | Mar 14, 2025 | Video Description, Video Generation | Unverified | 0 |
| VideoMerge: Towards Training-free Long Video Generation | Mar 13, 2025 | Denoising, Video Generation | Unverified | 0 |
| CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance | Mar 13, 2025 | Large Language Model, Multimodal Large Language Model | Unverified | 0 |
| VMBench: A Benchmark for Perception-Aligned Video Motion Generation | Mar 13, 2025 | Motion Generation, Video Generation | Code Available | 2 |
| Long Context Tuning for Video Generation | Mar 13, 2025 | Video Generation | Unverified | 0 |
| Semantic Latent Motion for Portrait Video Generation | Mar 13, 2025 | Descriptive, Video Generation | Unverified | 0 |
| Neighboring Autoregressive Modeling for Efficient Visual Generation | Mar 12, 2025 | Image Generation, Text to Image Generation | Code Available | 2 |
| On the Limitations of Vision-Language Models in Understanding Image Transforms | Mar 12, 2025 | Question Answering, Video Generation | Unverified | 0 |
| Accelerating Diffusion Sampling via Exploiting Local Transition Coherence | Mar 12, 2025 | Denoising, Video Generation | Unverified | 0 |
| LuciBot: Automated Robot Policy Learning from Generated Videos | Mar 12, 2025 | Video Generation | Unverified | 0 |
| I2V3D: Controllable image-to-video generation with 3D guidance | Mar 12, 2025 | 3D Geometry, Image to Video Generation | Unverified | 0 |
| PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop | Mar 12, 2025 | Diagnostic, Video Generation | Code Available | 2 |
| Other Vehicle Trajectories Are Also Needed: A Driving World Model Unifies Ego-Other Vehicle Trajectories in Video Latant Space | Mar 12, 2025 | Autonomous Driving, Video Generation | Unverified | 0 |
| Unified Dense Prediction of Video Diffusion | Mar 12, 2025 | Prediction, Video Generation | Unverified | 0 |