| Paper | Date | Tasks | Code | |
|---|---|---|---|---|
| FlexDiT: Dynamic Token Density Control for Diffusion Transformer | Dec 8, 2024 | Computational Efficiency, Denoising | Code Available | 1 |
| Accelerating Video Diffusion Models via Distribution Matching | Dec 8, 2024 | Denoising, Video Generation | Unverified | 0 |
| MotionStone: Decoupled Motion Intensity Modulation with Diffusion Transformer for Image-to-Video Generation | Dec 8, 2024 | Contrastive Learning, Image to Video Generation | Unverified | 0 |
| Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation | Dec 8, 2024 | Point Tracking, Video Generation | Unverified | 0 |
| Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model | Dec 6, 2024 | Autonomous Driving, Autonomous Vehicles | Code Available | 2 |
| UniMLVG: Unified Framework for Multi-view Long Video Generation with Comprehensive Control Capabilities for Autonomous Driving | Dec 6, 2024 | Autonomous Driving, Diversity | Code Available | 3 |
| Mind the Time: Temporally-Controlled Multi-Event Video Generation | Dec 6, 2024 | Video Generation | Unverified | 0 |
| PaintScene4D: Consistent 4D Scene Generation from Text Prompts | Dec 5, 2024 | Scene Generation, Video Generation | Unverified | 0 |
| Movie Gen: SWOT Analysis of Meta's Generative AI Foundation Model for Transforming Media Generation, Advertising, and Entertainment Industries | Dec 5, 2024 | Video Generation | Unverified | 0 |
| Instructional Video Generation | Dec 5, 2024 | Video Generation | Unverified | 0 |
| MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation | Dec 5, 2024 | Portrait Animation, Video Generation | Unverified | 0 |
| Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation | Dec 5, 2024 | Image Comprehension, Representation Learning | Code Available | 2 |
| IF-MDM: Implicit Face Motion Diffusion Model for High-Fidelity Realtime Talking Head Generation | Dec 5, 2024 | Disentanglement, Talking Head Generation | Unverified | 0 |
| DiCoDe: Diffusion-Compressed Deep Tokens for Autoregressive Video Generation with Language Models | Dec 5, 2024 | Temporal Sequences, Video Generation | Unverified | 0 |
| GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration | Dec 5, 2024 | Attribute, Hallucination | Unverified | 0 |
| Imagine360: Immersive 360 Video Generation from Perspective Anchor | Dec 4, 2024 | Denoising, Video Denoising | Unverified | 0 |
| SINGER: Vivid Audio-driven Singing Video Generation with Multi-scale Spectral Diffusion Model | Dec 4, 2024 | Video Generation | Unverified | 0 |
| Mimir: Improving Video Diffusion Models for Precise Text Understanding | Dec 4, 2024 | Decoder, Reading Comprehension | Unverified | 0 |
| Seeing Beyond Views: Multi-View Driving Scene Video Generation with Holistic Attention | Dec 4, 2024 | Autonomous Driving, Video Generation | Unverified | 0 |
| Advancing Auto-Regressive Continuation for Video Frames | Dec 4, 2024 | Autonomous Driving, Optical Flow Estimation | Unverified | 0 |
| Navigation World Models | Dec 4, 2024 | Robot Navigation, Video Generation | Code Available | 4 |
| HunyuanVideo: A Systematic Framework For Large Video Generative Models | Dec 3, 2024 | Video Alignment, Video Generation | Code Available | 11 |
| Motion Prompting: Controlling Video Generation with Motion Trajectories | Dec 3, 2024 | Video Generation | Unverified | 0 |
| AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction | Dec 3, 2024 | 3D Reconstruction, Video Generation | Unverified | 0 |
| VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation | Dec 3, 2024 | Script Generation, Video Generation | Code Available | 2 |
| Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback | Dec 3, 2024 | Object, Offline RL | Unverified | 0 |
| OmniCreator: Self-Supervised Unified Generation with Universal Editing | Dec 3, 2024 | Denoising, Semantic correspondence | Unverified | 0 |
| FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait | Dec 2, 2024 | Image Animation, Video Generation | Unverified | 0 |
| CPA: Camera-pose-awareness Diffusion Transformer for Video Generation | Dec 2, 2024 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models | Dec 2, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| World-consistent Video Diffusion with Explicit 3D Modeling | Dec 2, 2024 | 3D Generation, Image Generation | Unverified | 0 |
| InfinityDrive: Breaking Time Limits in Driving World Models | Dec 2, 2024 | Autonomous Driving, Diversity | Unverified | 0 |
| Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation | Dec 2, 2024 | Diversity, Video Generation | Unverified | 0 |
| Synergizing Motion and Appearance: Multi-Scale Compensatory Codebooks for Talking Head Video Generation | Dec 1, 2024 | Video Generation | Unverified | 0 |
| DIVD: Deblurring with Improved Video Diffusion Model | Dec 1, 2024 | Deblurring, model | Unverified | 0 |
| PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation | Nov 30, 2024 | Text-to-Video Generation, Video Generation | Code Available | 2 |
| Human Action CLIPs: Detecting AI-generated Human Motion | Nov 30, 2024 | Video Generation | Unverified | 0 |
| Motion Dreamer: Realizing Physically Coherent Video Generation through Scene-Aware Motion Reasoning | Nov 30, 2024 | Autonomous Driving, Motion Generation | Code Available | 0 |
| Motion Modes: What Could Happen Next? | Nov 29, 2024 | Diversity, Object | Unverified | 0 |
| Fleximo: Towards Flexible Text-to-Human Motion Video Generation | Nov 29, 2024 | Image to Video Generation, Large Language Model | Unverified | 0 |
| Open-Sora Plan: Open-Source Large Video Generation Model | Nov 28, 2024 | Video Generation | Code Available | 11 |
| OpenHumanVid: A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation | Nov 28, 2024 | Video Generation | Unverified | 0 |
| Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model | Nov 28, 2024 | Denoising, Video Generation | Code Available | 1 |
| SPAgent: Adaptive Task Decomposition and Model Selection for General Video Generation and Editing | Nov 28, 2024 | Intent Recognition, Model Selection | Unverified | 0 |
| Trajectory Attention for Fine-grained Video Motion Control | Nov 28, 2024 | Inductive Bias, Video Editing | Unverified | 0 |
| MSG score: A Comprehensive Evaluation for Multi-Scene Video Generation | Nov 28, 2024 | Video Generation | Unverified | 0 |
| Towards Chunk-Wise Generation for Long Videos | Nov 27, 2024 | Denoising, GPU | Unverified | 0 |
| AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers | Nov 27, 2024 | Camera Pose Estimation, Pose Estimation | Unverified | 0 |
| MotionCharacter: Identity-Preserving and Motion Controllable Human Video Generation | Nov 27, 2024 | Attribute, Video Generation | Unverified | 0 |
| Individual Content and Motion Dynamics Preserved Pruning for Video Diffusion Models | Nov 27, 2024 | Model Compression, Video Generation | Unverified | 0 |