| Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM | Dec 19, 2024 | Video Generation | Code Available | 1 |
| AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation | Dec 19, 2024 | Video Generation, Video Synchronization | Unverified | 0 |
| Parallelized Autoregressive Visual Generation | Dec 19, 2024 | Video Generation | Unverified | 0 |
| FlexCache: Flexible Approximate Cache System for Video Diffusion | Dec 18, 2024 | Video Generation | Unverified | 0 |
| ManiVideo: Generating Hand-Object Manipulation Video with Dexterous and Generalizable Grasping | Dec 18, 2024 | Object, Video Generation | Unverified | 0 |
| Autoregressive Video Generation without Vector Quantization | Dec 18, 2024 | Image Generation, Prediction | Code Available | 4 |
| Real-time One-Step Diffusion-based Expressive Portrait Videos Generation | Dec 18, 2024 | Video Generation | Code Available | 1 |
| SurgSora: Object-Aware Diffusion Model for Controllable Surgical Video Generation | Dec 18, 2024 | Optical Flow Estimation, Video Generation | Unverified | 0 |
| VideoDPO: Omni-Preference Alignment for Video Diffusion Generation | Dec 18, 2024 | Image Generation, Text-to-Video Generation | Unverified | 0 |
| AKiRa: Augmentation Kit on Rays for optical video generation | Dec 18, 2024 | Video Generation | Unverified | 0 |
| Is Your World Simulator a Good Story Presenter? A Consecutive Events-Based Benchmark for Future Long Video Generation | Dec 17, 2024 | Story Completion, Video Generation | Unverified | 0 |
| CompactFlowNet: Efficient Real-time Optical Flow Estimation on Mobile Devices | Dec 17, 2024 | Action Recognition, Motion Estimation | Unverified | 0 |
| MotionBridge: Dynamic Video Inbetweening with Flexible Controls | Dec 17, 2024 | Video Editing, Video Generation | Unverified | 0 |
| VidTok: A Versatile and Open-Source Video Tokenizer | Dec 17, 2024 | Quantization, SSIM | Code Available | 3 |
| Can video generation replace cinematographers? Research on the cinematic language of generated video | Dec 16, 2024 | Video Generation | Unverified | 0 |
| VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video Prompting | Dec 16, 2024 | Informativeness, Large Language Model | Code Available | 0 |
| InterDyn: Controllable Interactive Dynamics with Video Diffusion Models | Dec 16, 2024 | Video Generation | Unverified | 0 |
| Generative Inbetweening through Frame-wise Conditions-Driven Video Generation | Dec 16, 2024 | Video Generation | Code Available | 2 |
| DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes | Dec 15, 2024 | Denoising, Video Generation | Unverified | 0 |
| GenLit: Reformulating Single-Image Relighting as Video Generation | Dec 15, 2024 | Image Generation, Image Relighting | Unverified | 0 |
| Video Diffusion Transformers are In-Context Learners | Dec 14, 2024 | Video Generation | Code Available | 1 |
| SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device | Dec 13, 2024 | Denoising, Image Generation | Unverified | 0 |
| TIV-Diffusion: Towards Object-Centric Movement for Text-driven Image to Video Generation | Dec 13, 2024 | Image to Video Generation, Object | Unverified | 0 |
| MSC: Multi-Scale Spatio-Temporal Causal Attention for Autoregressive Video Diffusion | Dec 13, 2024 | Video Generation | Unverified | 0 |
| AniSora: Exploring the Frontiers of Animation Video Generation in the Sora Era | Dec 13, 2024 | Image to Video Generation, Video Generation | Code Available | 7 |
| LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity | Dec 13, 2024 | GPU, Mamba | Unverified | 0 |
| Enhancing Facial Consistency in Conditional Video Generation via Facial Landmark Transformation | Dec 12, 2024 | Video Generation | Unverified | 0 |
| Mojito: Motion Trajectory and Intensity Control for Video Generation | Dec 12, 2024 | Computational Efficiency, Optical Flow Estimation | Unverified | 0 |
| Video Creation by Demonstration | Dec 12, 2024 | Video Generation | Unverified | 0 |
| OmniDrag: Enabling Motion Control for Omnidirectional Image-to-Video Generation | Dec 12, 2024 | Image to Video Generation, Video Generation | Unverified | 0 |
| Owl-1: Omni World Model for Consistent Long Video Generation | Dec 12, 2024 | Video Generation | Code Available | 2 |
| InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption | Dec 12, 2024 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| LiftImage3D: Lifting Any Single Image to 3D Gaussians with Video Generation Priors | Dec 12, 2024 | 3D Reconstruction, Image to 3D | Unverified | 0 |
| T-SVG: Text-Driven Stereoscopic Video Generation | Dec 12, 2024 | Depth Estimation, Text-to-Video Generation | Unverified | 0 |
| UFO: Enhancing Diffusion-Based Video Generation with a Uniform Frame Organizer | Dec 12, 2024 | Video Generation | Code Available | 0 |
| Doe-1: Closed-Loop Autonomous Driving with Large World Model | Dec 12, 2024 | Autonomous Driving, Decision Making | Code Available | 2 |
| SweetTokenizer: Semantic-Aware Spatial-Temporal Tokenizer for Compact Visual Discretization | Dec 11, 2024 | Image Reconstruction, Representation Learning | Unverified | 0 |
| FLIP: Flow-Centric Generative Planning as General-Purpose Manipulation World Model | Dec 11, 2024 | Representation Learning, Video Generation | Unverified | 0 |
| Physical Informed Driving World Model | Dec 11, 2024 | 3D Object Detection, Autonomous Driving | Unverified | 0 |
| Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models | Dec 10, 2024 | Video Generation | Code Available | 2 |
| StyleMaster: Stylize Your Video with Artistic Generation and Translation | Dec 10, 2024 | Contrastive Learning, Style Transfer | Unverified | 0 |
| From Slow Bidirectional to Fast Autoregressive Video Diffusion Models | Dec 10, 2024 | GPU, Video Generation | Unverified | 0 |
| 3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation | Dec 10, 2024 | Video Generation | Unverified | 0 |
| ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer | Dec 10, 2024 | Denoising, Image Generation | Code Available | 1 |
| STIV: Scalable Text and Image Conditioned Video Generation | Dec 10, 2024 | Video Generation, Video Prediction | Unverified | 0 |
| UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics | Dec 10, 2024 | Image Generation, Video Generation | Unverified | 0 |
| Multi-Shot Character Consistency for Text-to-Video Generation | Dec 10, 2024 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints | Dec 10, 2024 | 4D reconstruction, Video Generation | Code Available | 4 |
| SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations | Dec 9, 2024 | Video Generation | Unverified | 0 |
| FlexDiT: Dynamic Token Density Control for Diffusion Transformer | Dec 8, 2024 | Computational Efficiency, Denoising | Code Available | 1 |