
Image to Video Generation

Image to Video Generation is the task of synthesizing a sequence of video frames from a single still image (or a small set of still images). The goal is a video that is consistent with the input in appearance, motion, and style, and temporally coherent, so that motion unfolds smoothly across the ordered frames. The task is typically tackled with deep generative models trained on large video datasets, such as Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and, in most recent work, diffusion models. These models learn to generate plausible frames conditioned on the input image and, optionally, on auxiliary signals such as an audio track or a text prompt.
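As a concrete illustration of this conditioning, the sketch below generates a short clip from one still image using the Hugging Face diffusers implementation of Stable Video Diffusion (one of the papers listed below). It is a minimal sketch, not a reference implementation: the model ID, resolution, and call arguments follow the diffusers documentation and may differ across library versions, and the input path is a placeholder.

```python
# Minimal image-to-video sketch with diffusers' StableVideoDiffusionPipeline.
# Assumptions: diffusers with SVD support installed, a CUDA GPU available,
# and "input.png" standing in for the user's conditioning image.
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.to("cuda")

# Condition generation on a single still image, resized to the
# resolution the checkpoint was trained at.
image = load_image("input.png").resize((1024, 576))

generator = torch.manual_seed(42)  # fix the seed for reproducible motion
frames = pipe(image, decode_chunk_size=8, generator=generator).frames[0]

export_to_video(frames, "generated.mp4", fps=7)
```

The `decode_chunk_size` argument decodes the latent frames in smaller batches, trading speed for lower peak GPU memory; lowering it helps on smaller cards.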

Papers

Showing 1–50 of 85 papers

| Title | Status | Hype |
| --- | --- | --- |
| Open-Sora: Democratizing Efficient Video Production for All | Code | 13 |
| LTX-Video: Realtime Video Latent Diffusion | Code | 9 |
| Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance | Code | 7 |
| AniSora: Exploring the Frontiers of Animation Video Generation in the Sora Era | Code | 7 |
| EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation | Code | 7 |
| A Method for Animating Children's Drawings of the Human Figure | Code | 6 |
| Mora: Enabling Generalist Video Generation via A Multi-Agent Framework | Code | 5 |
| VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models | Code | 5 |
| AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks | Code | 4 |
| Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts | Code | 4 |
| Identity-Preserving Text-to-Video Generation by Frequency Decomposition | Code | 4 |
| Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances | Code | 3 |
| ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation | Code | 3 |
| FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors | Code | 3 |
| PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation | Code | 3 |
| Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model | Code | 3 |
| Every Painting Awakened: A Training-free Framework for Painting-to-Animation Generation | Code | 2 |
| Conditional Image-to-Video Generation with Latent Flow Diffusion Models | Code | 2 |
| Collaborative Neural Rendering using Anime Character Sheets | Code | 2 |
| SceneRF: Self-Supervised Monocular 3D Scene Reconstruction with Radiance Fields | Code | 2 |
| Kandinsky 3.0 Technical Report | Code | 2 |
| AnimateAnything: Fine-Grained Open Domain Image Animation with Motion Guidance | Code | 2 |
| TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models | Code | 2 |
| Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach | Code | 1 |
| DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance | Code | 1 |
| Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think | Code | 1 |
| Lifespan Age Transformation Synthesis | Code | 1 |
| Make It Move: Controllable Image-to-Video Generation with Text Descriptions | Code | 1 |
| MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions | Code | 1 |
| MVOC: a training-free multiple video object composition method with diffusion models | Code | 1 |
| Object-Centric Image to Video Generation with Language Guidance | Code | 1 |
| TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation | Code | 1 |
| MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing | Code | 1 |
| Factorized-Dreamer: Training A High-Quality Video Generator with Limited and Low-Quality Data | Code | 0 |
| Magic 1-For-1: Generating One Minute Video Clips within One Minute | Code | 0 |
| Video Generation from Single Semantic Label Map | Code | 0 |
| Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets | Code | 0 |
| GenRec: Unifying Video Generation and Recognition with Diffusion Models | Code | 0 |
| Learning to Forecast and Refine Residual Motion for Image-to-Video Generation | Code | 0 |
| DreamVideo: High-Fidelity Image-to-Video Generation with Image Retention and Text Guidance | | 0 |
| Towards Physically Plausible Video Generation via VLM Planning | | 0 |
| LMP: Leveraging Motion Prior in Zero-Shot Video Generation with Diffusion Transformer | | 0 |
| Dreamix: Video Diffusion Models are General Video Editors | | 0 |
| MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance | | 0 |
| TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models | | 0 |
| MarDini: Masked Autoregressive Diffusion for Video Generation at Scale | | 0 |
| MG-Gen: Single Image to Motion Graphics Generation with Layer Decomposition | | 0 |
| Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation | | 0 |
| Decouple Content and Motion for Conditional Image-to-Video Generation | | 0 |
| MotionAgent: Fine-grained Controllable Video Generation via Motion Field Agent | | 0 |

Leaderboard

No leaderboard results yet.