
Image to Video Generation

Image-to-video generation is the task of synthesizing a sequence of video frames from a single still image or a small set of still images. The goal is a video that matches the input in appearance and style, exhibits plausible motion, and is temporally coherent, so that the frames read as a smooth, ordered sequence. The task is typically tackled with deep generative models, such as diffusion models, generative adversarial networks (GANs), or variational autoencoders (VAEs), trained on large video datasets. These models learn to generate plausible frames conditioned on the input image and, optionally, on auxiliary signals such as text or audio.
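
As one concrete illustration of image-conditioned video synthesis with a diffusion model, here is a minimal sketch assuming the Hugging Face diffusers library and its publicly documented StableVideoDiffusionPipeline. The checkpoint ID, input path, and sampling parameters are illustrative examples and are not tied to any paper listed below.

    # Minimal sketch: image-to-video sampling with a latent video diffusion
    # model, assuming the `diffusers` library and the public Stable Video
    # Diffusion checkpoint. Parameters here are illustrative defaults.
    import torch
    from diffusers import StableVideoDiffusionPipeline
    from diffusers.utils import load_image, export_to_video

    pipe = StableVideoDiffusionPipeline.from_pretrained(
        "stabilityai/stable-video-diffusion-img2vid-xt",
        torch_dtype=torch.float16,
        variant="fp16",
    )
    pipe.to("cuda")

    # The conditioning signal: a single still image, resized to the
    # resolution the checkpoint expects.
    image = load_image("input.png")  # any local path or URL
    image = image.resize((1024, 576))

    # Sample a short clip conditioned on the image. decode_chunk_size trades
    # peak memory for speed when decoding latents back into frames.
    generator = torch.manual_seed(42)
    frames = pipe(image, decode_chunk_size=8, generator=generator).frames[0]

    export_to_video(frames, "generated.mp4", fps=7)

GAN- and VAE-based approaches follow the same interface conceptually: encode the input image, then roll out a frame sequence conditioned on that encoding.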

Papers

Showing 21–30 of 85 papers

Title | Status | Hype
Conditional Image-to-Video Generation with Latent Flow Diffusion Models | Code | 2
SceneRF: Self-Supervised Monocular 3D Scene Reconstruction with Radiance Fields | Code | 2
Collaborative Neural Rendering using Anime Character Sheets | Code | 2
DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance | Code | 1
Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think | Code | 1
Object-Centric Image to Video Generation with Language Guidance | Code | 1
Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach | Code | 1
MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions | Code | 1
MVOC: a training-free multiple video object composition method with diffusion models | Code | 1
TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation | Code | 1

Leaderboard

No leaderboard results yet.