SOTAVerified

Video Generation

(Various video generation tasks. GIF credit: MaGViT)

Papers

Showing 1051–1100 of 1466 papers

| Title | Status | Hype |
|---|---|---|
| DreamVideo: Composing Your Dream Videos with Customized Subject and Motion | | 0 |
| DreamVideo: High-Fidelity Image-to-Video Generation with Image Retention and Text Guidance | | 0 |
| DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation | | 0 |
| DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving | | 0 |
| DriveScape: High-Resolution Driving Video Generation by Multi-View Feature Fusion | | 0 |
| DriveScape: Towards High-Resolution Controllable Multi-View Driving Video Generation | | 0 |
| DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers | | 0 |
| Dual-MTGAN: Stochastic and Deterministic Motion Transfer for Image-to-Video Synthesis | | 0 |
| DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization | | 0 |
| Dual-Stream Diffusion Net for Text-to-Video Generation | | 0 |
| DualX-VSR: Dual Axial SpatialTemporal Transformer for Real-World Video Super-Resolution without Motion Compensation | | 0 |
| Dynamic Camera Poses and Where to Find Them | | 0 |
| Dynamic-I2V: Exploring Image-to-Video Generation Models via Multimodal LLM | | 0 |
| Dynamic Neural Textures: Generating Talking-Face Videos with Continuously Controllable Expressions | | 0 |
| DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes | | 0 |
| DyST-XL: Dynamic Layout Planning and Content Control for Compositional Text-to-Video Generation | | 0 |
| E2VIDiff: Perceptual Events-to-Video Reconstruction using Diffusion Priors | | 0 |
| EasyControl: Transfer ControlNet to Video Diffusion for Controllable Generation and Interpolation | | 0 |
| EasyGenNet: An Efficient Framework for Audio-Driven Gesture Video Generation Based on Diffusion Model | | 0 |
| Echocardiography video synthesis from end diastolic semantic map via diffusion model | | 0 |
| EchoFlow: A Foundation Model for Cardiac Ultrasound Image and Video Generation | | 0 |
| EEG to fMRI Synthesis: Is Deep Learning a candidate? | | 0 |
| Efficient training for future video generation based on hierarchical disentangled representation of latent variables | | 0 |
| Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition | | 0 |
| EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation | | 0 |
| EIDT-V: Exploiting Intersections in Diffusion Trajectories for Model-Agnostic, Zero-Shot, Training-Free Text-to-Video Generation | | 0 |
| EMO2: End-Effector Guided Audio-Driven Avatar Video Generation | | 0 |
| EMO: Emote Portrait Alive -- Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions | | 0 |
| Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs | | 0 |
| Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning | | 0 |
| Enabling Versatile Controls for Video Diffusion Models | | 0 |
| Enabling Visual Composition and Animation in Unsupervised Video Generation | | 0 |
| Endora: Video Generation Models as Endoscopy Simulators | | 0 |
| Enhancing Facial Consistency in Conditional Video Generation via Facial Landmark Transformation | | 0 |
| Enhancing Multi-Text Long Video Generation Consistency without Tuning: Time-Frequency Analysis, Prompt Alignment, and Theory | | 0 |
| EQ-TAA: Equivariant Traffic Accident Anticipation via Diffusion-Based Accident Video Synthesis | | 0 |
| EVA: An Embodied World Model for Future Video Anticipation | | 0 |
| Evaluating Robot Policies in a World Model | | 0 |
| EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation | | 0 |
| Event-based High Dynamic Range Image and Very High Frame Rate Video Generation using Conditional Generative Adversarial Networks | | 0 |
| Everybody Sign Now: Translating Spoken Language to Photo Realistic Sign Language Video | | 0 |
| Every Image Listens, Every Image Dances: Music-Driven Image Animation | | 0 |
| Every Smile is Unique: Landmark-Guided Diverse Smile Generation | | 0 |
| Explaining Vision and Language through Graphs of Events in Space and Time | | 0 |
| Explorative Inbetweening of Time and Space | | 0 |
| Exploring the Hyperparameter Space of Image Diffusion Models for Echocardiogram Generation | | 0 |
| Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey | | 0 |
| Exposing AI-generated Videos: A Benchmark Dataset and a Local-and-Global Temporal Defect Based Detection Method | | 0 |
| Eye2Eye: A Simple Approach for Monocular-to-Stereo Video Synthesis | | 0 |
| FAAC: Facial Animation Generation with Anchor Frame and Conditional Control for Superior Fidelity and Editability | | 0 |
Page 22 of 30

Benchmark Results

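| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MCVD | FVD16 | 2,460 | | Unverified |
| 2 | VDM | FVD16 | 1,396 | | Unverified |
| 3 | TGAN-v2 (128x128) | FVD16 | 1,209 | | Unverified |
| 4 | MCVD (64x64) | FVD16 | 1,143 | | Unverified |
| 5 | MoCoGAN-HD (256x256, unconditional) | FVD16 | 700 | | Unverified |
| 6 | MagicVideo (256x256, text-conditional) | FVD16 | 699 | | Unverified |
| 7 | TATS (256x256) | FVD16 | 635 | | Unverified |
| 8 | FIFO-Diffusion | FVD128 | 596.64 | | Unverified |
| 9 | DIGAN (128x128, unconditional) | FVD16 | 577 | | Unverified |
| 10 | LVDM (256x256, unconditional) | FVD16 | 552 | | Unverified |
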
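| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MoCoGAN | FVD score | 503 | | Unverified |
| 2 | Baseline (from LVT) | FVD score | 320.9 | | Unverified |
| 3 | SVG-FP (from FVD) | FVD score | 315.5 | | Unverified |
| 4 | CDNA (from FVD) | FVD score | 296.5 | | Unverified |
| 5 | SV2P (from FVD) | FVD score | 262.5 | | Unverified |
| 6 | SVG-LP (from vRNN) | FVD score | 256.62 | | Unverified |
| 7 | WAM | FVD score | 159.6 | | Unverified |
| 8 | VRNN 1L | FVD score | 149.22 | | Unverified |
| 9 | SAVP (from vRNN) | FVD score | 143.43 | | Unverified |
| 10 | Hier-VRNN | FVD score | 143.4 | | Unverified |
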
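| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MoCoGAN-HD (128x128) | FVD 16 | 183.6 | | Unverified |
| 2 | TATS (128x128) | FVD 16 | 132.6 | | Unverified |
| 3 | Long-video GAN (256x256) | FVD 16 | 116.5 | | Unverified |
| 4 | DIGAN (128x128) | FVD 16 | 114.6 | | Unverified |
| 5 | Long-video GAN (128x128) | FVD 16 | 107.5 | | Unverified |
| 6 | LVDM (256x256) | FVD 16 | 95.2 | | Unverified |
| 7 | DDMI | FVD 16 | 66.25 | | Unverified |
| 8 | Latte + LeanVAE | FVD 16 | 49.59 | | Unverified |
| 9 | StyleSV (256x256) | FVD 16 | 49 | | Unverified |
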
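| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Video Diffusion Model | Inception Score | 57 | | Unverified |
| 2 | TGAN-ODE | Inception Score | 15.2 | | Unverified |
| 3 | TGAN-F | Inception Score | 13.62 | | Unverified |
| 4 | MoCoGAN | Inception Score | 12.42 | | Unverified |
| 5 | MoCoGAN-MDP | Inception Score | 11.86 | | Unverified |
| 6 | TGAN-SVC | Inception Score | 11.85 | | Unverified |
| 7 | VGAN | Inception Score | 8.18 | | Unverified |
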
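| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | TGAN-F | Inception Score | 22.91 | | Unverified |
| 2 | TGANv2 | Inception Score | 21.45 | | Unverified |
| 3 | TGANv2-ODE | Inception Score | 21.02 | | Unverified |
| 4 | MoCoGAN | Inception Score | 12.42 | | Unverified |
| 5 | MoCoGAN-MDP | Inception Score | 11.86 | | Unverified |
| 6 | TGAN-SVC | Inception Score | 11.85 | | Unverified |
| 7 | VGAN | Inception Score | 8.18 | | Unverified |
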
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Imagen original (constant=6) | CLIP R-Precision | 92.12 | | Unverified |
| 2 | Imagen fully distilled (oscillate (15,1)) | CLIP R-Precision | 90.97 | | Unverified |
| 3 | Imagen distilled (constant=6) | CLIP R-Precision | 90.88 | | Unverified |
| 4 | Imagen original (oscillate(15,1)) | CLIP R-Precision | 89.91 | | Unverified |
| 5 | Imagen fully distilled (constant=6) | CLIP R-Precision | 89.68 | | Unverified |
| 6 | Imagen distilled (oscillate (15,1)) | CLIP R-Precision | 88.78 | | Unverified |

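| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | DIGAN (256x256) | FVD16 | 156.7 | | Unverified |
| 2 | MoCoGAN-HD (128x128) | FVD16 | 144.7 | | Unverified |
| 3 | DIGAN (128x128) | FVD16 | 128.1 | | Unverified |
| 4 | LVDM (256x256) | FVD16 | 99 | | Unverified |
| 5 | TATS (128x128) | FVD16 | 94.6 | | Unverified |
| 6 | StyleSV (256x256) | FVD16 | 82.6 | | Unverified |
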
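| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | TGANv2 (2020) | Inception Score | 28.87 | | Unverified |
| 2 | DVD-GAN | Inception Score | 27.38 | | Unverified |
| 3 | VideoGPT | Inception Score | 24.69 | | Unverified |
| 4 | TGANv2 | Inception Score | 24.34 | | Unverified |
| 5 | TGAN-F | Inception Score | 22.91 | | Unverified |
| 6 | TGANv2-ODE | Inception Score | 21.02 | | Unverified |
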
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | DVD-GAN | FVD | 31.1 | | Unverified |
| 2 | MAGVIT | FVD | 9.9 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | INR-V | FVD16 | 144 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | DVD-GAN | FID | 2.16 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | DVD-GAN | FID | 12.92 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | DiT-XL/2 + CVAE-FT-SE | FID | 8.59 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | VideoAssembler (Zero-Shot, 256x256, class-conditional) | FVD16 | 252 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PG-SWGAN-3D | FID | 404.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | StyleSV | FVD16 | 207.2 | | Unverified |