SOTAVerified

Video Generation

(Various video generation tasks. GIF credit: MaGViT)

Papers

Showing 601-650 of 1466 papers

Title | Status | Hype
Controllable Video Generation With Sparse Trajectories | | 0
Audio-driven Gesture Generation via Deviation Feature in the Latent Space | | 0
FIFA: Unified Faithfulness Evaluation Framework for Text-to-Video and Video-to-Text Generation | | 0
FFA Sora, video generation as fundus fluorescein angiography simulator | | 0
Controllable Video Generation through Global and Local Motion Dynamics | | 0
Controllable Video Generation by Learning the Underlying Dynamical System with Neural ODE | | 0
Controllable Longer Image Animation with Diffusion Models | | 0
FastVideoEdit: Leveraging Consistency Models for Efficient Text-to-Video Editing | | 0
Faster Image2Video Generation: A Closer Look at CLIP Image Embedding's Impact on Spatio-Temporal Cross-Attentions | | 0
Controllable Image-to-Video Translation: A Case Study on Facial Expression Generation | | 0
Audio-Driven Co-Speech Gesture Video Generation | | 0
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality | | 0
Fast Autoregressive Video Generation with Diagonal Decoding | | 0
Contrastive Video Textures | | 0
AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers | | 0
Fashion-VDM: Video Diffusion Model for Virtual Try-On | | 0
Continuous-Time Video Generation via Learning Motion Dynamics with Neural ODE | | 0
Continuously Controllable Facial Expression Editing in Talking Face Videos | | 0
Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models | | 0
ACDC: Autoregressive Coherent Multimodal Generation using Diffusion Correction | | 0
Face Consistency Benchmark for GenAI Video | | 0
Facial Expression Video Generation Based-On Spatio-temporal Convolutional GAN: FEV-GAN | | 0
Face Video Generation from a Single Image and Landmarks | | 0
Contextual RNN-GANs for Abstract Reasoning Diagram Generation | | 0
FaceVid-1K: A Large-Scale High-Quality Multiracial Human Face Video Dataset | | 0
FAAC: Facial Animation Generation with Anchor Frame and Conditional Control for Superior Fidelity and Editability | | 0
Contextual Gesture: Co-Speech Gesture Video Generation through Context-aware Gesture Representation | | 0
Alignment is All You Need: A Training-free Augmentation Strategy for Pose-guided Video Generation | | 0
Eye2Eye: A Simple Approach for Monocular-to-Stereo Video Synthesis | | 0
Context-aware Talking Face Video Generation | | 0
AtomoVideo: High Fidelity Image-to-Video Generation | | 0
Exposing AI-generated Videos: A Benchmark Dataset and a Local-and-Global Temporal Defect Based Detection Method | | 0
ContentV: Efficient Training of Video Generation Models with Limited Compute | | 0
AccidentSim: Generating Physically Realistic Vehicle Collision Videos from Real-World Accident Reports | | 0
Latent-Reframe: Enabling Camera Control for Video Diffusion Model without Training | | 0
Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey | | 0
Exploring the Hyperparameter Space of Image Diffusion Models for Echocardiogram Generation | | 0
ATI: Any Trajectory Instruction for Controllable Video Generation | | 0
Explorative Inbetweening of Time and Space | | 0
Consistent Zero-shot 3D Texture Synthesis Using Geometry-aware Diffusion and Temporal Video Models | | 0
Explaining Vision and Language through Graphs of Events in Space and Time | | 0
Every Smile is Unique: Landmark-Guided Diverse Smile Generation | | 0
AsymKV: Enabling 1-Bit Quantization of KV Cache with Layer-Wise Asymmetric Quantization Configurations | | 0
3DGS-Enhancer: Enhancing Unbounded 3D Gaussian Splatting with View-consistent 2D Diffusion Priors | | 0
Latent-Shift: Latent Diffusion with Temporal Shift for Efficient Text-to-Video Generation | | 0
Every Image Listens, Every Image Dances: Music-Driven Image Animation | | 0
Everybody Sign Now: Translating Spoken Language to Photo Realistic Sign Language Video | | 0
CoNo: Consistency Noise Injection for Tuning-free Long Video Diffusion | | 0
Event-based High Dynamic Range Image and Very High Frame Rate Video Generation using Conditional Generative Adversarial Networks | | 0
ASurvey: Spatiotemporal Consistency in Video Generation | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MCVD | FVD16 | 2,460 | | Unverified
2 | VDM | FVD16 | 1,396 | | Unverified
3 | TGAN-v2 (128x128) | FVD16 | 1,209 | | Unverified
4 | MCVD (64x64) | FVD16 | 1,143 | | Unverified
5 | MoCoGAN-HD (256x256, unconditional) | FVD16 | 700 | | Unverified
6 | MagicVideo (256x256, text-conditional) | FVD16 | 699 | | Unverified
7 | TATS (256x256) | FVD16 | 635 | | Unverified
8 | FIFO-Diffusion | FVD128 | 596.64 | | Unverified
9 | DIGAN (128x128, unconditional) | FVD16 | 577 | | Unverified
10 | LVDM (256x256, unconditional) | FVD16 | 552 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MoCoGAN | FVD score | 503 | | Unverified
2 | Baseline (from LVT) | FVD score | 320.9 | | Unverified
3 | SVG-FP (from FVD) | FVD score | 315.5 | | Unverified
4 | CDNA (from FVD) | FVD score | 296.5 | | Unverified
5 | SV2P (from FVD) | FVD score | 262.5 | | Unverified
6 | SVG-LP (from vRNN) | FVD score | 256.62 | | Unverified
7 | WAM | FVD score | 159.6 | | Unverified
8 | VRNN 1L | FVD score | 149.22 | | Unverified
9 | SAVP (from vRNN) | FVD score | 143.43 | | Unverified
10 | Hier-VRNN | FVD score | 143.4 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MoCoGAN-HD (128x128) | FVD 16 | 183.6 | | Unverified
2 | TATS (128x128) | FVD 16 | 132.6 | | Unverified
3 | Long-video GAN (256x256) | FVD 16 | 116.5 | | Unverified
4 | DIGAN (128x128) | FVD 16 | 114.6 | | Unverified
5 | Long-video GAN (128x128) | FVD 16 | 107.5 | | Unverified
6 | LVDM (256x256) | FVD 16 | 95.2 | | Unverified
7 | DDMI | FVD 16 | 66.25 | | Unverified
8 | Latte + LeanVAE | FVD 16 | 49.59 | | Unverified
9 | StyleSV (256x256) | FVD 16 | 49 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Video Diffusion Model | Inception Score | 57 | | Unverified
2 | TGAN-ODE | Inception Score | 15.2 | | Unverified
3 | TGAN-F | Inception Score | 13.62 | | Unverified
4 | MoCoGAN | Inception Score | 12.42 | | Unverified
5 | MoCoGAN-MDP | Inception Score | 11.86 | | Unverified
6 | TGAN-SVC | Inception Score | 11.85 | | Unverified
7 | VGAN | Inception Score | 8.18 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TGAN-F | Inception Score | 22.91 | | Unverified
2 | TGANv2 | Inception Score | 21.45 | | Unverified
3 | TGANv2-ODE | Inception Score | 21.02 | | Unverified
4 | MoCoGAN | Inception Score | 12.42 | | Unverified
5 | MoCoGAN-MDP | Inception Score | 11.86 | | Unverified
6 | TGAN-SVC | Inception Score | 11.85 | | Unverified
7 | VGAN | Inception Score | 8.18 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Imagen original (constant=6) | CLIP R-Precision | 92.12 | | Unverified
2 | Imagen fully distilled (oscillate (15,1)) | CLIP R-Precision | 90.97 | | Unverified
3 | Imagen distilled (constant=6) | CLIP R-Precision | 90.88 | | Unverified
4 | Imagen original (oscillate(15,1)) | CLIP R-Precision | 89.91 | | Unverified
5 | Imagen fully distilled (constant=6) | CLIP R-Precision | 89.68 | | Unverified
6 | Imagen distilled (oscillate (15,1)) | CLIP R-Precision | 88.78 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | DIGAN (256x256) | FVD16 | 156.7 | | Unverified
2 | MoCoGAN-HD (128x128) | FVD16 | 144.7 | | Unverified
3 | DIGAN (128x128) | FVD16 | 128.1 | | Unverified
4 | LVDM (256x256) | FVD16 | 99 | | Unverified
5 | TATS (128x128) | FVD16 | 94.6 | | Unverified
6 | StyleSV (256x256) | FVD16 | 82.6 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TGANv2 (2020) | Inception Score | 28.87 | | Unverified
2 | DVD-GAN | Inception Score | 27.38 | | Unverified
3 | VideoGPT | Inception Score | 24.69 | | Unverified
4 | TGANv2 | Inception Score | 24.34 | | Unverified
5 | TGAN-F | Inception Score | 22.91 | | Unverified
6 | TGANv2-ODE | Inception Score | 21.02 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | DVD-GAN | FVD | 31.1 | | Unverified
2 | MAGVIT | FVD | 9.9 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | INR-V | FVD16 | 144 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | DVD-GAN | FID | 2.16 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | DVD-GAN | FID | 12.92 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | DiT-XL/2 + CVAE-FT-SE | FID | 8.59 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VideoAssembler (Zero-Shot, 256x256, class-conditional) | FVD16 | 252 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PG-SWGAN-3D | FID | 404.1 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | StyleSV | FVD16 | 207.2 | | Unverified