SOTAVerified

Video Generation

(Various video generation tasks. GIF credit: MaGViT)

Papers

Showing 851–900 of 1466 papers

| Title | Status | Hype |
| --- | --- | --- |
| Contextual Gesture: Co-Speech Gesture Video Generation through Context-aware Gesture Representation | | 0 |
| Contextual RNN-GANs for Abstract Reasoning Diagram Generation | | 0 |
| Continuously Controllable Facial Expression Editing in Talking Face Videos | | 0 |
| Continuous-Time Video Generation via Learning Motion Dynamics with Neural ODE | | 0 |
| Contrastive Video Textures | | 0 |
| Controllable Image-to-Video Translation: A Case Study on Facial Expression Generation | | 0 |
| Controllable Longer Image Animation with Diffusion Models | | 0 |
| Controllable Video Generation by Learning the Underlying Dynamical System with Neural ODE | | 0 |
| Controllable Video Generation through Global and Local Motion Dynamics | | 0 |
| Controllable Video Generation With Sparse Trajectories | | 0 |
| Convergence of Diffusion Models Under the Manifold Hypothesis in High-Dimensions | | 0 |
| Copy Motion From One to Another: Fake Motion Video Generation | | 0 |
| Co-Speech Gesture Video Generation with Implicit Motion-Audio Entanglement | | 0 |
| CPA: Camera-pose-awareness Diffusion Transformer for Video Generation | | 0 |
| Cross-Modal Learning for Music-to-Music-Video Description Generation | | 0 |
| Cross-View Exocentric to Egocentric Video Synthesis | | 0 |
| Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model | | 0 |
| Ctrl-Crash: Controllable Diffusion for Realistic Car Crashes | | 0 |
| Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models | | 0 |
| NewMove: Customizing text-to-video models with novel motions | | 0 |
| CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects | | 0 |
| CustomVideoX: 3D Reference Attention Driven Dynamic Adaptation for Zero-Shot Customized Video Diffusion Transformers | | 0 |
| CyberHost: Taming Audio-driven Avatar Diffusion Model with Region Codebook Attention | | 0 |
| Dance Any Beat: Blending Beats with Visuals in Dance Video Generation | | 0 |
| DAPE: Dual-Stage Parameter-Efficient Fine-Tuning for Consistent Video Editing with Diffusion Models | | 0 |
| Decouple Content and Motion for Conditional Image-to-Video Generation | | 0 |
| DeepHS-HDRVideo: Deep High Speed High Dynamic Range Video Reconstruction | | 0 |
| DeepRhythm: Exposing DeepFakes with Attentional Visual Heartbeat Rhythms | | 0 |
| DeepVerse: 4D Autoregressive Video Generation as a World Model | | 0 |
| Deep Video Generation, Prediction and Completion of Human Action Sequences | | 0 |
| Denoising Diffusion Probabilistic Models in Six Simple Steps | | 0 |
| Denoising Reuse: Exploiting Inter-frame Motion Consistency for Efficient Video Latent Generation | | 0 |
| Designing Parameter and Compute Efficient Diffusion Transformers using Distillation | | 0 |
| DFVEdit: Conditional Delta Flow Vector for Zero-shot Video Editing | | 0 |
| Dialogue Director: Bridging the Gap in Dialogue Visualization for Multimodal Storytelling | | 0 |
| DialogueNeRF: Towards Realistic Avatar Face-to-Face Conversation Video Generation | | 0 |
| DiCoDe: Diffusion-Compressed Deep Tokens for Autoregressive Video Generation with Language Models | | 0 |
| DiffPerformer: Iterative Learning of Consistent Latent Guidance for Diffusion-based Human Video Generation | | 0 |
| DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures | | 0 |
| Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation | | 0 |
| Diffusion Adversarial Post-Training for One-Step Video Generation | | 0 |
| Diffusion-based Realistic Listening Head Generation via Hybrid Motion Modeling | | 0 |
| Diffusion Models for Robotic Manipulation: A Survey | | 0 |
| Diffusion Transformer Captures Spatial-Temporal Dependencies: A Theory for Gaussian Process Data | | 0 |
| DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion | | 0 |
| Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion | | 0 |
| DirectorLLM for Human-Centric Video Generation | | 0 |
| DisenStudio: Customized Multi-subject Text-to-Video Generation with Disentangled Spatial Control | | 0 |
| Disentangled Recurrent Wasserstein Autoencoder | | 0 |
| Disentangling Foreground and Background Motion for Enhanced Realism in Human Video Generation | | 0 |
Page 18 of 30

Benchmark Results

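| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MCVD | FVD16 | 2,460 | | Unverified |
| 2 | VDM | FVD16 | 1,396 | | Unverified |
| 3 | TGAN-v2 (128x128) | FVD16 | 1,209 | | Unverified |
| 4 | MCVD (64x64) | FVD16 | 1,143 | | Unverified |
| 5 | MoCoGAN-HD (256x256, unconditional) | FVD16 | 700 | | Unverified |
| 6 | MagicVideo (256x256, text-conditional) | FVD16 | 699 | | Unverified |
| 7 | TATS (256x256) | FVD16 | 635 | | Unverified |
| 8 | FIFO-Diffusion | FVD128 | 596.64 | | Unverified |
| 9 | DIGAN (128x128, unconditional) | FVD16 | 577 | | Unverified |
| 10 | LVDM (256x256, unconditional) | FVD16 | 552 | | Unverified |
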
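| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MoCoGAN | FVD score | 503 | | Unverified |
| 2 | Baseline (from LVT) | FVD score | 320.9 | | Unverified |
| 3 | SVG-FP (from FVD) | FVD score | 315.5 | | Unverified |
| 4 | CDNA (from FVD) | FVD score | 296.5 | | Unverified |
| 5 | SV2P (from FVD) | FVD score | 262.5 | | Unverified |
| 6 | SVG-LP (from vRNN) | FVD score | 256.62 | | Unverified |
| 7 | WAM | FVD score | 159.6 | | Unverified |
| 8 | VRNN 1L | FVD score | 149.22 | | Unverified |
| 9 | SAVP (from vRNN) | FVD score | 143.43 | | Unverified |
| 10 | Hier-VRNN | FVD score | 143.4 | | Unverified |
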
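| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MoCoGAN-HD (128x128) | FVD 16 | 183.6 | | Unverified |
| 2 | TATS (128x128) | FVD 16 | 132.6 | | Unverified |
| 3 | Long-video GAN (256x256) | FVD 16 | 116.5 | | Unverified |
| 4 | DIGAN (128x128) | FVD 16 | 114.6 | | Unverified |
| 5 | Long-video GAN (128x128) | FVD 16 | 107.5 | | Unverified |
| 6 | LVDM (256x256) | FVD 16 | 95.2 | | Unverified |
| 7 | DDMI | FVD 16 | 66.25 | | Unverified |
| 8 | Latte + LeanVAE | FVD 16 | 49.59 | | Unverified |
| 9 | StyleSV (256x256) | FVD 16 | 49 | | Unverified |
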
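| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Video Diffusion Model | Inception Score | 57 | | Unverified |
| 2 | TGAN-ODE | Inception Score | 15.2 | | Unverified |
| 3 | TGAN-F | Inception Score | 13.62 | | Unverified |
| 4 | MoCoGAN | Inception Score | 12.42 | | Unverified |
| 5 | MoCoGAN-MDP | Inception Score | 11.86 | | Unverified |
| 6 | TGAN-SVC | Inception Score | 11.85 | | Unverified |
| 7 | VGAN | Inception Score | 8.18 | | Unverified |
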
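| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | TGAN-F | Inception Score | 22.91 | | Unverified |
| 2 | TGANv2 | Inception Score | 21.45 | | Unverified |
| 3 | TGANv2-ODE | Inception Score | 21.02 | | Unverified |
| 4 | MoCoGAN | Inception Score | 12.42 | | Unverified |
| 5 | MoCoGAN-MDP | Inception Score | 11.86 | | Unverified |
| 6 | TGAN-SVC | Inception Score | 11.85 | | Unverified |
| 7 | VGAN | Inception Score | 8.18 | | Unverified |
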
| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Imagen original (constant=6) | CLIP R-Precision | 92.12 | | Unverified |
| 2 | Imagen fully distilled (oscillate (15,1)) | CLIP R-Precision | 90.97 | | Unverified |
| 3 | Imagen distilled (constant=6) | CLIP R-Precision | 90.88 | | Unverified |
| 4 | Imagen original (oscillate(15,1)) | CLIP R-Precision | 89.91 | | Unverified |
| 5 | Imagen fully distilled (constant=6) | CLIP R-Precision | 89.68 | | Unverified |
| 6 | Imagen distilled (oscillate (15,1)) | CLIP R-Precision | 88.78 | | Unverified |

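| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | DIGAN (256x256) | FVD16 | 156.7 | | Unverified |
| 2 | MoCoGAN-HD (128x128) | FVD16 | 144.7 | | Unverified |
| 3 | DIGAN (128x128) | FVD16 | 128.1 | | Unverified |
| 4 | LVDM (256x256) | FVD16 | 99 | | Unverified |
| 5 | TATS (128x128) | FVD16 | 94.6 | | Unverified |
| 6 | StyleSV (256x256) | FVD16 | 82.6 | | Unverified |
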
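| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | TGANv2 (2020) | Inception Score | 28.87 | | Unverified |
| 2 | DVD-GAN | Inception Score | 27.38 | | Unverified |
| 3 | VideoGPT | Inception Score | 24.69 | | Unverified |
| 4 | TGANv2 | Inception Score | 24.34 | | Unverified |
| 5 | TGAN-F | Inception Score | 22.91 | | Unverified |
| 6 | TGANv2-ODE | Inception Score | 21.02 | | Unverified |
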
| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | DVD-GAN | FVD | 31.1 | | Unverified |
| 2 | MAGVIT | FVD | 9.9 | | Unverified |

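| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | INR-V | FVD16 | 144 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | DVD-GAN | FID | 2.16 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | DVD-GAN | FID | 12.92 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | DiT-XL/2 + CVAE-FT-SE | FID | 8.59 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | VideoAssembler (Zero-Shot, 256x256, class-conditional) | FVD16 | 252 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PG-SWGAN-3D | FID | 404.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | StyleSV | FVD16 | 207.2 | | Unverified |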