Latent Video Diffusion Models for High-Fidelity Long Video Generation

2022-11-23 · Code Available

Yingqing He, Tianyu Yang, Yong Zhang, Ying Shan, Qifeng Chen

Abstract

AI-generated content has attracted lots of attention recently, but photo-realistic video synthesis is still challenging. Although many attempts using GANs and autoregressive models have been made in this area, the visual quality and length of generated videos are far from satisfactory. Diffusion models have shown remarkable results recently but require significant computational resources. To address this, we introduce lightweight video diffusion models by leveraging a low-dimensional 3D latent space, significantly outperforming previous pixel-space video diffusion models under a limited computational budget. In addition, we propose hierarchical diffusion in the latent space such that longer videos with more than one thousand frames can be produced. To further overcome the performance degradation issue for long video generation, we propose conditional latent perturbation and unconditional guidance that effectively mitigate the accumulated errors during the extension of video length. Extensive experiments on small domain datasets of different categories suggest that our framework generates more realistic and longer videos than previous strong baselines. We additionally provide an extension to large-scale text-to-video generation to demonstrate the superiority of our work. Our code and models will be made publicly available.
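
The abstract names two mechanisms for limiting accumulated error when a video is extended: conditional latent perturbation and unconditional guidance. The abstract gives no formulas, so the following is only a minimal sketch under stated assumptions: perturbation is modelled as a variance-preserving mix of the conditioning latents with Gaussian noise, and guidance uses the classifier-free-guidance style blend of a conditional and an unconditional noise prediction. The function names, the `noise_level` and `scale` parameters, and the flat-list latent representation are all hypothetical and are not taken from the authors' code.

```python
import math
import random


def perturb_condition(cond_latent, noise_level, rng):
    """Sketch of conditional latent perturbation (assumed form).

    Mixes each conditioning latent value with Gaussian noise so that a
    model conditioned on it sees slightly corrupted inputs.
    noise_level is assumed to lie in [0, 1]; 0 leaves the latents unchanged.
    """
    keep = math.sqrt(1.0 - noise_level)
    mix = math.sqrt(noise_level)
    return [keep * z + mix * rng.gauss(0.0, 1.0) for z in cond_latent]


def guided_prediction(eps_cond, eps_uncond, scale):
    """Sketch of unconditional guidance (assumed classifier-free-guidance blend).

    scale = 1 returns the conditional prediction, scale = 0 the
    unconditional one; other values interpolate or extrapolate between them.
    """
    return [u + scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


if __name__ == "__main__":
    rng = random.Random(0)
    noisy = perturb_condition([0.5, -1.0, 2.0], noise_level=0.1, rng=rng)
    blended = guided_prediction([2.0, 1.0], [0.0, 1.0], scale=0.5)
    print(noisy, blended)
```

In a real system both functions would operate on 3D latent tensors inside the sampling loop; flat Python lists are used here only to keep the sketch dependency-free.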

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Sky Time-lapse | MoCoGAN-HD (128x128) | FVD16 | 183.6 | | Unverified |
| Sky Time-lapse | TATS (128x128) | FVD16 | 132.6 | | Unverified |
| Sky Time-lapse | Long-video GAN (128x128) | FVD16 | 107.5 | | Unverified |
| Sky Time-lapse | Long-video GAN (256x256) | FVD16 | 116.5 | | Unverified |
| Sky Time-lapse | LVDM (256x256) | FVD16 | 95.2 | | Unverified |
| Sky Time-lapse | DIGAN (128x128) | FVD16 | 114.6 | | Unverified |
| Taichi | DIGAN (128x128) | FVD16 | 128.1 | | Unverified |
| Taichi | DIGAN (256x256) | FVD16 | 156.7 | | Unverified |
| Taichi | TATS (128x128) | FVD16 | 94.6 | | Unverified |
| Taichi | LVDM (256x256) | FVD16 | 99 | | Unverified |
| Taichi | MoCoGAN-HD (128x128) | FVD16 | 144.7 | | Unverified |
| UCF-101 | LVDM (256x256, unconditional) | FVD16 | 372 | | Unverified |
| UCF-101 | MCVD | FVD16 | 2,460 | | Unverified |
| UCF-101 | VDM | FVD16 | 1,396 | | Unverified |
| UCF-101 | TGAN-v2 (128x128) | FVD16 | 1,209 | | Unverified |
| UCF-101 | LVDM (256x256, unconditional) | FVD16 | 552 | | Unverified |

Reproductions