
VarNet: Exploring Variations for Unsupervised Video Prediction

2018-10-01 · IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2018 · Code Available

Beibei Jin, Yu Hu, Yiming Zeng, Qiankun Tang, Shice Liu, Jing Ye

Abstract

Unsupervised video prediction is a very challenging task due to the complexity and diversity of natural scenes. Prior works that directly predict pixels or optical flows either suffer from blurring or require additional assumptions. We highlight that the crux of video frame prediction lies in precisely capturing the inter-frame variations, which encompass the movement of objects and the evolution of the surrounding environment. We then present an unsupervised video prediction framework, Variation Network (VarNet), which directly predicts the variations between adjacent frames; these are then fused with the current frame to generate the future frame. In addition, we propose an adaptive re-weighting mechanism for the loss function that gives each pixel a fair weight according to the amplitude of its variation. Extensive experiments on both short-term and long-term video prediction are conducted on two datasets, KTH and KITTI, with two evaluation metrics, PSNR and SSIM. On the KTH dataset, VarNet outperforms state-of-the-art methods by up to 11.9% on PSNR and 9.5% on SSIM. On the KITTI dataset, the improvements are up to 55.1% on PSNR and 15.9% on SSIM. Moreover, we verify that the generalization ability of our model exceeds that of other state-of-the-art methods by testing on the unseen CalTech Pedestrian dataset after training on the KITTI dataset. Source code and video are available at https://github.com/jinbeibei/VarNet.
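The abstract names three ingredients without giving formulas: fusing a predicted variation with the current frame, a loss re-weighted by per-pixel variation amplitude, and PSNR as a metric. The NumPy sketch below illustrates the idea only and is not the authors' implementation. The additive fusion, the mean-normalised weighting scheme, the `eps` constant, and all function names are assumptions made for illustration; only the PSNR formula is standard.

```python
import numpy as np


def fuse(current_frame, predicted_variation):
    """Assumed additive fusion: next frame = current frame + variation, clipped to [0, 1]."""
    return np.clip(current_frame + predicted_variation, 0.0, 1.0)


def variation_weights(true_variation, eps=1e-3):
    """Hypothetical per-pixel weights that grow with variation amplitude.

    Weights are normalised to have mean 1 so the loss scale stays comparable
    to an unweighted mean squared error. `eps` keeps static pixels from
    receiving exactly zero weight.
    """
    amplitude = np.abs(true_variation) + eps
    return amplitude / amplitude.mean()


def reweighted_l2(predicted_variation, true_variation):
    """Squared error on the variation, weighted per pixel by variation amplitude."""
    weights = variation_weights(true_variation)
    return float(np.mean(weights * (predicted_variation - true_variation) ** 2))


def psnr(prediction, target, max_value=1.0):
    """Standard peak signal-to-noise ratio in dB for images scaled to [0, max_value]."""
    mse = np.mean((prediction - target) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

In use, a model would output `predicted_variation` for a frame, `fuse` would turn it into a predicted next frame, and `psnr` would score that frame against the ground truth. The weighting here simply emphasises pixels that change a lot between frames; the paper's actual mechanism may differ.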
