RSTT: Real-time Spatial Temporal Transformer for Space-Time Video Super-Resolution

2022-03-27 · CVPR 2022 · Code Available

Zhicheng Geng, Luming Liang, Tianyu Ding, Ilya Zharkov

Code

github.com/llmpass/RSTT
Official · In paper · PyTorch · ★ 143

Abstract

Space-time video super-resolution (STVSR) is the task of interpolating videos that are both Low Frame Rate (LFR) and Low Resolution (LR) to produce High Frame Rate (HFR) and High Resolution (HR) counterparts. Existing methods based on Convolutional Neural Networks (CNNs) achieve visually satisfying results but suffer from slow inference due to their heavy architectures. We propose to resolve this issue with a spatial-temporal transformer that naturally incorporates the spatial and temporal super-resolution modules into a single model. Unlike CNN-based methods, we do not use separate building blocks for temporal interpolation and spatial super-resolution; instead, we use a single end-to-end transformer architecture. Specifically, the encoders build a reusable dictionary from the input LFR and LR frames, which the decoder then uses to synthesize the HFR and HR frames. Compared with the state-of-the-art TMNet (Xu et al., 2021), our network is about 60% smaller (4.5M vs 12.3M parameters) and about 80% faster (26.2 fps vs 14.3 fps on 720×576 frames) without sacrificing much performance. The source code is available at https://github.com/llmpass/RSTT.
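
The size and speed comparison with TMNet can be checked with a short calculation. The sketch below only uses the four figures quoted in the abstract; the variable and function names are illustrative and do not come from the RSTT code base.

```python
# Figures quoted in the abstract (parameters in millions, speed in frames per second).
rstt_params_m, tmnet_params_m = 4.5, 12.3
rstt_fps, tmnet_fps = 26.2, 14.3


def relative_reduction(new: float, old: float) -> float:
    """Fraction by which `new` is smaller than `old`."""
    return (old - new) / old


def relative_speedup(new: float, old: float) -> float:
    """Fraction by which `new` is faster than `old`."""
    return (new - old) / old


size_reduction = relative_reduction(rstt_params_m, tmnet_params_m)
speedup = relative_speedup(rstt_fps, tmnet_fps)

print(f"{size_reduction:.1%} smaller, {speedup:.1%} faster")
# prints "63.4% smaller, 83.2% faster"
```

Under this reading, the 60% and 80% figures in the abstract are rounded-down versions of roughly 63% and 83%.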

Tasks

Decoder, Space-time Video Super-resolution, Super-Resolution, Video Frame Interpolation, Video Super-Resolution

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
Vimeo90K-Fast	RSTT-L	PSNR	36.8	—	Unverified
Vimeo90K-Fast	RSTT-M	PSNR	36.78	—	Unverified
Vimeo90K-Fast	RSTT-S	PSNR	36.58	—	Unverified
Vimeo90K-Medium	RSTT-L	PSNR	35.66	—	Unverified
Vimeo90K-Medium	RSTT-M	PSNR	35.62	—	Unverified
Vimeo90K-Medium	RSTT-S	PSNR	35.43	—	Unverified
