Blind VQA on 360° Video via Progressively Learning from Pixels, Frames and Video

2021-11-18Code Available0· sign in to hype

Li Yang, Mai Xu, Shengxi Li, Yichen Guo, Zulin Wang

Code Available — Be the first to reproduce this paper.

Code

github.com/yanglixiaoshen/provqa
OfficialIn paperpytorch★ 9

Abstract

Blind visual quality assessment (BVQA) on 360 video plays a key role in optimizing immersive multimedia systems. When assessing the quality of 360 video, human tends to perceive its quality degradation from the viewport-based spatial distortion of each spherical frame to motion artifact across adjacent frames, ending with the video-level quality score, i.e., a progressive quality assessment paradigm. However, the existing BVQA approaches for 360 video neglect this paradigm. In this paper, we take into account the progressive paradigm of human perception towards spherical video quality, and thus propose a novel BVQA approach (namely ProVQA) for 360 video via progressively learning from pixels, frames and video. Corresponding to the progressive learning of pixels, frames and video, three sub-nets are designed in our ProVQA approach, i.e., the spherical perception aware quality prediction (SPAQ), motion perception aware quality prediction (MPAQ) and multi-frame temporal non-local (MFTN) sub-nets. The SPAQ sub-net first models the spatial quality degradation based on spherical perception mechanism of human. Then, by exploiting motion cues across adjacent frames, the MPAQ sub-net properly incorporates motion contextual information for quality assessment on 360 video. Finally, the MFTN sub-net aggregates multi-frame quality degradation to yield the final quality score, via exploring long-term quality correlation from multiple frames. The experiments validate that our approach significantly advances the state-of-the-art BVQA performance on 360 video over two datasets, the code of which has been public in https://github.com/yanglixiaoshen/ProVQA.

Tasks

Visual Question Answering (VQA)

Blind VQA on 360° Video via Progressively Learning from Pixels, Frames and Video

Code

Abstract

Tasks

Reproductions