
Breaking Annotation Barriers: Generalized Video Quality Assessment via Ranking-based Self-Supervision

2025-05-06

Linhan Cao, Wei Sun, Kaiwei Zhang, Yicong Peng, Guangtao Zhai, Xiongkuo Min

Abstract

Video quality assessment (VQA) is essential for quantifying perceptual quality in various video processing workflows, from camera capture systems to over-the-top streaming platforms. While recent supervised VQA models have made substantial progress, their reliance on manually annotated datasets, a process that is labor-intensive, costly, and difficult to scale, has hindered further improvement of their generalization to unseen video content and distortions. To bridge this gap, we introduce a self-supervised learning framework for VQA that learns quality assessment from large-scale, unlabeled web videos. Our approach leverages a learning-to-rank paradigm to train a large multimodal model (LMM) on video pairs automatically labeled in two ways: quality pseudo-labeling by existing VQA models and relative quality ranking based on synthetic distortion simulations. Furthermore, we introduce a novel iterative self-improvement training strategy, in which the trained model acts as an improved annotator that iteratively refines the annotation quality of the training data. By training on a dataset 10× larger than existing VQA benchmarks, our model: (1) achieves zero-shot performance on in-domain VQA benchmarks that matches or surpasses supervised models; (2) demonstrates superior out-of-distribution (OOD) generalization across diverse video content and distortions; and (3) sets a new state-of-the-art when fine-tuned on human-labeled datasets. Extensive experimental results validate the effectiveness of our self-supervised approach for training generalized VQA models. The datasets and code will be publicly released to facilitate future research.
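The learning-to-rank paradigm described above is typically realized as a pairwise objective: the model scores each video in a pair and is penalized when its score ordering disagrees with the annotated relative quality. Below is a minimal sketch of such an objective in PyTorch, assuming a scalar-output scorer head on top of pooled video features; the class name `QualityScorer`, the feature dimension, and the margin value are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch (not the authors' code): pairwise learning-to-rank
# objective for VQA. Assumes the model emits one scalar quality score
# per video; pair labels come from pseudo-labeling by existing VQA
# models or from synthetic distortion ordering, as in the abstract.
import torch
import torch.nn as nn

class QualityScorer(nn.Module):
    """Placeholder head mapping pooled video features to a scalar quality score."""
    def __init__(self, feat_dim: int = 768):
        super().__init__()
        self.head = nn.Linear(feat_dim, 1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.head(feats).squeeze(-1)

def pairwise_rank_loss(score_a: torch.Tensor,
                       score_b: torch.Tensor,
                       label: torch.Tensor,
                       margin: float = 0.1) -> torch.Tensor:
    # label = +1 where video A is annotated as higher quality than B,
    # -1 otherwise; the loss pushes score_a above score_b by `margin`
    # (or vice versa) whenever the ordering is violated.
    return nn.functional.margin_ranking_loss(score_a, score_b, label,
                                             margin=margin)

# Toy usage with random tensors standing in for pooled LMM video features.
scorer = QualityScorer()
feats_a, feats_b = torch.randn(4, 768), torch.randn(4, 768)
labels = torch.tensor([1.0, -1.0, 1.0, 1.0])  # relative quality per pair
loss = pairwise_rank_loss(scorer(feats_a), scorer(feats_b), labels)
loss.backward()
```

The iterative self-improvement strategy would wrap a loop around this objective, alternating between training on the current pairwise labels and re-annotating the training pairs with the improved model; the abstract does not specify the exact schedule or stopping criterion.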
