
Breaking Annotation Barriers: Generalized Video Quality Assessment via Ranking-based Self-Supervision

2025-05-06

Linhan Cao, Wei Sun, Kaiwei Zhang, Yicong Peng, Guangtao Zhai, Xiongkuo Min

Abstract

Video quality assessment (VQA) is essential for quantifying perceptual quality in various video processing workflows, from camera capture systems to over-the-top streaming platforms. While recent supervised VQA models have made substantial progress, their reliance on manually annotated datasets, a process that is labor-intensive, costly, and difficult to scale, has hindered further improvement of their generalization to unseen video content and distortions. To bridge this gap, we introduce a self-supervised learning framework for VQA that learns quality assessment from large-scale, unlabeled web videos. Our approach leverages a learning-to-rank paradigm to train a large multimodal model (LMM) on video pairs automatically labeled in two ways: quality pseudo-labeling by existing VQA models and relative quality ranking based on synthetic distortion simulations. Furthermore, we introduce a novel iterative self-improvement training strategy, in which the trained model acts as an improved annotator that iteratively refines the annotation quality of the training data. By training on a dataset 10× larger than existing VQA benchmarks, our model: (1) achieves zero-shot performance on in-domain VQA benchmarks that matches or surpasses supervised models; (2) demonstrates superior out-of-distribution (OOD) generalization across diverse video content and distortions; and (3) sets a new state-of-the-art when fine-tuned on human-labeled datasets. Extensive experimental results validate the effectiveness of our self-supervised approach for training generalized VQA models. The datasets and code will be publicly released to facilitate future research.
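The learning-to-rank paradigm described above is typically realized as a pairwise objective: the model scores each video in a pair and is penalized when its score ordering disagrees with the annotated relative quality. Below is a minimal sketch of such an objective in PyTorch, assuming a scalar-output scorer head on top of pooled video features; the class name `QualityScorer`, the feature dimension, and the margin value are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch (not the authors' code): pairwise learning-to-rank
# objective for VQA. Assumes the model emits one scalar quality score
# per video; pair labels come from pseudo-labeling by existing VQA
# models or from synthetic distortion ordering, as in the abstract.
import torch
import torch.nn as nn

class QualityScorer(nn.Module):
    """Placeholder head mapping pooled video features to a scalar quality score."""
    def __init__(self, feat_dim: int = 768):
        super().__init__()
        self.head = nn.Linear(feat_dim, 1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.head(feats).squeeze(-1)

def pairwise_rank_loss(score_a: torch.Tensor,
                       score_b: torch.Tensor,
                       label: torch.Tensor,
                       margin: float = 0.1) -> torch.Tensor:
    # label = +1 where video A is annotated as higher quality than B,
    # -1 otherwise; the loss pushes score_a above score_b by `margin`
    # (or vice versa) whenever the ordering is violated.
    return nn.functional.margin_ranking_loss(score_a, score_b, label,
                                             margin=margin)

# Toy usage with random tensors standing in for pooled LMM video features.
scorer = QualityScorer()
feats_a, feats_b = torch.randn(4, 768), torch.randn(4, 768)
labels = torch.tensor([1.0, -1.0, 1.0, 1.0])  # relative quality per pair
loss = pairwise_rank_loss(scorer(feats_a), scorer(feats_b), labels)
loss.backward()
```

The iterative self-improvement strategy would wrap a loop around this objective, alternating between training on the current pairwise labels and re-annotating the training pairs with the improved model; the abstract does not specify the exact schedule or stopping criterion.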
