SOTAVerified

Neighbor Relations Matter in Video Scene Detection

2024-01-01CVPR 2024Code Available1· sign in to hype

Jiawei Tan, Hongxing Wang, Jiaxin Li, Zhilong Ou, Zhangbin Qian

Code Available — Be the first to reproduce this paper.

Reproduce

Code

Abstract

Video scene detection aims to temporally link shots for obtaining semantically compact scenes. It is essential for this task to capture scene-distinguishable affinity among shots by similarity assessment. However most methods relies on ordinary shot-to-shot similarities which may inveigle similar shots into being linked even though they are from different scenes and meanwhile hinder dissimilar shots from being blended into a complete scene. In this paper we propose NeighborNet to inject shot contexts into shot-to-shot similarities through carefully exploring the relations between semantic/temporal neighbors of shots over a local time period. In this way shot-to-shot similarities are remeasured as semantic/temporal neighbor-aware similarities so that NeighborNet can learn context embedding into shot features using graph convolutional network. As a result not only do the learned shot features suppress the affinity among similar shots from different scenes but they also promote the affinity among dissimilar shots in the same scene. Experimental results on public benchmark datasets show that our proposed NeighborNet yields substantial improvements in video scene detection especially outperforms released state-of-the-arts by at least 6% in Average Precision (AP). The code is available at https://github.com/ExMorgan-Alter/NeighborNet.

Tasks

Reproductions