Unleash the Potential of CLIP for Video Highlight Detection
Donghoon Han, Seunghyeon Seo, Eunhwan Park, Seong-Uk Nam, Nojun Kwak
Code Available
- github.com/dhk1349/HL-CLIP (PyTorch)
Abstract
Multimodal models and large language models (LLMs) have revolutionized the utilization of open-world knowledge, unlocking novel capabilities across a variety of tasks and applications. Among these, the video domain has notably benefited from their potential. In this paper, we present Highlight-CLIP (HL-CLIP), a method designed to excel at video highlight detection by leveraging the pre-trained knowledge embedded in multimodal models. By simply fine-tuning the multimodal encoder in combination with our saliency pooling technique, we achieve, to the best of our knowledge, state-of-the-art performance on the QVHighlights benchmark for highlight detection.
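The abstract describes scoring video frames with a fine-tuned CLIP encoder and aggregating the frame-level scores via saliency pooling. The sketch below is a minimal illustration of that general idea, assuming precomputed CLIP frame and text embeddings; the function names `frame_saliency` and `saliency_pool`, and the top-k mean pooling choice, are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def frame_saliency(frame_feats: torch.Tensor, text_feat: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between each frame embedding and the query embedding.

    frame_feats: (T, D) CLIP image features, one row per sampled frame.
    text_feat:   (D,)   CLIP text feature for the query.
    Returns:     (T,)   saliency score per frame.
    """
    frame_feats = F.normalize(frame_feats, dim=-1)
    text_feat = F.normalize(text_feat, dim=-1)
    return frame_feats @ text_feat

def saliency_pool(scores: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Pool frame-level saliency into a clip-level score by averaging the
    top-k frames (one plausible reading of 'saliency pooling')."""
    k = min(k, scores.numel())
    topk, _ = scores.topk(k)
    return topk.mean()

# Usage with dummy features (D = 512 matches CLIP ViT-B/32).
T, D = 75, 512
frames = torch.randn(T, D)          # stand-in for encoded video frames
query = torch.randn(D)              # stand-in for the encoded text query
scores = frame_saliency(frames, query)
print(saliency_pool(scores))        # higher = clip matches the query better
```

In practice the embeddings would come from the fine-tuned CLIP encoders, and the pooled score (or the per-frame scores themselves) would drive highlight selection on QVHighlights.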
Tasks
- Video Highlight Detection
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| QVHighlights | HL-CLIP | mAP | 41.94 | — | Unverified |