Video Retrieval

The objective of video retrieval is: given a text query and a pool of candidate videos, select the video that corresponds to the query. Typically, the candidates are returned as a ranked list and scored with document retrieval metrics such as Recall@K (reported as R@1, R@5, and R@10 in the benchmark tables).
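The ranking and scoring step described above can be illustrated with a minimal sketch. It assumes each query already has one similarity score per candidate video and exactly one correct video. The function names `rank_candidates` and `recall_at_k` and the toy scores are hypothetical and not taken from any listed paper.

```python
from typing import Sequence


def rank_candidates(scores: Sequence[float]) -> list[int]:
    """Return candidate indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def recall_at_k(
    score_matrix: Sequence[Sequence[float]],
    ground_truth: Sequence[int],
    k: int,
) -> float:
    """Percentage of queries whose correct video appears in the top-k results."""
    hits = 0
    for scores, target in zip(score_matrix, ground_truth):
        if target in rank_candidates(scores)[:k]:
            hits += 1
    return 100.0 * hits / len(ground_truth)


# Toy data (illustrative only): 3 text queries scored against 4 videos.
score_matrix = [
    [0.9, 0.1, 0.3, 0.2],  # query 0: correct video 0 is ranked first
    [0.2, 0.4, 0.8, 0.1],  # query 1: correct video 1 is ranked second
    [0.5, 0.6, 0.1, 0.7],  # query 2: correct video 2 is ranked last
]
ground_truth = [0, 1, 2]

print(round(recall_at_k(score_matrix, ground_truth, 1), 1))  # 33.3
print(round(recall_at_k(score_matrix, ground_truth, 2), 1))  # 66.7
```

In practice the scores would come from a text-video similarity model. The sketch only shows how a ranked list turns into an R@K number like those in the tables on this page.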

Papers

Showing 201–225 of 486 papers

Title	Date	Tasks	Status	Score
LAMV: Learning to Align and Match Videos With Kernelized Temporal Layers	Jun 1, 2018	Copy Detection, Retrieval	Code Available	5
Discriminative Residual Analysis for Image Set Classification with Posture and Age Variations	Aug 23, 2020	General Classification, Retrieval	Code Available	5
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language	Apr 1, 2022	Diversity, Image Captioning	Code Available	5
Self-supervised Video Representation Learning by Context and Motion Decoupling	Apr 2, 2021	Action Recognition, CPU	Code Available	5
Central Similarity Quantization for Efficient Image and Video Retrieval	Aug 1, 2019	Quantization, Retrieval	Code Available	5
Joint Searching and Grounding: Multi-Granularity Video Content Retrieval	Oct 23, 2023	Contrastive Learning, Retrieval	Code Available	5
Is Multimodal Vision Supervision Beneficial to Language?	Feb 10, 2023	Image Retrieval, Natural Language Understanding	Code Available	5
Differentiable Resolution Compression and Alignment for Efficient Video Classification and Retrieval	Sep 15, 2023	Retrieval, Video Classification	Code Available	5
Dialogue-to-Video Retrieval	Mar 23, 2023	Recommendation Systems, Retrieval	Code Available	5
SEDS: Semantically Enhanced Dual-Stream Encoder for Sign Language Retrieval	Jul 23, 2024	Retrieval, Sign Language Retrieval	Code Available	5
Improving Video Corpus Moment Retrieval with Partial Relevance Enhancement	Feb 21, 2024	Moment Retrieval, Retrieval	Code Available	5
ICSVR: Investigating Compositional and Syntactic Understanding in Video Retrieval Models	Jun 28, 2023	Retrieval, Video Retrieval	Code Available	5
SEA: Sentence Encoder Assembly for Video Retrieval by Textual Queries	Nov 24, 2020	Ad-hoc video search, Management	Code Available	5
Self-Supervised Visual Learning by Variable Playback Speeds Prediction of a Video	Mar 5, 2020	Action Recognition, Representation Learning	Code Available	5
Rudder: A Cross Lingual Video and Text Retrieval Dataset	Mar 9, 2021	Natural Language Queries, Retrieval	Code Available	5
Screencast Tutorial Video Understanding	Jun 1, 2020	Object Detection	Code Available	5
Deep Hashing with Category Mask for Fast Video Retrieval	Dec 22, 2017	Code Generation, Deep Hashing	Code Available	5
Inter-intra Variant Dual Representations for Self-supervised Video Recognition	Jul 2, 2021	Contrastive Learning, Representation Learning	Code Available	5
ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video-Text Data Streams	Apr 21, 2025	Informativeness, Low-latency processing	Code Available	5
Hashing with Mutual Information	Mar 2, 2018	Image Retrieval, Retrieval	Code Available	5
Contrastive Alignment with Semantic Gap-Aware Corrections in Text-Video Retrieval	May 18, 2025	Contrastive Learning, Retrieval	Code Available	5
Graph Based Temporal Aggregation for Video Retrieval	Nov 4, 2020	Retrieval, Video Retrieval	Code Available	5
GOCA: Guided Online Cluster Assignment for Self-Supervised Video Representation Learning	Jul 20, 2022	Action Recognition, Clustering	Code Available	5
A Challenge to Build Neuro-Symbolic Video Agents	May 20, 2025	Scene Classification, Video Retrieval	Code Available	5
RaP: Redundancy-aware Video-language Pre-training for Text-Video Retrieval	Oct 13, 2022	Contrastive Learning, Retrieval	Code Available	5

Datasets: MSR-VTT-1kA, DiDeMo, MSR-VTT, LSMDC, ActivityNet, MSVD, YouCook2, FIVR-200K, VATEX, QuerYD, SSv2-label retrieval, SSv2-template retrieval

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	OmniVec	text-to-video R@10	89.4	—	Unverified
2	CLIP4Clip	text-to-video R@10	81.6	—	Unverified
3	OmniVec (pretrained)	text-to-video R@10	78.6	—	Unverified
4	HunYuan_tvr (huge)	text-to-video R@1	62.9	—	Unverified
5	CLIP-ViP	text-to-video R@1	57.7	—	Unverified
6	PIDRo	text-to-video R@1	55.9	—	Unverified
7	DMAE (ViT-B/16)	text-to-video R@1	55.5	—	Unverified
8	HunYuan_tvr	text-to-video R@1	55	—	Unverified
9	MuLTI	text-to-video R@1	54.7	—	Unverified
10	EERCF	text-to-video R@1	54.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Aurora (ours, r=64)	text-to-video R@5	77.4	—	Unverified
2	InternVideo2-6B	text-to-video R@1	74.2	—	Unverified
3	vid-TLDR (UMT-L)	text-to-video R@1	72.3	—	Unverified
4	VAST	text-to-video R@1	72	—	Unverified
5	COSA	text-to-video R@1	70.5	—	Unverified
6	UMT-L (ViT-L/16)	text-to-video R@1	70.4	—	Unverified
7	GRAM	text-to-video R@1	67.3	—	Unverified
8	VALOR	text-to-video R@1	61.5	—	Unverified
9	TESTA (ViT-B/16)	text-to-video R@1	61.2	—	Unverified
10	VindLU	text-to-video R@1	61.2	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GRAM	text-to-video R@1	64	—	Unverified
2	VAST	text-to-video R@1	63.9	—	Unverified
3	InternVideo2-6B	text-to-video R@1	62.8	—	Unverified
4	VALOR	text-to-video R@1	59.9	—	Unverified
5	UMT-L (ViT-L/16)	text-to-video R@1	58.8	—	Unverified
6	vid-TLDR (UMT-L)	text-to-video R@1	58.1	—	Unverified
7	COSA	text-to-video R@1	57.9	—	Unverified
8	InternVideo2-6B	text-to-video R@1	55.9	—	Unverified
9	InternVideo	text-to-video R@1	55.2	—	Unverified
10	VLAB	text-to-video R@1	55.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	EMCL-Net++ (LSMDC, Rohrbach et al., 2015)	text-to-video R@10	53.7	—	Unverified
2	InternVideo2-6B	text-to-video R@1	46.4	—	Unverified
3	vid-TLDR (UMT-L)	text-to-video R@1	43.1	—	Unverified
4	UMT-L (ViT-L/16)	text-to-video R@1	43	—	Unverified
5	HunYuan_tvr (huge)	text-to-video R@1	40.4	—	Unverified
6	COSA	text-to-video R@1	39.4	—	Unverified
7	mPLUG-2	text-to-video R@1	34.4	—	Unverified
8	VALOR	text-to-video R@1	34.2	—	Unverified
9	InternVideo	text-to-video R@1	34	—	Unverified
10	InternVideo2-6B	text-to-video R@1	33.8	—	Unverified