Video Retrieval

Video retrieval is the task of selecting, from a pool of candidate videos, the video that corresponds to a given text query. Systems typically return the candidates as a ranked list, which is scored with document retrieval metrics such as recall at rank K (R@K).
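As a rough illustration of how a ranked list can be scored, the sketch below computes recall at rank K, the R@1, R@5, and R@10 figures reported in the benchmark tables. It assumes each query has exactly one correct video and that a text-video similarity matrix is already available. The function name `recall_at_k` and the toy scores are hypothetical, and ties are resolved in favor of the correct video, which individual benchmarks may handle differently.

```python
def recall_at_k(similarity, gt_index, k):
    """Fraction of queries whose correct video appears in the top k results.

    similarity: list of rows, one per text query, holding a score per candidate video.
    gt_index:   index of the correct video for each query (one per query, an assumption).
    """
    hits = 0
    for scores, gt in zip(similarity, gt_index):
        # Zero-based rank: how many candidates score strictly higher than the correct video.
        rank = sum(1 for s in scores if s > scores[gt])
        if rank < k:
            hits += 1
    return hits / len(gt_index)


# Toy data (made up for illustration): 3 text queries, 4 candidate videos.
sim = [
    [0.9, 0.1, 0.3, 0.2],  # query 0: correct video 0 is ranked first
    [0.2, 0.4, 0.8, 0.1],  # query 1: correct video 1 is ranked second
    [0.5, 0.6, 0.1, 0.7],  # query 2: correct video 2 is ranked last
]
gt = [0, 1, 2]

for k in (1, 2, 4):
    # Leaderboards usually report the value as a percentage.
    print(f"R@{k} = {100 * recall_at_k(sim, gt, k):.1f}")
```

Higher is better for every K, and R@K can never decrease as K grows, so R@1, R@5, and R@10 values are not directly comparable with one another.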

Papers


Showing 301–350 of 486 papers

Title	Date	Tasks	Status
MuLTI: Efficient Video-and-Language Understanding with Text-Guided MultiWay-Sampler and Multiple Choice Modeling	Mar 10, 2023	Multi-Label Classification	Unverified
Multi-Granularity and Multi-modal Feature Interaction Approach for Text Video Retrieval	Jun 21, 2024	Retrieval, Sentence	Unverified
Multi-Granularity Graph Pooling for Video-based Person Re-Identification	Sep 23, 2022	Node Clustering, Person Re-Identification	Unverified
Multimodal Approach for Video Surveillance Indexing and Retrieval	Aug 6, 2013	Retrieval, Video Retrieval	Unverified
Multimodal Contextualized Support for Enhancing Video Retrieval System	Dec 10, 2024	Object Detection	Unverified
Multimodal Skip-gram Using Convolutional Pseudowords	Nov 12, 2015	Object Recognition, Retrieval	Unverified
Multiple Visual-Semantic Embedding for Video Retrieval from Query Sentence	Apr 16, 2020	Retrieval, Sentence	Unverified
MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval	Oct 15, 2024	Descriptive, Retrieval	Unverified
MultiVENT: Multilingual Videos of Events with Aligned Natural Text	Jul 6, 2023	Information Retrieval, Retrieval	Unverified
Narrating the Video: Boosting Text-Video Retrieval via Comprehensive Utilization of Frame-Level Captions	Mar 7, 2025	Retrieval, Video Retrieval	Unverified
NAVERO: Unlocking Fine-Grained Semantics for Video-Language Compositionality	Aug 18, 2024	Retrieval, Text Retrieval	Unverified
Near-duplicate video detection featuring coupled temporal and perceptual visual structures and logical inference based matching	May 15, 2020	Retrieval, Video Editing	Unverified
Neighborhood Preserving Hashing for Scalable Video Retrieval	Oct 1, 2019	Retrieval, Video Retrieval	Unverified
Neural Graph Matching for Video Retrieval in Large-Scale Video-driven E-commerce	Aug 1, 2024	Graph Matching, Retrieval	Unverified
NEWSKVQA: Knowledge-Aware News Video Question Answering	Feb 8, 2022	Common Sense Reasoning, Management	Unverified
No More Shortcuts: Realizing the Potential of Temporal Self-Supervision	Dec 20, 2023	Action Classification, Attribute	Unverified
Not All Pairs are Equal: Hierarchical Learning for Average-Precision-Oriented Video Retrieval	Jul 22, 2024	All, Retrieval	Unverified
OmniVL: One Foundation Model for Image-Language and Video-Language Tasks	Sep 15, 2022	Action Classification, Action Recognition	Unverified
Perfect Match in Video Retrieval	Mar 29, 2023	Retrieval, Video Retrieval	Unverified
PIDRo: Parallel Isomeric Attention with Dynamic Routing for Text-Video Retrieval	Jan 1, 2023	Representation Learning, Retrieval	Unverified
PolySmart @ TRECVid 2024 Medical Video Question Answering	Dec 20, 2024	Question Answering, Retrieval	Unverified
Pose-Aided Video-based Person Re-Identification via Recurrent Graph Convolutional Network	Sep 23, 2022	Person Re-Identification, Retrieval	Unverified
Probabilistic Representations for Video Contrastive Learning	Apr 8, 2022	Action Recognition, Contrastive Learning	Unverified
ProTA: Probabilistic Token Aggregation for Text-Video Retrieval	Apr 18, 2024	Diversity, Retrieval	Unverified
Prototypes are Balanced Units for Efficient and Effective Partially Relevant Video Retrieval	Apr 17, 2025	Partially Relevant Video Retrieval, Retrieval	Unverified
Q2E: Query-to-Event Decomposition for Zero-Shot Multilingual Text-to-Video Retrieval	Jun 11, 2025	Retrieval, Text to Video Retrieval	Unverified
QSAM-Net: Rain streak removal by quaternion neural network with self-attention module	Aug 8, 2022	Benchmarking, object-detection	Unverified
Quality Over Quantity? LLM-Based Curation for a Data-Efficient Audio-Video Foundation Model	Mar 12, 2025	AudioCaps, Contrastive Learning	Unverified
Query by Semantic Sketch	Sep 27, 2019	Retrieval, Video Retrieval	Unverified
Query-centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning	Dec 18, 2024	Moment Retrieval, Multi-Task Learning	Unverified
RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter	May 29, 2024	Natural Language Queries, parameter-efficient fine-tuning	Unverified
Real-time analysis of cataract surgery videos using statistical models	Oct 18, 2016	Retrieval, Video Retrieval	Unverified
Renmin University of China at TRECVID 2022: Improving Video Search by Feature Fusion and Negation Understanding	Nov 28, 2022	Ad-hoc video search, Negation	Unverified
RNNs, CNNs and Transformers in Human Action Recognition: A Survey and a Hybrid Model	Jun 2, 2024	Action Recognition, Temporal Action Localization	Unverified
Self-supervised Spatiotemporal Representation Learning by Exploiting Video Continuity	Dec 11, 2021	Action Localization, Action Recognition	Unverified
Self-supervised Temporal Learning	Jan 1, 2021	Contrastive Learning, Retrieval	Unverified
Self-Supervised Video Hashing with Hierarchical Binary Auto-encoder	Feb 7, 2018	Binarization, Decoder	Unverified
Self-Supervised Video Representation Learning with Meta-Contrastive Network	Aug 19, 2021	Action Recognition, Contrastive Learning	Unverified
Self-Supervised Video Representation Learning by Video Incoherence Detection	Sep 26, 2021	Action Recognition, Contrastive Learning	Unverified
Self-supervised Video Retrieval Transformer Network	Apr 16, 2021	Retrieval, Self-supervised Video Retrieval	Unverified
Semantic Image Retrieval by Uniting Deep Neural Networks and Cognitive Architectures	Jun 14, 2018	Deep Learning, Image Retrieval	Unverified
Semantic Video Entity Linking Based on Visual Content and Metadata	Dec 1, 2015	Entity Linking, Metric Learning	Unverified
Semantic Video Moments Retrieval at Scale: A New Task and a Baseline	Oct 15, 2022	Retrieval, Video Retrieval	Unverified
Semi-automatic Data Annotation System for Multi-Target Multi-Camera Vehicle Tracking	Sep 20, 2022	Retrieval, Video Retrieval	Unverified
Sharing Hash Codes for Multiple Purposes	Sep 11, 2016	Retrieval, Video Retrieval	Unverified
SHE-Net: Syntax-Hierarchy-Enhanced Text-Video Retrieval	Apr 22, 2024	Retrieval, Video Retrieval	Unverified
Sign Language Video Retrieval with Free-Form Textual Queries	Jan 7, 2022	Form, Retrieval	Unverified
Sinkhorn Transformations for Single-Query Postprocessing in Text-Video Retrieval	Nov 14, 2023	Retrieval, Video Retrieval	Unverified
SketchGAN: Joint Sketch Completion and Recognition With Generative Adversarial Network	Jun 1, 2019	Decoder, Generative Adversarial Network	Unverified
SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training	Nov 21, 2022	cross-modal alignment, GPU	Unverified


Datasets: MSR-VTT-1kA, DiDeMo, MSR-VTT, LSMDC, ActivityNet, MSVD, YouCook2, FIVR-200K, VATEX, QuerYD, SSv2-label retrieval, SSv2-template retrieval

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	OmniVec	text-to-video R@10	89.4	—	Unverified
2	CLIP4Clip	text-to-video R@10	81.6	—	Unverified
3	OmniVec (pretrained)	text-to-video R@10	78.6	—	Unverified
4	HunYuan_tvr (huge)	text-to-video R@1	62.9	—	Unverified
5	CLIP-ViP	text-to-video R@1	57.7	—	Unverified
6	PIDRo	text-to-video R@1	55.9	—	Unverified
7	DMAE (ViT-B/16)	text-to-video R@1	55.5	—	Unverified
8	HunYuan_tvr	text-to-video R@1	55	—	Unverified
9	MuLTI	text-to-video R@1	54.7	—	Unverified
10	EERCF	text-to-video R@1	54.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Aurora (ours, r=64)	text-to-video R@5	77.4	—	Unverified
2	InternVideo2-6B	text-to-video R@1	74.2	—	Unverified
3	vid-TLDR (UMT-L)	text-to-video R@1	72.3	—	Unverified
4	VAST	text-to-video R@1	72	—	Unverified
5	COSA	text-to-video R@1	70.5	—	Unverified
6	UMT-L (ViT-L/16)	text-to-video R@1	70.4	—	Unverified
7	GRAM	text-to-video R@1	67.3	—	Unverified
8	VALOR	text-to-video R@1	61.5	—	Unverified
9	TESTA (ViT-B/16)	text-to-video R@1	61.2	—	Unverified
10	VindLU	text-to-video R@1	61.2	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GRAM	text-to-video R@1	64	—	Unverified
2	VAST	text-to-video R@1	63.9	—	Unverified
3	InternVideo2-6B	text-to-video R@1	62.8	—	Unverified
4	VALOR	text-to-video R@1	59.9	—	Unverified
5	UMT-L (ViT-L/16)	text-to-video R@1	58.8	—	Unverified
6	vid-TLDR (UMT-L)	text-to-video R@1	58.1	—	Unverified
7	COSA	text-to-video R@1	57.9	—	Unverified
8	InternVideo2-6B	text-to-video R@1	55.9	—	Unverified
9	InternVideo	text-to-video R@1	55.2	—	Unverified
10	VLAB	text-to-video R@1	55.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	EMCL-Net (Ours)++ LSMDC Rohrbach et al. (2015)	text-to-video R@10	53.7	—	Unverified
2	InternVideo2-6B	text-to-video R@1	46.4	—	Unverified
3	vid-TLDR (UMT-L)	text-to-video R@1	43.1	—	Unverified
4	UMT-L (ViT-L/16)	text-to-video R@1	43	—	Unverified
5	HunYuan_tvr (huge)	text-to-video R@1	40.4	—	Unverified
6	COSA	text-to-video R@1	39.4	—	Unverified
7	mPLUG-2	text-to-video R@1	34.4	—	Unverified
8	VALOR	text-to-video R@1	34.2	—	Unverified
9	InternVideo	text-to-video R@1	34	—	Unverified
10	InternVideo2-6B	text-to-video R@1	33.8	—	Unverified