Video Retrieval

Given a text query and a pool of candidate videos, video retrieval aims to select the video that matches the query. Systems typically return the candidates as a ranked list, which is scored with document retrieval metrics such as recall at rank K (R@K).
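As a rough illustration of the ranking and scoring described above, the sketch below computes recall at rank K from a matrix of query-to-video similarity scores. It assumes each query has exactly one correct video and that higher scores mean better matches. The function names (`rank_videos`, `recall_at_k`) and the toy scores are hypothetical and do not come from any listed paper or benchmark.

```python
from typing import Dict, List, Sequence


def rank_videos(scores: Sequence[float]) -> List[int]:
    """Return candidate video indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def recall_at_k(similarity: Sequence[Sequence[float]],
                ground_truth: Sequence[int],
                ks: Sequence[int] = (1, 5, 10)) -> Dict[int, float]:
    """Percentage of queries whose correct video appears in the top k results.

    similarity[q][v] is the score of video v for query q (assumed layout).
    ground_truth[q] is the index of the single correct video for query q.
    """
    hits = {k: 0 for k in ks}
    for query_scores, target in zip(similarity, ground_truth):
        ranking = rank_videos(query_scores)
        position = ranking.index(target)  # 0-based rank of the correct video
        for k in ks:
            if position < k:
                hits[k] += 1
    n_queries = len(ground_truth)
    return {k: 100.0 * hits[k] / n_queries for k in ks}


# Toy data: three queries scored against three candidate videos.
similarity = [
    [0.9, 0.1, 0.3],  # query 0: correct video (0) is ranked first
    [0.2, 0.4, 0.8],  # query 1: correct video (1) is ranked second
    [0.5, 0.6, 0.1],  # query 2: correct video (2) is ranked third
]
ground_truth = [0, 1, 2]
result = recall_at_k(similarity, ground_truth, ks=(1, 2, 3))
print(result)
```

On this toy data one of three queries is answered correctly at rank 1, two of three within the top 2, and all three within the top 3. Published R@1, R@5 and R@10 figures are usually reported as percentages in the same way, although individual papers may differ in details such as how they handle multiple captions per video.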

Papers

Showing 451–486 of 486 papers

Title	Date	Tasks	Status
Time-Equivariant Contrastive Video Representation Learning	Dec 7, 2021	Action Recognition, Contrastive Learning	Unverified
Towards Debiasing Frame Length Bias in Text-Video Retrieval via Causal Intervention	Sep 17, 2023	Action Recognition, Graph Generation	Unverified
Towards Efficient and Robust Moment Retrieval System: A Unified Framework for Multi-Granularity Models and Temporal Reranking	Apr 11, 2025	Moment Retrieval, Question Answering	Unverified
Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset	Jun 19, 2024	Language Modeling, Language Modelling	Unverified
TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba	Feb 21, 2025	image-classification, Image Classification	Unverified
TRECVID 2019: An Evaluation Campaign to Benchmark Video Activity Detection, Video Captioning and Matching, and Video Search & Retrieval	Sep 21, 2020	Action Detection, Activity Detection	Unverified
Tree-Augmented Cross-Modal Encoding for Complex-Query Video Retrieval	Jul 6, 2020	Retrieval, Video Retrieval	Unverified
Tree-based Text-Vision BERT for Video Search in Baidu Video Advertising	Sep 19, 2022	Image Retrieval, Retrieval	Unverified
Two-person interaction detection using body-pose features and multiple instance learning	Jul 16, 2012	Activity Recognition, Human Activity Recognition	Unverified
Uncertainty-aware sign language video retrieval with probability distribution modeling	May 30, 2024	Retrieval, Sign Language Retrieval	Unverified
Unfolding Videos Dynamics via Taylor Expansion	Sep 4, 2024	Action Detection, Action Recognition	Unverified
Unified Embedding and Metric Learning for Zero-Exemplar Event Detection	May 5, 2017	Event Detection, Metric Learning	Unverified
Universal Adversarial Head: Practical Protection against Video Data Leakage	Jun 18, 2021	Deep Hashing, Retrieval	Unverified
Unsupervised Data Uncertainty Learning in Visual Retrieval Systems	Feb 7, 2019	Retrieval, Triplet	Unverified
Unsupervised Segmentation of Action Segments in Egocentric Videos using Gaze	Sep 30, 2017	Activity Recognition, Retrieval	Unverified
Use of Affective Visual Information for Summarization of Human-Centric Videos	Jul 8, 2021	Emotion Recognition, Retrieval	Unverified
V3C - a Research Video Collection	Oct 11, 2018	Management, Retrieval	Unverified
Video 3D Sampling for Self-supervised Representation Learning	Jul 8, 2021	Action Recognition, Representation Learning	Unverified
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding	Sep 28, 2021	Action Localization, Action Segmentation	Unverified
VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models	Oct 1, 2024	Hallucination, text similarity	Unverified
Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval	Mar 24, 2025	Retrieval, Text to Video Retrieval	Unverified
Video DataFlywheel: Resolving the Impossible Data Trinity in Video-Language Understanding	Sep 29, 2024	Diversity, Question Answering	Unverified
Video Editing for Video Retrieval	Feb 4, 2024	Retrieval, Text Retrieval	Unverified
Videoprompter: an ensemble of foundational models for zero-shot video understanding	Oct 23, 2023	Action Recognition, Descriptive	Unverified
Video retrieval based on deep convolutional neural network	Dec 1, 2017	Retrieval, Triplet	Unverified
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners	Dec 9, 2022	Question Answering, Retrieval	Unverified
Vi-MIX FOR SELF-SUPERVISED VIDEO REPRESENTATION	Sep 29, 2021	Action Recognition, Representation Learning	Unverified
ViSeRet: A simple yet effective approach to moment retrieval via fine-grained video segmentation	Oct 11, 2021	Moment Retrieval, Retrieval	Unverified
Visual Information Retrieval in Endoscopic Video Archives	Apr 29, 2015	Information Retrieval, Retrieval	Unverified
Visual Semantic Search: Retrieving Videos via Complex Textual Queries	Jun 1, 2014	Autonomous Driving, Natural Language Queries	Unverified
VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending	May 22, 2023	Question Answering, Retrieval	Unverified
VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding	May 20, 2021	Action Segmentation, Language Modeling	Unverified
VRAG: Region Attention Graphs for Content-Based Video Retrieval	May 18, 2022	Retrieval, Video Retrieval	Unverified
VRFP: On-the-fly Video Retrieval using Web Images and Fast Fisher Vector Products	Dec 10, 2015	Re-Ranking, Retrieval	Unverified
VScript: Controllable Script Generation with Visual Presentation	Mar 1, 2022	Dialogue Generation, Retrieval	Unverified
Watch Less and Uncover More: Could Navigation Tools Help Users Search and Explore Videos?	Jan 10, 2022	Information Retrieval, Retrieval	Unverified

Datasets: MSR-VTT-1kA, DiDeMo, MSR-VTT, LSMDC, ActivityNet, MSVD, YouCook2, FIVR-200K, VATEX, QuerYD, SSv2-label retrieval, SSv2-template retrieval

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	OmniVec	text-to-video R@10	89.4	—	Unverified
2	CLIP4Clip	text-to-video R@10	81.6	—	Unverified
3	OmniVec (pretrained)	text-to-video R@10	78.6	—	Unverified
4	HunYuan_tvr (huge)	text-to-video R@1	62.9	—	Unverified
5	CLIP-ViP	text-to-video R@1	57.7	—	Unverified
6	PIDRo	text-to-video R@1	55.9	—	Unverified
7	DMAE (ViT-B/16)	text-to-video R@1	55.5	—	Unverified
8	HunYuan_tvr	text-to-video R@1	55	—	Unverified
9	MuLTI	text-to-video R@1	54.7	—	Unverified
10	STAN	text-to-video R@1	54.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Aurora (ours, r=64)	text-to-video R@5	77.4	—	Unverified
2	InternVideo2-6B	text-to-video R@1	74.2	—	Unverified
3	vid-TLDR (UMT-L)	text-to-video R@1	72.3	—	Unverified
4	VAST	text-to-video R@1	72	—	Unverified
5	COSA	text-to-video R@1	70.5	—	Unverified
6	UMT-L (ViT-L/16)	text-to-video R@1	70.4	—	Unverified
7	GRAM	text-to-video R@1	67.3	—	Unverified
8	VALOR	text-to-video R@1	61.5	—	Unverified
9	TESTA (ViT-B/16)	text-to-video R@1	61.2	—	Unverified
10	VindLU	text-to-video R@1	61.2	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GRAM	text-to-video R@1	64	—	Unverified
2	VAST	text-to-video R@1	63.9	—	Unverified
3	InternVideo2-6B	text-to-video R@1	62.8	—	Unverified
4	VALOR	text-to-video R@1	59.9	—	Unverified
5	UMT-L (ViT-L/16)	text-to-video R@1	58.8	—	Unverified
6	vid-TLDR (UMT-L)	text-to-video R@1	58.1	—	Unverified
7	COSA	text-to-video R@1	57.9	—	Unverified
8	InternVideo2-6B	text-to-video R@1	55.9	—	Unverified
9	InternVideo	text-to-video R@1	55.2	—	Unverified
10	VLAB	text-to-video R@1	55.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	EMCL-Net (Ours)++ LSMDC Rohrbach et al. (2015)	text-to-video R@10	53.7	—	Unverified
2	InternVideo2-6B	text-to-video R@1	46.4	—	Unverified
3	vid-TLDR (UMT-L)	text-to-video R@1	43.1	—	Unverified
4	UMT-L (ViT-L/16)	text-to-video R@1	43	—	Unverified
5	HunYuan_tvr (huge)	text-to-video R@1	40.4	—	Unverified
6	COSA	text-to-video R@1	39.4	—	Unverified
7	mPLUG-2	text-to-video R@1	34.4	—	Unverified
8	VALOR	text-to-video R@1	34.2	—	Unverified
9	InternVideo	text-to-video R@1	34	—	Unverified
10	InternVideo2-6B	text-to-video R@1	33.8	—	Unverified