Video Retrieval

Video retrieval is the task of selecting, from a pool of candidate videos, the video that corresponds to a given text query. Systems typically return the candidates as a ranked list, which is scored with document retrieval metrics such as Recall@K (R@K).
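
As a rough illustration of how a ranked list can be scored, the sketch below computes Recall@K, the kind of metric reported as R@1, R@5, and R@10 in the benchmark tables on this page. It assumes exactly one correct video per query and reports the result as a percentage. The function names, the similarity scores, and the ground-truth indices are all hypothetical and are not taken from any listed paper.

```python
def rank_videos(scores):
    """Return candidate indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def recall_at_k(score_matrix, ground_truth, k):
    """Percentage of queries whose correct video appears in the top-k results.

    score_matrix[q][v] is a (hypothetical) similarity of query q to video v;
    ground_truth[q] is the index of the single correct video for query q.
    """
    hits = 0
    for scores, target in zip(score_matrix, ground_truth):
        if target in rank_videos(scores)[:k]:
            hits += 1
    return 100.0 * hits / len(ground_truth)


# Toy data (made up): 3 text queries scored against 4 candidate videos.
toy_scores = [
    [0.9, 0.1, 0.3, 0.2],  # query 0: correct video 0 is ranked 1st
    [0.4, 0.5, 0.8, 0.1],  # query 1: correct video 1 is ranked 2nd
    [0.7, 0.6, 0.2, 0.5],  # query 2: correct video 2 is ranked 4th
]
toy_truth = [0, 1, 2]

for k in (1, 2, 4):
    print(f"R@{k} = {recall_at_k(toy_scores, toy_truth, k):.1f}")
```

In practice the scores would come from a model's text-video similarity, and benchmarks may differ in details such as how multiple captions per video are handled.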

Papers

Title	Date	Tasks	Status
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners	Dec 9, 2022	Question Answering, Retrieval	Unverified
Vi-MIX for Self-Supervised Video Representation	Sep 29, 2021	Action Recognition, Representation Learning	Unverified
ViSeRet: A simple yet effective approach to moment retrieval via fine-grained video segmentation	Oct 11, 2021	Moment Retrieval, Retrieval	Unverified
Visual Information Retrieval in Endoscopic Video Archives	Apr 29, 2015	Information Retrieval, Retrieval	Unverified
Visual Semantic Search: Retrieving Videos via Complex Textual Queries	Jun 1, 2014	Autonomous Driving, Natural Language Queries	Unverified
VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending	May 22, 2023	Question Answering, Retrieval	Unverified
VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding	May 20, 2021	Action Segmentation, Language Modeling	Unverified
VRAG: Region Attention Graphs for Content-Based Video Retrieval	May 18, 2022	Retrieval, Video Retrieval	Unverified
VRFP: On-the-fly Video Retrieval using Web Images and Fast Fisher Vector Products	Dec 10, 2015	Re-Ranking, Retrieval	Unverified
VScript: Controllable Script Generation with Visual Presentation	Mar 1, 2022	Dialogue Generation, Retrieval	Unverified
Watch Less and Uncover More: Could Navigation Tools Help Users Search and Explore Videos?	Jan 10, 2022	Information Retrieval, Retrieval	Unverified
AMIL: Adversarial Multi Instance Learning for Human Pose Estimation	Mar 18, 2020	Multiple Instance Learning, Pose Estimation	Code Available
Self-supervised Video Representation Learning by Context and Motion Decoupling	Apr 2, 2021	Action Recognition, CPU	Code Available
LAMV: Learning to Align and Match Videos With Kernelized Temporal Layers	Jun 1, 2018	Copy Detection, Retrieval	Code Available
Joint Searching and Grounding: Multi-Granularity Video Content Retrieval	Oct 23, 2023	Contrastive Learning, Retrieval	Code Available
Self-supervised Video Representation Learning with Cascade Positive Retrieval	Jan 20, 2022	Action Recognition, Contrastive Learning	Code Available
Dialogue-to-Video Retrieval	Mar 23, 2023	Recommendation Systems, Retrieval	Code Available
Self-Supervised Visual Learning by Variable Playback Speeds Prediction of a Video	Mar 5, 2020	Action Recognition, Representation Learning	Code Available
Is Multimodal Vision Supervision Beneficial to Language?	Feb 10, 2023	Image Retrieval, Natural Language Understanding	Code Available
Semantic Role Aware Correlation Transformer for Text to Video Retrieval	Jun 26, 2022	Retrieval, Text to Video Retrieval	Code Available
A Challenge to Build Neuro-Symbolic Video Agents	May 20, 2025	Scene Classification, Video Retrieval	Code Available
Deep Hashing with Category Mask for Fast Video Retrieval	Dec 22, 2017	Code Generation, Deep Hashing	Code Available
Improving Video Corpus Moment Retrieval with Partial Relevance Enhancement	Feb 21, 2024	Moment Retrieval, Retrieval	Code Available
SEDS: Semantically Enhanced Dual-Stream Encoder for Sign Language Retrieval	Jul 23, 2024	Retrieval, Sign Language Retrieval	Code Available
ICSVR: Investigating Compositional and Syntactic Understanding in Video Retrieval Models	Jun 28, 2023	Retrieval, Video Retrieval	Code Available

Datasets: MSR-VTT-1kA, DiDeMo, MSR-VTT, LSMDC, ActivityNet, MSVD, YouCook2, FIVR-200K, VATEX, QuerYD, SSv2-label retrieval, SSv2-template retrieval

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	OmniVec	text-to-video R@10	89.4	—	Unverified
2	CLIP4Clip	text-to-video R@10	81.6	—	Unverified
3	OmniVec (pretrained)	text-to-video R@10	78.6	—	Unverified
4	HunYuan_tvr (huge)	text-to-video R@1	62.9	—	Unverified
5	CLIP-ViP	text-to-video R@1	57.7	—	Unverified
6	PIDRo	text-to-video R@1	55.9	—	Unverified
7	DMAE (ViT-B/16)	text-to-video R@1	55.5	—	Unverified
8	HunYuan_tvr	text-to-video R@1	55	—	Unverified
9	MuLTI	text-to-video R@1	54.7	—	Unverified
10	EERCF	text-to-video R@1	54.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Aurora (ours, r=64)	text-to-video R@5	77.4	—	Unverified
2	InternVideo2-6B	text-to-video R@1	74.2	—	Unverified
3	vid-TLDR (UMT-L)	text-to-video R@1	72.3	—	Unverified
4	VAST	text-to-video R@1	72	—	Unverified
5	COSA	text-to-video R@1	70.5	—	Unverified
6	UMT-L (ViT-L/16)	text-to-video R@1	70.4	—	Unverified
7	GRAM	text-to-video R@1	67.3	—	Unverified
8	VALOR	text-to-video R@1	61.5	—	Unverified
9	TESTA (ViT-B/16)	text-to-video R@1	61.2	—	Unverified
10	VindLU	text-to-video R@1	61.2	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GRAM	text-to-video R@1	64	—	Unverified
2	VAST	text-to-video R@1	63.9	—	Unverified
3	InternVideo2-6B	text-to-video R@1	62.8	—	Unverified
4	VALOR	text-to-video R@1	59.9	—	Unverified
5	UMT-L (ViT-L/16)	text-to-video R@1	58.8	—	Unverified
6	vid-TLDR (UMT-L)	text-to-video R@1	58.1	—	Unverified
7	COSA	text-to-video R@1	57.9	—	Unverified
8	InternVideo2-6B	text-to-video R@1	55.9	—	Unverified
9	InternVideo	text-to-video R@1	55.2	—	Unverified
10	VLAB	text-to-video R@1	55.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	EMCL-Net (Ours)++ LSMDC Rohrbach et al. (2015)	text-to-video R@10	53.7	—	Unverified
2	InternVideo2-6B	text-to-video R@1	46.4	—	Unverified
3	vid-TLDR (UMT-L)	text-to-video R@1	43.1	—	Unverified
4	UMT-L (ViT-L/16)	text-to-video R@1	43	—	Unverified
5	HunYuan_tvr (huge)	text-to-video R@1	40.4	—	Unverified
6	COSA	text-to-video R@1	39.4	—	Unverified
7	mPLUG-2	text-to-video R@1	34.4	—	Unverified
8	VALOR	text-to-video R@1	34.2	—	Unverified
9	InternVideo	text-to-video R@1	34	—	Unverified
10	InternVideo2-6B	text-to-video R@1	33.8	—	Unverified