Video Retrieval

Video retrieval is the task of selecting, from a pool of candidate videos, the video that corresponds to a given text query. Systems typically return the candidates as a ranked list, which is scored with document retrieval metrics such as Recall@K (R@K).
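
As an illustration of how a ranked list can be scored, the sketch below computes Recall@K (the R@1, R@5, and R@10 figures reported in the benchmark tables on this page) from a query-by-video similarity matrix. It is a minimal example and not the evaluation code of any listed paper: the function names, the toy similarity scores, and the assumption that each query has exactly one correct video are all hypothetical.

```python
def rank_of_correct(scores, correct_index):
    """Return the 1-based rank of the correct video for one query.

    `scores` holds one similarity score per candidate video (higher is better).
    Ties are resolved in favour of the correct video (an assumption of this sketch).
    """
    target = scores[correct_index]
    # Every video scored strictly higher pushes the correct one down by one place.
    return 1 + sum(1 for s in scores if s > target)


def recall_at_k(similarity, ground_truth, k):
    """Percentage of queries whose correct video appears in the top k results."""
    hits = sum(
        1
        for scores, correct in zip(similarity, ground_truth)
        if rank_of_correct(scores, correct) <= k
    )
    return 100.0 * hits / len(similarity)


# Toy data (hypothetical): 4 text queries scored against 4 candidate videos.
similarity = [
    [0.9, 0.1, 0.3, 0.2],  # query 0, correct video 0
    [0.4, 0.2, 0.8, 0.5],  # query 1, correct video 1
    [0.6, 0.7, 0.5, 0.1],  # query 2, correct video 2
    [0.2, 0.3, 0.9, 0.8],  # query 3, correct video 3
]
ground_truth = [0, 1, 2, 3]

for k in (1, 2, 3):
    print(f"R@{k} = {recall_at_k(similarity, ground_truth, k):.1f}")
```

Because R@K can only grow as K increases, scores reported at different K values (for example R@1 against R@10) are not directly comparable.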

Papers

Title	Date	Tasks	Status	Hype
MAGMaR Shared Task System Description: Video Retrieval with OmniEmbed	Jun 11, 2025	Retrieval, Video Retrieval	Unverified	0
Q2E: Query-to-Event Decomposition for Zero-Shot Multilingual Text-to-Video Retrieval	Jun 11, 2025	Retrieval, Text to Video Retrieval	Unverified	0
From Play to Replay: Composed Video Retrieval for Temporally Fine-Grained Videos	Jun 5, 2025	Action Classification, Composed Video Retrieval (CoVR)	Code Available	0
Leveraging Auxiliary Information in Text-to-Video Retrieval: A Review	May 29, 2025	Retrieval, Text to Video Retrieval	Unverified	0
Learning World Models for Interactive Video Generation	May 28, 2025	In-Context Learning, Retrieval	Unverified	0
A Challenge to Build Neuro-Symbolic Video Agents	May 20, 2025	Scene Classification, Video Retrieval	Code Available	0
LoVR: A Benchmark for Long Video Retrieval in Multimodal Contexts	May 20, 2025	Caption Generation, Retrieval	Code Available	1
Video-GPT via Next Clip Diffusion	May 18, 2025	Denoising, Image Animation	Code Available	1
Contrastive Alignment with Semantic Gap-Aware Corrections in Text-Video Retrieval	May 18, 2025	Contrastive Learning, Retrieval	Code Available	0
CMAWRNet: Multiple Adverse Weather Removal via a Unified Quaternion Neural Architecture	May 3, 2025	Autonomous Driving, Benchmarking	Unverified	0
Empowering Agentic Video Analytics Systems with Video Language Models	May 1, 2025	Knowledge Graphs, RAG	Unverified	0
ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video-Text Data Streams	Apr 21, 2025	Informativeness, Low-latency processing	Code Available	0
Prototypes are Balanced Units for Efficient and Effective Partially Relevant Video Retrieval	Apr 17, 2025	Partially Relevant Video Retrieval, Retrieval	Unverified	0
Towards Efficient Partially Relevant Video Retrieval with Active Moment Discovering	Apr 15, 2025	Partially Relevant Video Retrieval, Retrieval	Code Available	0
Towards Efficient and Robust Moment Retrieval System: A Unified Framework for Multi-Granularity Models and Temporal Reranking	Apr 11, 2025	Moment Retrieval, Question Answering	Unverified	0
TC-MGC: Text-Conditioned Multi-Grained Contrastive Learning for Text-Video Retrieval	Apr 7, 2025	Contrastive Learning, Retrieval	Code Available	0
Leveraging Modality Tags for Enhanced Cross-Modal Video Retrieval	Apr 2, 2025	cross-modal alignment, Retrieval	Unverified	0
Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval	Mar 24, 2025	Retrieval, Text to Video Retrieval	Unverified	0
Enhancing Subsequent Video Retrieval via Vision-Language Models (VLMs)	Mar 21, 2025	Representation Learning, Retrieval	Code Available	0
Long-VMNet: Accelerating Long-Form Video Understanding via Fixed Memory	Mar 17, 2025	Form, GPU	Unverified	0
StableFusion: Continual Video Retrieval via Frame Adaptation	Mar 13, 2025	Continual Learning, Mixture-of-Experts	Code Available	1
Quality Over Quantity? LLM-Based Curation for a Data-Efficient Audio-Video Foundation Model	Mar 12, 2025	AudioCaps, Contrastive Learning	Unverified	0
Narrating the Video: Boosting Text-Video Retrieval via Comprehensive Utilization of Frame-Level Captions	Mar 7, 2025	Retrieval, Video Retrieval	Unverified	0
LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning	Mar 4, 2025	Contrastive Learning, Image-text Retrieval	Unverified	0
Learning to Generate Long-term Future Narrations Describing Activities of Daily Living	Mar 3, 2025	Action Anticipation, Decision Making	Unverified	0

Datasets: MSR-VTT-1kA, DiDeMo, MSR-VTT, LSMDC, ActivityNet, MSVD, YouCook2, FIVR-200K, VATEX, QuerYD, SSv2-label retrieval, SSv2-template retrieval

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	OmniVec	text-to-video R@10	89.4	—	Unverified
2	CLIP4Clip	text-to-video R@10	81.6	—	Unverified
3	OmniVec (pretrained)	text-to-video R@10	78.6	—	Unverified
4	HunYuan_tvr (huge)	text-to-video R@1	62.9	—	Unverified
5	CLIP-ViP	text-to-video R@1	57.7	—	Unverified
6	PIDRo	text-to-video R@1	55.9	—	Unverified
7	DMAE (ViT-B/16)	text-to-video R@1	55.5	—	Unverified
8	HunYuan_tvr	text-to-video R@1	55	—	Unverified
9	MuLTI	text-to-video R@1	54.7	—	Unverified
10	EERCF	text-to-video R@1	54.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Aurora (ours, r=64)	text-to-video R@5	77.4	—	Unverified
2	InternVideo2-6B	text-to-video R@1	74.2	—	Unverified
3	vid-TLDR (UMT-L)	text-to-video R@1	72.3	—	Unverified
4	VAST	text-to-video R@1	72	—	Unverified
5	COSA	text-to-video R@1	70.5	—	Unverified
6	UMT-L (ViT-L/16)	text-to-video R@1	70.4	—	Unverified
7	GRAM	text-to-video R@1	67.3	—	Unverified
8	VALOR	text-to-video R@1	61.5	—	Unverified
9	TESTA (ViT-B/16)	text-to-video R@1	61.2	—	Unverified
10	VindLU	text-to-video R@1	61.2	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GRAM	text-to-video R@1	64	—	Unverified
2	VAST	text-to-video R@1	63.9	—	Unverified
3	InternVideo2-6B	text-to-video R@1	62.8	—	Unverified
4	VALOR	text-to-video R@1	59.9	—	Unverified
5	UMT-L (ViT-L/16)	text-to-video R@1	58.8	—	Unverified
6	vid-TLDR (UMT-L)	text-to-video R@1	58.1	—	Unverified
7	COSA	text-to-video R@1	57.9	—	Unverified
8	InternVideo2-6B	text-to-video R@1	55.9	—	Unverified
9	InternVideo	text-to-video R@1	55.2	—	Unverified
10	VLAB	text-to-video R@1	55.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	EMCL-Net (Ours)++ LSMDC Rohrbach et al. (2015)	text-to-video R@10	53.7	—	Unverified
2	InternVideo2-6B	text-to-video R@1	46.4	—	Unverified
3	vid-TLDR (UMT-L)	text-to-video R@1	43.1	—	Unverified
4	UMT-L (ViT-L/16)	text-to-video R@1	43	—	Unverified
5	HunYuan_tvr (huge)	text-to-video R@1	40.4	—	Unverified
6	COSA	text-to-video R@1	39.4	—	Unverified
7	mPLUG-2	text-to-video R@1	34.4	—	Unverified
8	VALOR	text-to-video R@1	34.2	—	Unverified
9	InternVideo	text-to-video R@1	34	—	Unverified
10	InternVideo2-6B	text-to-video R@1	33.8	—	Unverified