Video Retrieval

The objective of video retrieval is as follows: given a text query and a pool of candidate videos, select the video that corresponds to the query. Systems typically return the candidates as a ranked list, which is scored with document retrieval metrics such as recall at rank K (R@K).
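To make the scoring step concrete, here is a minimal sketch of recall at rank K, the R@1, R@5, and R@10 figures reported in the benchmark tables. It assumes a hypothetical setup in which `similarity[i][j]` is the score of text query `i` against video `j` and video `i` is the single correct match for query `i`. The function name `recall_at_k` and the toy scores are illustrative and are not taken from any listed paper.

```python
from typing import Sequence


def recall_at_k(similarity: Sequence[Sequence[float]], k: int) -> float:
    """Fraction of queries whose correct video appears in the top k.

    Assumption: query i matches video i, so the correct score for
    query i sits on the diagonal of the similarity matrix.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Zero-based rank: how many videos score strictly higher
        # than the correct one. Ties count in the query's favor.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return hits / len(similarity)


# Toy example: 3 text queries scored against 3 candidate videos.
sim = [
    [0.9, 0.1, 0.2],  # query 0: correct video ranked 1st
    [0.8, 0.5, 0.3],  # query 1: correct video ranked 2nd
    [0.7, 0.6, 0.4],  # query 2: correct video ranked 3rd
]
r1 = recall_at_k(sim, 1)
r2 = recall_at_k(sim, 2)
r3 = recall_at_k(sim, 3)
```

Leaderboards usually report this value as a percentage, so a recall of 0.629 would appear as 62.9. Because R@K can only rise as K grows, scores reported at different K for the same model are not directly comparable.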

Papers


Showing 201–225 of 486 papers

Title	Date	Tasks	Status
A Proposal-based Approach for Activity Image-to-Video Retrieval	Nov 24, 2019	Cross-Modal Retrieval, Retrieval	Unverified
Multi-Granularity and Multi-modal Feature Interaction Approach for Text Video Retrieval	Jun 21, 2024	Retrieval, Sentence	Unverified
EA-VTR: Event-Aware Video-Text Retrieval	Jul 10, 2024	Action Recognition, Contrastive Learning	Unverified
Learning Segment Similarity and Alignment in Large-Scale Content Based Video Retrieval	Sep 20, 2023	Retrieval, Video Retrieval	Unverified
Dual-Stream Knowledge-Preserving Hashing for Unsupervised Video Retrieval	Oct 12, 2023	Retrieval, Semantic Retrieval	Unverified
Multi-Granularity Graph Pooling for Video-based Person Re-Identification	Sep 23, 2022	Node Clustering, Person Re-Identification	Unverified
Learning Language-Visual Embedding for Movie Understanding with Natural-Language	Sep 26, 2016	Multiple-choice, Retrieval	Unverified
Learning Joint Representations of Videos and Sentences with Web Image Search	Aug 8, 2016	Image Retrieval, Natural Language Queries	Unverified
CLIP2TV: Align, Match and Distill for Video-Text Retrieval	Nov 10, 2021	Representation Learning, Retrieval	Unverified
Domain Adaptation in Multi-View Embedding for Cross-Modal Video Retrieval	Oct 25, 2021	Domain Adaptation, Retrieval	Unverified
Learning Audio-Video Modalities from Image Captions	Apr 1, 2022	Image Captioning, Retrieval	Unverified
Classroom Video Assessment and Retrieval via Multiple Instance Learning	Mar 25, 2014	Multiple Instance Learning, Retrieval	Unverified
Learning and Recognizing Human Action from Skeleton Movement with Deep Residual Neural Networks	Mar 21, 2018	Action Recognition, Deep Learning	Unverified
Learning Locally-Adaptive Decision Functions for Person Verification	Jun 1, 2013	Face Verification, Metric Learning	Unverified
A Hierarchical Multi-Modal Encoder for Moment Localization in Video Corpus	Nov 18, 2020	Language Modeling, Language Modelling	Unverified
MuLTI: Efficient Video-and-Language Understanding with Text-Guided MultiWay-Sampler and Multiple Choice Modeling	Mar 10, 2023	Multi-Label Classification, MUlTI-LABEL-ClASSIFICATION	Unverified
Learning text-to-video retrieval from image captioning	Apr 26, 2024	Image Captioning, Image Retrieval	Unverified
Learning to Generate Long-term Future Narrations Describing Activities of Daily Living	Mar 3, 2025	Action Anticipation, Decision Making	Unverified
LaT: Latent Translation with Cycle-Consistency for Video-Text Retrieval	Jul 11, 2022	Representation Learning, Retrieval	Unverified
LASER: A Neuro-Symbolic Framework for Learning Spatial-Temporal Scene Graphs with Weak Supervision	Apr 15, 2023	Language Modeling, Language Modelling	Unverified
Vision-Language Models Learn Super Images for Efficient Partially Relevant Video Retrieval	Dec 1, 2023	Image Retrieval, Partially Relevant Video Retrieval	Unverified
Distilling Vision-Language Models on Millions of Videos	Jan 11, 2024	Language Modeling, Language Modelling	Unverified
Clarification of Video Retrieval Query Results by the Automated Insertion of Supporting Shots	Feb 19, 2021	Retrieval, Video Editing	Unverified
Large Scale Video Representation Learning via Relational Graph Clustering	Jun 1, 2020	Clustering, Graph Clustering	Unverified
Large-Scale Query-by-Image Video Retrieval Using Bloom Filters	Jul 12, 2016	Retrieval, Video Retrieval	Unverified


Datasets: MSR-VTT-1kA, DiDeMo, MSR-VTT, LSMDC, ActivityNet, MSVD, YouCook2, FIVR-200K, VATEX, QuerYD, SSv2-label retrieval, SSv2-template retrieval

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	OmniVec	text-to-video R@10	89.4	—	Unverified
2	CLIP4Clip	text-to-video R@10	81.6	—	Unverified
3	OmniVec (pretrained)	text-to-video R@10	78.6	—	Unverified
4	HunYuan_tvr (huge)	text-to-video R@1	62.9	—	Unverified
5	CLIP-ViP	text-to-video R@1	57.7	—	Unverified
6	PIDRo	text-to-video R@1	55.9	—	Unverified
7	DMAE (ViT-B/16)	text-to-video R@1	55.5	—	Unverified
8	HunYuan_tvr	text-to-video R@1	55	—	Unverified
9	MuLTI	text-to-video R@1	54.7	—	Unverified
10	STAN	text-to-video R@1	54.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Aurora (r=64)	text-to-video R@5	77.4	—	Unverified
2	InternVideo2-6B	text-to-video R@1	74.2	—	Unverified
3	vid-TLDR (UMT-L)	text-to-video R@1	72.3	—	Unverified
4	VAST	text-to-video R@1	72	—	Unverified
5	COSA	text-to-video R@1	70.5	—	Unverified
6	UMT-L (ViT-L/16)	text-to-video R@1	70.4	—	Unverified
7	GRAM	text-to-video R@1	67.3	—	Unverified
8	VALOR	text-to-video R@1	61.5	—	Unverified
9	TESTA (ViT-B/16)	text-to-video R@1	61.2	—	Unverified
10	VindLU	text-to-video R@1	61.2	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GRAM	text-to-video R@1	64	—	Unverified
2	VAST	text-to-video R@1	63.9	—	Unverified
3	InternVideo2-6B	text-to-video R@1	62.8	—	Unverified
4	VALOR	text-to-video R@1	59.9	—	Unverified
5	UMT-L (ViT-L/16)	text-to-video R@1	58.8	—	Unverified
6	vid-TLDR (UMT-L)	text-to-video R@1	58.1	—	Unverified
7	COSA	text-to-video R@1	57.9	—	Unverified
8	InternVideo2-6B	text-to-video R@1	55.9	—	Unverified
9	InternVideo	text-to-video R@1	55.2	—	Unverified
10	VLAB	text-to-video R@1	55.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	EMCL-Net++ (LSMDC, Rohrbach et al. 2015)	text-to-video R@10	53.7	—	Unverified
2	InternVideo2-6B	text-to-video R@1	46.4	—	Unverified
3	vid-TLDR (UMT-L)	text-to-video R@1	43.1	—	Unverified
4	UMT-L (ViT-L/16)	text-to-video R@1	43	—	Unverified
5	HunYuan_tvr (huge)	text-to-video R@1	40.4	—	Unverified
6	COSA	text-to-video R@1	39.4	—	Unverified
7	mPLUG-2	text-to-video R@1	34.4	—	Unverified
8	VALOR	text-to-video R@1	34.2	—	Unverified
9	InternVideo	text-to-video R@1	34	—	Unverified
10	InternVideo2-6B	text-to-video R@1	33.8	—	Unverified