Video Retrieval

The objective of video retrieval is as follows: given a text query and a pool of candidate videos, select the video which corresponds to the text query. Typically, the videos are returned as a ranked list of candidates and scored via document retrieval metrics.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 251–275 of 486 papers

Title	Date	Tasks	Status	Score
Dual Encoding for Zero-Example Video Retrieval	Sep 17, 2018	Ad-hoc video searchRetrieval	CodeCode Available	5
Improving Video Corpus Moment Retrieval with Partial Relevance Enhancement	Feb 21, 2024	Moment RetrievalRetrieval	CodeCode Available	5
RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter	May 29, 2024	Natural Language Queriesparameter-efficient fine-tuning	—Unverified	0
ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling	Jun 25, 2024	Cross-Modal RetrievalNatural Language Queries	—Unverified	0
Action in Mind: A Neural Network Approach to Action Recognition and Segmentation	Apr 30, 2021	Action RecognitionAction Segmentation	—Unverified	0
Advances in Human Action Recognition: A Survey	Jan 23, 2015	Action RecognitionRetrieval	—Unverified	0
A Faster Method for Tracking and Scoring Videos Corresponding to Sentences	Nov 14, 2014	RetrievalSentence	—Unverified	0
A Hierarchical Multi-Modal Encoder for Moment Localization in Video Corpus	Nov 18, 2020	Language ModelingLanguage Modelling	—Unverified	0
Analysis of Gait Pattern to Recognize the Human Activities	Jul 18, 2014	Activity RecognitionHuman Activity Recognition	—Unverified	0
Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks	Oct 7, 2023	Action RecognitionMultiple-choice	—Unverified	0
An Empirical Study of Frame Selection for Text-to-Video Retrieval	Nov 1, 2023	RetrievalText to Video Retrieval	—Unverified	0
An Improved Video Analysis using Context based Extension of LSH	May 10, 2017	Action RecognitionRetrieval	—Unverified	0
An Overview of Challenges in Egocentric Text-Video Retrieval	Jun 7, 2023	RetrievalVideo Retrieval	—Unverified	0
A Proposal-based Approach for Activity Image-to-Video Retrieval	Nov 24, 2019	Cross-Modal RetrievalRetrieval	—Unverified	0
A Review of Deep Learning for Video Captioning	Apr 22, 2023	Deep LearningDense Video Captioning	—Unverified	0
ASCNet: Self-supervised Video Representation Learning with Appearance-Speed Consistency	Jun 4, 2021	Action RecognitionRepresentation Learning	—Unverified	0
A Survey of Video-based Action Quality Assessment	Apr 20, 2022	Action Quality AssessmentAction Recognition	—Unverified	0
Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment	Jul 24, 2023	RetrievalText to Video Retrieval	—Unverified	0
Audio-Visual Embedding for Cross-Modal MusicVideo Retrieval through Supervised Deep CCA	Aug 10, 2019	audio-visual learningRetrieval	—Unverified	0
A Unified Model for Video Understanding and Knowledge Embedding with Heterogeneous Knowledge Graph Dataset	Nov 19, 2022	Common Sense ReasoningGraph Embedding	—Unverified	0
A Video is Worth 10,000 Words: Training and Benchmarking with Diverse Captions for Better Long Video Retrieval	Nov 30, 2023	BenchmarkingRetrieval	—Unverified	0
Bag of Genres for Video Retrieval	May 30, 2015	RetrievalVideo Retrieval	—Unverified	0
Binary Subspace Coding for Query-by-Image Video Retrieval	Dec 6, 2016	RetrievalVideo Retrieval	—Unverified	0
Boosting Video Captioning with Dynamic Loss Network	Jul 25, 2021	image-classificationImage Classification	—Unverified	0
CHAIN: Exploring Global-Local Spatio-Temporal Information for Improved Self-Supervised Video Hashing	Oct 29, 2023	Contrastive LearningRetrieval	—Unverified	0

Show:10 25 50

← PrevPage 11 of 20Next →

All datasets MSR-VTT-1kA DiDeMo MSR-VTT LSMDC ActivityNet MSVD YouCook2 FIVR-200K VATEX QuerYD SSv2-label retrieval SSv2-template retrieval

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	OmniVec	text-to-video R@10	89.4	—	Unverified
2	CLIP4Clip	text-to-video R@10	81.6	—	Unverified
3	OmniVec (pretrained)	text-to-video R@10	78.6	—	Unverified
4	HunYuan_tvr (huge)	text-to-video R@1	62.9	—	Unverified
5	CLIP-ViP	text-to-video R@1	57.7	—	Unverified
6	PIDRo	text-to-video R@1	55.9	—	Unverified
7	DMAE (ViT-B/16)	text-to-video R@1	55.5	—	Unverified
8	HunYuan_tvr	text-to-video R@1	55	—	Unverified
9	MuLTI	text-to-video R@1	54.7	—	Unverified
10	EERCF	text-to-video R@1	54.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Aurora (ours, r=64)	text-to-video R@5	77.4	—	Unverified
2	InternVideo2-6B	text-to-video R@1	74.2	—	Unverified
3	vid-TLDR (UMT-L)	text-to-video R@1	72.3	—	Unverified
4	VAST	text-to-video R@1	72	—	Unverified
5	COSA	text-to-video R@1	70.5	—	Unverified
6	UMT-L (ViT-L/16)	text-to-video R@1	70.4	—	Unverified
7	GRAM	text-to-video R@1	67.3	—	Unverified
8	VALOR	text-to-video R@1	61.5	—	Unverified
9	TESTA (ViT-B/16)	text-to-video R@1	61.2	—	Unverified
10	VindLU	text-to-video R@1	61.2	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GRAM	text-to-video R@1	64	—	Unverified
2	VAST	text-to-video R@1	63.9	—	Unverified
3	InternVideo2-6B	text-to-video R@1	62.8	—	Unverified
4	VALOR	text-to-video R@1	59.9	—	Unverified
5	UMT-L (ViT-L/16)	text-to-video R@1	58.8	—	Unverified
6	vid-TLDR (UMT-L)	text-to-video R@1	58.1	—	Unverified
7	COSA	text-to-video R@1	57.9	—	Unverified
8	InternVideo2-6B	text-to-video R@1	55.9	—	Unverified
9	InternVideo	text-to-video R@1	55.2	—	Unverified
10	VLAB	text-to-video R@1	55.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	EMCL-Net (Ours)++ LSMDC Rohrbach et al. (2015)	text-to-video R@10	53.7	—	Unverified
2	InternVideo2-6B	text-to-video R@1	46.4	—	Unverified
3	vid-TLDR (UMT-L)	text-to-video R@1	43.1	—	Unverified
4	UMT-L (ViT-L/16)	text-to-video R@1	43	—	Unverified
5	HunYuan_tvr (huge)	text-to-video R@1	40.4	—	Unverified
6	COSA	text-to-video R@1	39.4	—	Unverified
7	mPLUG-2	text-to-video R@1	34.4	—	Unverified
8	VALOR	text-to-video R@1	34.2	—	Unverified
9	InternVideo	text-to-video R@1	34	—	Unverified
10	InternVideo2-6B	text-to-video R@1	33.8	—	Unverified