Video Retrieval

Given a text query and a pool of candidate videos, video retrieval aims to select the video that corresponds to the query. Systems typically return the candidates as a ranked list, which is scored with document retrieval metrics such as recall at rank K (R@K).
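
As a rough illustration of how a ranked list can be scored, the sketch below computes text-to-video recall at rank K from a query-by-video similarity matrix. The function names, the toy scores, and the assumption that video i is the single correct match for query i are all illustrative; real benchmarks define their own ground truth and tie-breaking rules.

```python
def rank_of_correct(scores, correct_index):
    """Return the 1-based rank of the correct video when candidates
    are sorted by descending score (ties resolved in the query's favor)."""
    target = scores[correct_index]
    # Every candidate that scores strictly higher pushes the target down one rank.
    return 1 + sum(1 for s in scores if s > target)


def recall_at_k(similarity, k):
    """Fraction of queries whose correct video appears in the top k.

    similarity[i][j] is the score between text query i and video j.
    Assumption for this sketch: video i is the ground truth for query i.
    """
    hits = sum(
        1 for i, row in enumerate(similarity) if rank_of_correct(row, i) <= k
    )
    return hits / len(similarity)


# Toy scores for 3 queries against 3 candidate videos (hypothetical values).
similarity = [
    [0.9, 0.1, 0.3],  # query 0: correct video ranked 1st
    [0.2, 0.4, 0.8],  # query 1: correct video ranked 2nd
    [0.5, 0.6, 0.7],  # query 2: correct video ranked 1st
]

print(recall_at_k(similarity, 1))  # 2 of 3 queries hit at rank 1
print(recall_at_k(similarity, 2))  # all 3 queries hit within rank 2
```

Leaderboards usually report this value as a percentage, so a fraction of 0.55 would appear as an R@1 of 55.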

Papers

Showing 426–450 of 486 papers

Title	Date	Tasks	Status
Audio-Visual Embedding for Cross-Modal MusicVideo Retrieval through Supervised Deep CCA	Aug 10, 2019	audio-visual learning, Retrieval	Unverified
Fine-Grained Action Retrieval Through Multiple Parts-of-Speech Embeddings	Aug 9, 2019	Cross-Modal Retrieval, POS	Unverified
Central Similarity Quantization for Efficient Image and Video Retrieval	Aug 1, 2019	Quantization, Retrieval	Code Available
SketchGAN: Joint Sketch Completion and Recognition With Generative Adversarial Network	Jun 1, 2019	Decoder, Generative Adversarial Network	Unverified
Spatio-temporal Video Re-localization by Warp LSTM	May 10, 2019	Retrieval, Video Retrieval	Unverified
Efficient video indexing for monitoring disease activity and progression in the upper gastrointestinal tract	May 10, 2019	Image Retrieval, Retrieval	Unverified
Interactive Video Retrieval with Dialog	May 7, 2019	Retrieval, Video Retrieval	Unverified
Unsupervised Data Uncertainty Learning in Visual Retrieval Systems	Feb 7, 2019	Retrieval, Triplet	Unverified
V3C - a Research Video Collection	Oct 11, 2018	Management, Retrieval	Unverified
Dual Encoding for Zero-Example Video Retrieval	Sep 17, 2018	Ad-hoc video search, Retrieval	Code Available
FIVR: Fine-grained Incident Video Retrieval	Sep 11, 2018	Benchmarking, Retrieval	Code Available
Find and Focus: Retrieve and Localize Video Events with Natural Language Queries	Sep 1, 2018	Diversity, Natural Language Queries	Unverified
Temporal Modular Networks for Retrieving Complex Compositional Activities in Videos	Sep 1, 2018	Retrieval, Video Retrieval	Unverified
Video Logo Retrieval based on local Features	Aug 11, 2018	Image Retrieval, Retrieval	Code Available
A Joint Sequence Fusion Model for Video Question Answering and Retrieval	Aug 7, 2018	Decoder, Multiple-choice	Code Available
Person Search in Videos with One Portrait Through Visual and Temporal Links	Jul 27, 2018	Person Re-Identification, Person Search	Code Available
Talking Face Generation by Adversarially Disentangled Audio-Visual Representation	Jul 20, 2018	Face Generation, Lip Reading	Code Available
Human Action Recognition and Prediction: A Survey	Jun 28, 2018	Action Recognition, Autonomous Driving	Unverified
Semantic Image Retrieval by Uniting Deep Neural Networks and Cognitive Architectures	Jun 14, 2018	Deep Learning, Image Retrieval	Unverified
Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval	Jun 11, 2018	Image-text Retrieval, Retrieval	Code Available
LAMV: Learning to Align and Match Videos With Kernelized Temporal Layers	Jun 1, 2018	Copy Detection, Retrieval	Code Available
ECO: Efficient Convolutional Network for Online Video Understanding	Apr 24, 2018	Action Classification, Action Recognition	Code Available
Learning and Recognizing Human Action from Skeleton Movement with Deep Residual Neural Networks	Mar 21, 2018	Action Recognition, Deep Learning	Unverified
Hashing with Mutual Information	Mar 2, 2018	Image Retrieval, Retrieval	Code Available
Self-Supervised Video Hashing with Hierarchical Binary Auto-encoder	Feb 7, 2018	Binarization, Decoder	Unverified

Datasets: MSR-VTT-1kA, DiDeMo, MSR-VTT, LSMDC, ActivityNet, MSVD, YouCook2, FIVR-200K, VATEX, QuerYD, SSv2-label retrieval, SSv2-template retrieval

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	OmniVec	text-to-video R@10	89.4	—	Unverified
2	CLIP4Clip	text-to-video R@10	81.6	—	Unverified
3	OmniVec (pretrained)	text-to-video R@10	78.6	—	Unverified
4	HunYuan_tvr (huge)	text-to-video R@1	62.9	—	Unverified
5	CLIP-ViP	text-to-video R@1	57.7	—	Unverified
6	PIDRo	text-to-video R@1	55.9	—	Unverified
7	DMAE (ViT-B/16)	text-to-video R@1	55.5	—	Unverified
8	HunYuan_tvr	text-to-video R@1	55	—	Unverified
9	MuLTI	text-to-video R@1	54.7	—	Unverified
10	STAN	text-to-video R@1	54.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Aurora (r=64)	text-to-video R@5	77.4	—	Unverified
2	InternVideo2-6B	text-to-video R@1	74.2	—	Unverified
3	vid-TLDR (UMT-L)	text-to-video R@1	72.3	—	Unverified
4	VAST	text-to-video R@1	72	—	Unverified
5	COSA	text-to-video R@1	70.5	—	Unverified
6	UMT-L (ViT-L/16)	text-to-video R@1	70.4	—	Unverified
7	GRAM	text-to-video R@1	67.3	—	Unverified
8	VALOR	text-to-video R@1	61.5	—	Unverified
9	TESTA (ViT-B/16)	text-to-video R@1	61.2	—	Unverified
10	VindLU	text-to-video R@1	61.2	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GRAM	text-to-video R@1	64	—	Unverified
2	VAST	text-to-video R@1	63.9	—	Unverified
3	InternVideo2-6B	text-to-video R@1	62.8	—	Unverified
4	VALOR	text-to-video R@1	59.9	—	Unverified
5	UMT-L (ViT-L/16)	text-to-video R@1	58.8	—	Unverified
6	vid-TLDR (UMT-L)	text-to-video R@1	58.1	—	Unverified
7	COSA	text-to-video R@1	57.9	—	Unverified
8	InternVideo2-6B	text-to-video R@1	55.9	—	Unverified
9	InternVideo	text-to-video R@1	55.2	—	Unverified
10	VLAB	text-to-video R@1	55.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	EMCL-Net++	text-to-video R@10	53.7	—	Unverified
2	InternVideo2-6B	text-to-video R@1	46.4	—	Unverified
3	vid-TLDR (UMT-L)	text-to-video R@1	43.1	—	Unverified
4	UMT-L (ViT-L/16)	text-to-video R@1	43	—	Unverified
5	HunYuan_tvr (huge)	text-to-video R@1	40.4	—	Unverified
6	COSA	text-to-video R@1	39.4	—	Unverified
7	mPLUG-2	text-to-video R@1	34.4	—	Unverified
8	VALOR	text-to-video R@1	34.2	—	Unverified
9	InternVideo	text-to-video R@1	34	—	Unverified
10	InternVideo2-6B	text-to-video R@1	33.8	—	Unverified