Video Retrieval

In video retrieval, a system is given a text query and a pool of candidate videos and must select the video that corresponds to the query. Systems typically return the candidates as a ranked list, which is scored with document retrieval metrics such as recall at rank K (R@K), the metric reported in the benchmark tables below.
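As an illustration of how a ranked list can be scored, the sketch below computes text-to-video recall at K from a query-by-video similarity matrix. It is a minimal example under stated assumptions, not the evaluation code of any listed paper: the function names `rank_of_match` and `recall_at_k`, the one-correct-video-per-query setup, the optimistic tie handling, and the percentage scale are all choices made here for illustration.

```python
from typing import Dict, List, Sequence


def rank_of_match(scores: Sequence[float], target: int) -> int:
    """Return the 1-based rank of the target video for one query.

    Candidates are ordered by descending similarity score.
    Assumption: ties are resolved in the target's favour.
    """
    target_score = scores[target]
    # Every candidate scoring strictly higher pushes the target down one rank.
    return 1 + sum(1 for s in scores if s > target_score)


def recall_at_k(
    sim: List[List[float]],
    targets: List[int],
    ks: Sequence[int] = (1, 5, 10),
) -> Dict[int, float]:
    """Percentage of queries whose correct video appears in the top K.

    sim[i][j] is the similarity between query i and video j;
    targets[i] is the index of the single correct video for query i
    (hypothetical setup: exactly one correct video per query).
    """
    ranks = [rank_of_match(row, t) for row, t in zip(sim, targets)]
    return {k: 100.0 * sum(r <= k for r in ranks) / len(ranks) for k in ks}


# Toy example: 3 text queries scored against 4 candidate videos.
sim = [
    [0.9, 0.1, 0.2, 0.3],  # correct video 0 is ranked first
    [0.4, 0.3, 0.8, 0.1],  # correct video 1 is ranked third
    [0.2, 0.6, 0.5, 0.7],  # correct video 2 is ranked third
]
targets = [0, 1, 2]
result = recall_at_k(sim, targets, ks=(1, 3))
print(result)
```

In this toy case only the first query places its correct video at rank 1, so R@1 is one third of the queries, while all three correct videos fall within the top 3. Because R@K is non-decreasing in K, scores reported at different K values (for example R@1 and R@10) are not directly comparable.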

Papers

Showing 401–450 of 486 papers

Title	Date	Tasks	Status
Self-supervised Temporal Learning	Jan 1, 2021	Contrastive Learning, Retrieval	Unverified
SEA: Sentence Encoder Assembly for Video Retrieval by Textual Queries	Nov 24, 2020	Ad-hoc video search, Management	Code Available
A Hierarchical Multi-Modal Encoder for Moment Localization in Video Corpus	Nov 18, 2020	Language Modeling, Language Modelling	Unverified
Graph Based Temporal Aggregation for Video Retrieval	Nov 4, 2020	Retrieval, Video Retrieval	Code Available
Support-set bottlenecks for video-text representation learning	Oct 6, 2020	Contrastive Learning, Representation Learning	Unverified
Encode the Unseen: Predictive Video Hashing for Scalable Mid-Stream Retrieval	Sep 30, 2020	Retrieval, Video Retrieval	Unverified
TRECVID 2019: An Evaluation Campaign to Benchmark Video Activity Detection, Video Captioning and Matching, and Video Search & Retrieval	Sep 21, 2020	Action Detection, Activity Detection	Unverified
Discriminative Residual Analysis for Image Set Classification with Posture and Age Variations	Aug 23, 2020	General Classification, Retrieval	Code Available
Exploring Relations in Untrimmed Videos for Self-Supervised Learning	Aug 6, 2020	Action Recognition, Change Detection	Unverified
The VISIONE Video Search System: Exploiting Off-the-Shelf Text Search Engines for Large-Scale Video Retrieval	Aug 6, 2020	Retrieval, Text Retrieval	Unverified
Tree-Augmented Cross-Modal Encoding for Complex-Query Video Retrieval	Jul 6, 2020	Retrieval, Video Retrieval	Unverified
Exploiting Visual Semantic Reasoning for Video-Text Retrieval	Jun 16, 2020	Retrieval, Text Retrieval	Unverified
Large Scale Video Representation Learning via Relational Graph Clustering	Jun 1, 2020	Clustering, Graph Clustering	Unverified
Screencast Tutorial Video Understanding	Jun 1, 2020	object-detection, Object Detection	Code Available
Near-duplicate video detection featuring coupled temporal and perceptual visual structures and logical inference based matching	May 15, 2020	Retrieval, Video Editing	Unverified
Multiple Visual-Semantic Embedding for Video Retrieval from Query Sentence	Apr 16, 2020	Retrieval, Sentence	Unverified
AMIL: Adversarial Multi Instance Learning for Human Pose Estimation	Mar 18, 2020	Multiple Instance Learning, Pose Estimation	Code Available
Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning	Mar 6, 2020	Density Estimation, Noise Estimation	Code Available
Self-Supervised Visual Learning by Variable Playback Speeds Prediction of a Video	Mar 5, 2020	Action Recognition, Representation Learning	Code Available
Fine-Grained Instance-Level Sketch-Based Video Retrieval	Feb 21, 2020	Cross-Modal Retrieval, Image Retrieval	Unverified
A Proposal-based Approach for Activity Image-to-Video Retrieval	Nov 24, 2019	Cross-Modal Retrieval, Retrieval	Unverified
Deep Heterogeneous Hashing for Face Video Retrieval	Nov 4, 2019	Retrieval, Video Retrieval	Unverified
SVD: A Large-Scale Short Video Dataset for Near-Duplicate Video Retrieval	Oct 1, 2019	Diversity, Retrieval	Unverified
Neighborhood Preserving Hashing for Scalable Video Retrieval	Oct 1, 2019	Retrieval, Video Retrieval	Unverified
Query by Semantic Sketch	Sep 27, 2019	Retrieval, Video Retrieval	Unverified
Audio-Visual Embedding for Cross-Modal MusicVideo Retrieval through Supervised Deep CCA	Aug 10, 2019	audio-visual learning, Retrieval	Unverified
Fine-Grained Action Retrieval Through Multiple Parts-of-Speech Embeddings	Aug 9, 2019	Cross-Modal Retrieval, POS	Unverified
Central Similarity Quantization for Efficient Image and Video Retrieval	Aug 1, 2019	Quantization, Retrieval	Code Available
SketchGAN: Joint Sketch Completion and Recognition With Generative Adversarial Network	Jun 1, 2019	Decoder, Generative Adversarial Network	Unverified
Spatio-temporal Video Re-localization by Warp LSTM	May 10, 2019	Retrieval, Video Retrieval	Unverified
Efficient video indexing for monitoring disease activity and progression in the upper gastrointestinal tract	May 10, 2019	Image Retrieval, Retrieval	Unverified
Interactive Video Retrieval with Dialog	May 7, 2019	Retrieval, Video Retrieval	Unverified
Unsupervised Data Uncertainty Learning in Visual Retrieval Systems	Feb 7, 2019	Retrieval, Triplet	Unverified
V3C - a Research Video Collection	Oct 11, 2018	Management, Retrieval	Unverified
Dual Encoding for Zero-Example Video Retrieval	Sep 17, 2018	Ad-hoc video search, Retrieval	Code Available
FIVR: Fine-grained Incident Video Retrieval	Sep 11, 2018	Benchmarking, Retrieval	Code Available
Find and Focus: Retrieve and Localize Video Events with Natural Language Queries	Sep 1, 2018	Diversity, Natural Language Queries	Unverified
Temporal Modular Networks for Retrieving Complex Compositional Activities in Videos	Sep 1, 2018	Retrieval, Video Retrieval	Unverified
Video Logo Retrieval based on local Features	Aug 11, 2018	Image Retrieval, Retrieval	Code Available
A Joint Sequence Fusion Model for Video Question Answering and Retrieval	Aug 7, 2018	Decoder, Multiple-choice	Code Available
Person Search in Videos with One Portrait Through Visual and Temporal Links	Jul 27, 2018	Person Re-Identification, Person Search	Code Available
Talking Face Generation by Adversarially Disentangled Audio-Visual Representation	Jul 20, 2018	Face Generation, Lip Reading	Code Available
Human Action Recognition and Prediction: A Survey	Jun 28, 2018	Action Recognition, Autonomous Driving	Unverified
Semantic Image Retrieval by Uniting Deep Neural Networks and Cognitive Architectures	Jun 14, 2018	Deep Learning, Image Retrieval	Unverified
Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval	Jun 11, 2018	Image-text Retrieval, Retrieval	Code Available
LAMV: Learning to Align and Match Videos With Kernelized Temporal Layers	Jun 1, 2018	Copy Detection, Retrieval	Code Available
ECO: Efficient Convolutional Network for Online Video Understanding	Apr 24, 2018	Action Classification, Action Recognition	Code Available
Learning and Recognizing Human Action from Skeleton Movement with Deep Residual Neural Networks	Mar 21, 2018	Action Recognition, Deep Learning	Unverified
Hashing with Mutual Information	Mar 2, 2018	Image Retrieval, Retrieval	Code Available
Self-Supervised Video Hashing with Hierarchical Binary Auto-encoder	Feb 7, 2018	Binarization, Decoder	Unverified


Datasets: MSR-VTT-1kA, DiDeMo, MSR-VTT, LSMDC, ActivityNet, MSVD, YouCook2, FIVR-200K, VATEX, QuerYD, SSv2-label retrieval, SSv2-template retrieval

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	OmniVec	text-to-video R@10	89.4	—	Unverified
2	CLIP4Clip	text-to-video R@10	81.6	—	Unverified
3	OmniVec (pretrained)	text-to-video R@10	78.6	—	Unverified
4	HunYuan_tvr (huge)	text-to-video R@1	62.9	—	Unverified
5	CLIP-ViP	text-to-video R@1	57.7	—	Unverified
6	PIDRo	text-to-video R@1	55.9	—	Unverified
7	DMAE (ViT-B/16)	text-to-video R@1	55.5	—	Unverified
8	HunYuan_tvr	text-to-video R@1	55	—	Unverified
9	MuLTI	text-to-video R@1	54.7	—	Unverified
10	EERCF	text-to-video R@1	54.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Aurora (ours, r=64)	text-to-video R@5	77.4	—	Unverified
2	InternVideo2-6B	text-to-video R@1	74.2	—	Unverified
3	vid-TLDR (UMT-L)	text-to-video R@1	72.3	—	Unverified
4	VAST	text-to-video R@1	72	—	Unverified
5	COSA	text-to-video R@1	70.5	—	Unverified
6	UMT-L (ViT-L/16)	text-to-video R@1	70.4	—	Unverified
7	GRAM	text-to-video R@1	67.3	—	Unverified
8	VALOR	text-to-video R@1	61.5	—	Unverified
9	TESTA (ViT-B/16)	text-to-video R@1	61.2	—	Unverified
10	VindLU	text-to-video R@1	61.2	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GRAM	text-to-video R@1	64	—	Unverified
2	VAST	text-to-video R@1	63.9	—	Unverified
3	InternVideo2-6B	text-to-video R@1	62.8	—	Unverified
4	VALOR	text-to-video R@1	59.9	—	Unverified
5	UMT-L (ViT-L/16)	text-to-video R@1	58.8	—	Unverified
6	vid-TLDR (UMT-L)	text-to-video R@1	58.1	—	Unverified
7	COSA	text-to-video R@1	57.9	—	Unverified
8	InternVideo2-6B	text-to-video R@1	55.9	—	Unverified
9	InternVideo	text-to-video R@1	55.2	—	Unverified
10	VLAB	text-to-video R@1	55.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	EMCL-Net (Ours)++ LSMDC Rohrbach et al. (2015)	text-to-video R@10	53.7	—	Unverified
2	InternVideo2-6B	text-to-video R@1	46.4	—	Unverified
3	vid-TLDR (UMT-L)	text-to-video R@1	43.1	—	Unverified
4	UMT-L (ViT-L/16)	text-to-video R@1	43	—	Unverified
5	HunYuan_tvr (huge)	text-to-video R@1	40.4	—	Unverified
6	COSA	text-to-video R@1	39.4	—	Unverified
7	mPLUG-2	text-to-video R@1	34.4	—	Unverified
8	VALOR	text-to-video R@1	34.2	—	Unverified
9	InternVideo	text-to-video R@1	34	—	Unverified
10	InternVideo2-6B	text-to-video R@1	33.8	—	Unverified