Video Retrieval

Given a text query and a pool of candidate videos, the goal of video retrieval is to select the video that corresponds to the query. Typically, the candidates are returned as a ranked list and scored with document retrieval metrics such as Recall@K (R@K).
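As a rough illustration of how a ranked list can be scored, the sketch below computes Recall@K from a text-to-video similarity matrix. It assumes that each query has exactly one correct video and that query i is paired with video i; the function names and the toy scores are hypothetical and do not come from any particular benchmark or paper.

```python
import numpy as np


def rank_of_correct_video(similarity: np.ndarray) -> np.ndarray:
    """Return the 1-based rank of the correct video for each text query.

    similarity[i, j] is the score of video j for query i. Assumption:
    the correct video for query i is video i (the diagonal).
    """
    # Score of the ground-truth video for each query, as a column vector.
    correct = np.diag(similarity)[:, None]
    # Rank = 1 + number of videos scored strictly higher than the correct one.
    return 1 + (similarity > correct).sum(axis=1)


def recall_at_k(ranks: np.ndarray, k: int) -> float:
    """Percentage of queries whose correct video appears in the top k."""
    return float(np.mean(ranks <= k) * 100)


# Toy example: 3 text queries scored against 3 candidate videos.
similarity = np.array([
    [0.9, 0.1, 0.3],   # correct video ranked 1st
    [0.2, 0.4, 0.8],   # correct video ranked 2nd
    [0.7, 0.6, 0.5],   # correct video ranked 3rd
])

ranks = rank_of_correct_video(similarity)
for k in (1, 2, 3):
    print(f"R@{k}: {recall_at_k(ranks, k):.1f}")
```

Under these assumptions, R@1 counts only queries whose correct video is ranked first, while R@5 and R@10 are progressively more lenient, so values reported under different K are not directly comparable.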

Papers

Showing 351–375 of 486 papers

Title	Date	Tasks	Status	Hype
Object Priors for Classifying and Localizing Unseen Actions	Apr 10, 2021	Action Classification, Action Localization	Code Available	0
Self-supervised Video Representation Learning by Context and Motion Decoupling	Apr 2, 2021	Action Recognition, CPU	Code Available	0
CUPID: Adaptive Curation of Pre-training Data for Video-and-Language Representation Learning	Apr 1, 2021	Question Answering, Representation Learning	Unverified	0
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval	Apr 1, 2021	Retrieval, Text Retrieval	Code Available	1
Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning	Mar 30, 2021	counterfactual, Object	Unverified	0
MDMMT: Multidomain Multimodal Transformer for Video Retrieval	Mar 19, 2021	Retrieval, Text to Video Retrieval	Code Available	1
On Semantic Similarity in Video Retrieval	Mar 18, 2021	Retrieval, Semantic Similarity	Code Available	1
Rudder: A Cross Lingual Video and Text Retrieval Dataset	Mar 9, 2021	Natural Language Queries, Retrieval	Code Available	0
A Straightforward Framework For Video Retrieval Using CLIP	Feb 24, 2021	Retrieval, Video Retrieval	Code Available	1
SeqNet: Learning Descriptors for Sequence-based Hierarchical Place Recognition	Feb 23, 2021	Autonomous Driving, Image Retrieval	Code Available	1
Clarification of Video Retrieval Query Results by the Automated Insertion of Supporting Shots	Feb 19, 2021	Retrieval, Video Editing	Unverified	0
Win-Fail Action Recognition	Feb 15, 2021	Action Recognition, Action Understanding	Code Available	0
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling	Feb 11, 2021	Question Answering, Retrieval	Code Available	1
Temporal Contrastive Graph Learning for Video Action Recognition and Retrieval	Jan 4, 2021	Action Recognition, Contrastive Learning	Unverified	0
Self-supervised Temporal Learning	Jan 1, 2021	Contrastive Learning, Retrieval	Unverified	0
Grounding Physical Object and Event Concepts Through Dynamic Visual Reasoning	Jan 1, 2021	counterfactual, Object	Unverified	0
SEA: Sentence Encoder Assembly for Video Retrieval by Textual Queries	Nov 24, 2020	Ad-hoc video search, Management	Code Available	0
A Hierarchical Multi-Modal Encoder for Moment Localization in Video Corpus	Nov 18, 2020	Language Modeling, Language Modelling	Unverified	0
Graph Based Temporal Aggregation for Video Retrieval	Nov 4, 2020	Retrieval, Video Retrieval	Code Available	0
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning	Nov 1, 2020	Cross-Modal Retrieval, Representation Learning	Code Available	1
Pretext-Contrastive Learning: Toward Good Practices in Self-supervised Video Representation Leaning	Oct 29, 2020	Contrastive Learning, Data Augmentation	Code Available	1
RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning	Oct 27, 2020	Action Recognition, Representation Learning	Code Available	1
Self-supervised Co-training for Video Representation Learning	Oct 19, 2020	Action Recognition, Contrastive Learning	Code Available	1
Audio-based Near-Duplicate Video Retrieval with Audio Similarity Learning	Oct 17, 2020	Retrieval, Transfer Learning	Code Available	1
Support-set bottlenecks for video-text representation learning	Oct 6, 2020	Contrastive Learning, Representation Learning	Unverified	0

Datasets: MSR-VTT-1kA, DiDeMo, MSR-VTT, LSMDC, ActivityNet, MSVD, YouCook2, FIVR-200K, VATEX, QuerYD, SSv2-label retrieval, SSv2-template retrieval

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	OmniVec	text-to-video R@10	89.4	—	Unverified
2	CLIP4Clip	text-to-video R@10	81.6	—	Unverified
3	OmniVec (pretrained)	text-to-video R@10	78.6	—	Unverified
4	HunYuan_tvr (huge)	text-to-video R@1	62.9	—	Unverified
5	CLIP-ViP	text-to-video R@1	57.7	—	Unverified
6	PIDRo	text-to-video R@1	55.9	—	Unverified
7	DMAE (ViT-B/16)	text-to-video R@1	55.5	—	Unverified
8	HunYuan_tvr	text-to-video R@1	55	—	Unverified
9	MuLTI	text-to-video R@1	54.7	—	Unverified
10	EERCF	text-to-video R@1	54.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Aurora (ours, r=64)	text-to-video R@5	77.4	—	Unverified
2	InternVideo2-6B	text-to-video R@1	74.2	—	Unverified
3	vid-TLDR (UMT-L)	text-to-video R@1	72.3	—	Unverified
4	VAST	text-to-video R@1	72	—	Unverified
5	COSA	text-to-video R@1	70.5	—	Unverified
6	UMT-L (ViT-L/16)	text-to-video R@1	70.4	—	Unverified
7	GRAM	text-to-video R@1	67.3	—	Unverified
8	VALOR	text-to-video R@1	61.5	—	Unverified
9	TESTA (ViT-B/16)	text-to-video R@1	61.2	—	Unverified
10	VindLU	text-to-video R@1	61.2	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GRAM	text-to-video R@1	64	—	Unverified
2	VAST	text-to-video R@1	63.9	—	Unverified
3	InternVideo2-6B	text-to-video R@1	62.8	—	Unverified
4	VALOR	text-to-video R@1	59.9	—	Unverified
5	UMT-L (ViT-L/16)	text-to-video R@1	58.8	—	Unverified
6	vid-TLDR (UMT-L)	text-to-video R@1	58.1	—	Unverified
7	COSA	text-to-video R@1	57.9	—	Unverified
8	InternVideo2-6B	text-to-video R@1	55.9	—	Unverified
9	InternVideo	text-to-video R@1	55.2	—	Unverified
10	VLAB	text-to-video R@1	55.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	EMCL-Net (Ours)++ LSMDC Rohrbach et al. (2015)	text-to-video R@10	53.7	—	Unverified
2	InternVideo2-6B	text-to-video R@1	46.4	—	Unverified
3	vid-TLDR (UMT-L)	text-to-video R@1	43.1	—	Unverified
4	UMT-L (ViT-L/16)	text-to-video R@1	43	—	Unverified
5	HunYuan_tvr (huge)	text-to-video R@1	40.4	—	Unverified
6	COSA	text-to-video R@1	39.4	—	Unverified
7	mPLUG-2	text-to-video R@1	34.4	—	Unverified
8	VALOR	text-to-video R@1	34.2	—	Unverified
9	InternVideo	text-to-video R@1	34	—	Unverified
10	InternVideo2-6B	text-to-video R@1	33.8	—	Unverified