Video Retrieval

The objective of video retrieval is as follows: given a text query and a pool of candidate videos, select the video which corresponds to the text query. Typically, the videos are returned as a ranked list of candidates and scored via document retrieval metrics.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 301–325 of 486 papers

Title	Date	Tasks	Status	Hype
TCGL: Temporal Contrastive Graph for Self-supervised Video Representation Learning	Dec 7, 2021	Action RecognitionContrastive Learning	CodeCode Available	1
Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval	Dec 3, 2021	Ad-hoc video searchfeature selection	CodeCode Available	1
Generalizable Multi-linear Attention Network	Dec 1, 2021	Multimodal Sentiment AnalysisRetrieval	—Unverified	0
AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant	Nov 30, 2021	Question AnsweringRetrieval	CodeCode Available	1
Video Content Classification using Deep Learning	Nov 27, 2021	ClassificationDeep Learning	CodeCode Available	1
VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling	Nov 24, 2021	Question AnsweringRetrieval	CodeCode Available	1
Florence: A New Foundation Model for Computer Vision	Nov 22, 2021	Action ClassificationAction Recognition	CodeCode Available	1
Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions	Nov 19, 2021	RetrievalSuper-Resolution	CodeCode Available	1
Induce, Edit, Retrieve:Language Grounded Multimodal Schema for Instructional Video Retrieval	Nov 17, 2021	RetrievalVideo Retrieval	—Unverified	0
CREATE: A Benchmark for Chinese Short Video Retrieval and Title Generation	Nov 16, 2021	RetrievalVideo Captioning	—Unverified	0
CLIP2TV: Align, Match and Distill for Video-Text Retrieval	Nov 10, 2021	Representation LearningRetrieval	—Unverified	0
SwAMP: Swapped Assignment of Multi-Modal Pairs for Cross-Modal Retrieval	Nov 10, 2021	Contrastive LearningCross-Modal Retrieval	—Unverified	0
Masking Modalities for Cross-modal Video Retrieval	Nov 1, 2021	RetrievalVideo Retrieval	—Unverified	0
BiC-Net: Learning Efficient Spatio-Temporal Relation for Text-Video Retrieval	Oct 29, 2021	Cross-Modal RetrievalRelation	CodeCode Available	1
Domain Adaptation in Multi-View Embedding for Cross-Modal Video Retrieval	Oct 25, 2021	Domain AdaptationRetrieval	—Unverified	0
Video and Text Matching with Conditioned Embeddings	Oct 21, 2021	Machine TranslationSentence	CodeCode Available	1
Coarse to Fine: Video Retrieval before Moment Localization	Oct 14, 2021	Moment RetrievalRetrieval	—Unverified	0
ViSeRet: A simple yet effective approach to moment retrieval via fine-grained video segmentation	Oct 11, 2021	Moment RetrievalRetrieval	—Unverified	0
Spatio-Temporal Video Representation Learning for AI Based Video Playback Style Prediction	Oct 3, 2021	Action RecognitionRepresentation Learning	—Unverified	0
Vi-MIX FOR SELF-SUPERVISED VIDEO REPRESENTATION	Sep 29, 2021	Action RecognitionRepresentation Learning	—Unverified	0
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding	Sep 28, 2021	Action LocalizationAction Segmentation	—Unverified	0
Self-Supervised Video Representation Learning by Video Incoherence Detection	Sep 26, 2021	Action RecognitionContrastive Learning	—Unverified	0
CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval	Sep 21, 2021	Corpus Video Moment RetrievalMoment Retrieval	CodeCode Available	1
Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss	Sep 9, 2021	Mixture-of-ExpertsRetrieval	CodeCode Available	1
TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment	Aug 23, 2021	Action SegmentationContrastive Learning	—Unverified	0

Show:10 25 50

← PrevPage 13 of 20Next →

All datasets MSR-VTT-1kA DiDeMo MSR-VTT LSMDC ActivityNet MSVD YouCook2 FIVR-200K VATEX QuerYD SSv2-label retrieval SSv2-template retrieval

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	OmniVec	text-to-video R@10	89.4	—	Unverified
2	CLIP4Clip	text-to-video R@10	81.6	—	Unverified
3	OmniVec (pretrained)	text-to-video R@10	78.6	—	Unverified
4	HunYuan_tvr (huge)	text-to-video R@1	62.9	—	Unverified
5	CLIP-ViP	text-to-video R@1	57.7	—	Unverified
6	PIDRo	text-to-video R@1	55.9	—	Unverified
7	DMAE (ViT-B/16)	text-to-video R@1	55.5	—	Unverified
8	HunYuan_tvr	text-to-video R@1	55	—	Unverified
9	MuLTI	text-to-video R@1	54.7	—	Unverified
10	EERCF	text-to-video R@1	54.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Aurora (ours, r=64)	text-to-video R@5	77.4	—	Unverified
2	InternVideo2-6B	text-to-video R@1	74.2	—	Unverified
3	vid-TLDR (UMT-L)	text-to-video R@1	72.3	—	Unverified
4	VAST	text-to-video R@1	72	—	Unverified
5	COSA	text-to-video R@1	70.5	—	Unverified
6	UMT-L (ViT-L/16)	text-to-video R@1	70.4	—	Unverified
7	GRAM	text-to-video R@1	67.3	—	Unverified
8	VALOR	text-to-video R@1	61.5	—	Unverified
9	TESTA (ViT-B/16)	text-to-video R@1	61.2	—	Unverified
10	VindLU	text-to-video R@1	61.2	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GRAM	text-to-video R@1	64	—	Unverified
2	VAST	text-to-video R@1	63.9	—	Unverified
3	InternVideo2-6B	text-to-video R@1	62.8	—	Unverified
4	VALOR	text-to-video R@1	59.9	—	Unverified
5	UMT-L (ViT-L/16)	text-to-video R@1	58.8	—	Unverified
6	vid-TLDR (UMT-L)	text-to-video R@1	58.1	—	Unverified
7	COSA	text-to-video R@1	57.9	—	Unverified
8	InternVideo2-6B	text-to-video R@1	55.9	—	Unverified
9	InternVideo	text-to-video R@1	55.2	—	Unverified
10	VLAB	text-to-video R@1	55.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	EMCL-Net (Ours)++ LSMDC Rohrbach et al. (2015)	text-to-video R@10	53.7	—	Unverified
2	InternVideo2-6B	text-to-video R@1	46.4	—	Unverified
3	vid-TLDR (UMT-L)	text-to-video R@1	43.1	—	Unverified
4	UMT-L (ViT-L/16)	text-to-video R@1	43	—	Unverified
5	HunYuan_tvr (huge)	text-to-video R@1	40.4	—	Unverified
6	COSA	text-to-video R@1	39.4	—	Unverified
7	mPLUG-2	text-to-video R@1	34.4	—	Unverified
8	VALOR	text-to-video R@1	34.2	—	Unverified
9	InternVideo	text-to-video R@1	34	—	Unverified
10	InternVideo2-6B	text-to-video R@1	33.8	—	Unverified