Video Retrieval

The objective of video retrieval is as follows: given a text query and a pool of candidate videos, select the video which corresponds to the text query. Typically, the videos are returned as a ranked list of candidates and scored via document retrieval metrics.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 141–150 of 486 papers

Title	Date	Tasks	Status	Hype
MarineVRS: Marine Video Retrieval System with Explainability via Semantic Understanding	Jun 7, 2023	RetrievalSentence	—Unverified	0
An Overview of Challenges in Egocentric Text-Video Retrieval	Jun 7, 2023	RetrievalVideo Retrieval	—Unverified	0
fpgaHART: A toolflow for throughput-oriented acceleration of 3D CNNs for HAR onto FPGAs	May 31, 2023	Action RecognitionAutonomous Vehicles	—Unverified	0
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset	May 29, 2023	Audio captioningAudio-Visual Captioning	CodeCode Available	2
FMM-X3D: FPGA-based modeling and mapping of X3D for Human Action Recognition	May 29, 2023	Action RecognitionAutonomous Vehicles	—Unverified	0
VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending	May 22, 2023	Question AnsweringRetrieval	—Unverified	0
Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment	May 20, 2023	RetrievalVideo Retrieval	CodeCode Available	1
Mask to reconstruct: Cooperative Semantics Completion for Video-text Retrieval	May 13, 2023	RetrievalText Retrieval	—Unverified	0
A Large Cross-Modal Video Retrieval Dataset with Reading Comprehension	May 5, 2023	Reading ComprehensionRetrieval	CodeCode Available	1
A Review of Deep Learning for Video Captioning	Apr 22, 2023	Deep LearningDense Video Captioning	—Unverified	0

Show:10 25 50

← PrevPage 15 of 49Next →

All datasets MSR-VTT-1kA DiDeMo MSR-VTT LSMDC ActivityNet MSVD YouCook2 FIVR-200K VATEX QuerYD SSv2-label retrieval SSv2-template retrieval

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	OmniVec	text-to-video R@10	89.4	—	Unverified
2	CLIP4Clip	text-to-video R@10	81.6	—	Unverified
3	OmniVec (pretrained)	text-to-video R@10	78.6	—	Unverified
4	HunYuan_tvr (huge)	text-to-video R@1	62.9	—	Unverified
5	CLIP-ViP	text-to-video R@1	57.7	—	Unverified
6	PIDRo	text-to-video R@1	55.9	—	Unverified
7	DMAE (ViT-B/16)	text-to-video R@1	55.5	—	Unverified
8	HunYuan_tvr	text-to-video R@1	55	—	Unverified
9	MuLTI	text-to-video R@1	54.7	—	Unverified
10	EERCF	text-to-video R@1	54.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Aurora (ours, r=64)	text-to-video R@5	77.4	—	Unverified
2	InternVideo2-6B	text-to-video R@1	74.2	—	Unverified
3	vid-TLDR (UMT-L)	text-to-video R@1	72.3	—	Unverified
4	VAST	text-to-video R@1	72	—	Unverified
5	COSA	text-to-video R@1	70.5	—	Unverified
6	UMT-L (ViT-L/16)	text-to-video R@1	70.4	—	Unverified
7	GRAM	text-to-video R@1	67.3	—	Unverified
8	VALOR	text-to-video R@1	61.5	—	Unverified
9	TESTA (ViT-B/16)	text-to-video R@1	61.2	—	Unverified
10	VindLU	text-to-video R@1	61.2	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GRAM	text-to-video R@1	64	—	Unverified
2	VAST	text-to-video R@1	63.9	—	Unverified
3	InternVideo2-6B	text-to-video R@1	62.8	—	Unverified
4	VALOR	text-to-video R@1	59.9	—	Unverified
5	UMT-L (ViT-L/16)	text-to-video R@1	58.8	—	Unverified
6	vid-TLDR (UMT-L)	text-to-video R@1	58.1	—	Unverified
7	COSA	text-to-video R@1	57.9	—	Unverified
8	InternVideo2-6B	text-to-video R@1	55.9	—	Unverified
9	InternVideo	text-to-video R@1	55.2	—	Unverified
10	VLAB	text-to-video R@1	55.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	EMCL-Net (Ours)++ LSMDC Rohrbach et al. (2015)	text-to-video R@10	53.7	—	Unverified
2	InternVideo2-6B	text-to-video R@1	46.4	—	Unverified
3	vid-TLDR (UMT-L)	text-to-video R@1	43.1	—	Unverified
4	UMT-L (ViT-L/16)	text-to-video R@1	43	—	Unverified
5	HunYuan_tvr (huge)	text-to-video R@1	40.4	—	Unverified
6	COSA	text-to-video R@1	39.4	—	Unverified
7	mPLUG-2	text-to-video R@1	34.4	—	Unverified
8	VALOR	text-to-video R@1	34.2	—	Unverified
9	InternVideo	text-to-video R@1	34	—	Unverified
10	InternVideo2-6B	text-to-video R@1	33.8	—	Unverified