SOTAVerified|Agents Browse Leaderboard About Blog

Video-Text Retrieval

Video-Text retrieval requires understanding of both video and language together. Therefore it's different to video retrieval task.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 41–50 of 111 papers

Title	Date	Tasks	Status	Hype	Score
Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring	Jan 26, 2023	Representation LearningRetrieval	CodeCode Available	1	5
Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval	Sep 29, 2023	Cross-Modal RetrievalImage-text matching	CodeCode Available	1	5
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval	Apr 26, 2022	Action RecognitionRetrieval	CodeCode Available	1	5
RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos	Dec 11, 2023	Natural Language Moment RetrievalNatural Language Queries	CodeCode Available	1	5
Learning the Best Pooling Strategy for Visual Semantic Embedding	Nov 9, 2020	Cross-Modal Information RetrievalImage-text Retrieval	CodeCode Available	1	5
Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning	Nov 24, 2022	cross-modal alignmentImage-text Retrieval	CodeCode Available	1	5
VTC: Improving Video-Text Retrieval with User Comments	Oct 19, 2022	Representation LearningRetrieval	CodeCode Available	1	5
LoVR: A Benchmark for Long Video Retrieval in Multimodal Contexts	May 20, 2025	Caption GenerationRetrieval	CodeCode Available	1	5
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval	Apr 18, 2021	RetrievalText Retrieval	CodeCode Available	1	5
X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval	Mar 28, 2022	RetrievalText to Video Retrieval	CodeCode Available	1	5

Show:10 25 50

← PrevPage 5 of 12Next →

No leaderboard results yet.