
Video-Text Retrieval

Video-text retrieval requires a joint understanding of video and language, which distinguishes it from the plain video retrieval task.
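As a rough illustration of what the task involves, the sketch below ranks videos for a text query by cosine similarity in a shared embedding space and computes Recall@K, a metric commonly reported for this task. It is a minimal sketch under stated assumptions, not any listed paper's method: the two-dimensional embeddings are made-up toy values (real systems obtain them from learned video and text encoders), and the function names `rank_videos` and `recall_at_k` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_videos(text_emb, video_embs):
    """Return video indices ordered from most to least similar to the text query."""
    scores = [cosine(text_emb, v) for v in video_embs]
    return sorted(range(len(video_embs)), key=lambda i: scores[i], reverse=True)

def recall_at_k(text_embs, video_embs, k):
    """Fraction of text queries whose paired video (same index) is ranked in the top k."""
    hits = sum(
        1 for i, t in enumerate(text_embs)
        if i in rank_videos(t, video_embs)[:k]
    )
    return hits / len(text_embs)

# Toy data (assumption): text_embs[i] is the caption paired with video_embs[i].
video_embs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
text_embs = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.2]]

print(rank_videos(text_embs[0], video_embs))
print(recall_at_k(text_embs, video_embs, k=1))
print(recall_at_k(text_embs, video_embs, k=2))
```

In this toy setup the third caption lies closer to the first video than to its own, so it counts as a miss at K=1 and a hit at K=2.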

Papers

Showing 31-40 of 111 papers

Title | Status | Hype
Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring | Code | 1
MV-Adapter: Multimodal Video Transfer Learning for Video Text Retrieval | Code | 1
Test of Time: Instilling Video-Language Models with a Sense of Time | Code | 1
Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning | Code | 1
VTC: Improving Video-Text Retrieval with User Comments | Code | 1
X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval | Code | 1
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections | Code | 1
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval | Code | 1
X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval | Code | 1
Bridging Video-text Retrieval with Multiple Choice Questions | Code | 1

No leaderboard results yet.