SOTAVerified

Video-Text Retrieval

Video-text retrieval requires a joint understanding of video and language, which distinguishes it from the video retrieval task.

Papers

Showing 81–90 of 111 papers

Title | Status | Hype
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs | Code | 2
Egocentric Video-Language Pretraining | Code | 2
Generalizing Multimodal Pre-training into Multilingual via Language Acquisition | | 0
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections | Code | 1
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval | Code | 1
X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval | Code | 1
Synopses of Movie Narratives: a Video-Language Dataset for Story Understanding | | 0
Synopses of Movie Narratives: a Video-Language Dataset for Story Understanding | | 0
Bridging Video-text Retrieval with Multiple Choice Questions | Code | 1
Video-Text Pre-training with Learned Regions | Code | 1

No leaderboard results yet.