SOTAVerified

Video-Text Retrieval

Video-text retrieval requires joint understanding of video and language, which distinguishes it from the plain video retrieval task.

Papers

Showing 11-20 of 111 papers

Title | Status | Hype
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs | Code | 2
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning | Code | 1
Global and Local Semantic Completion Learning for Vision-Language Pre-training | Code | 1
Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data | Code | 1
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval | Code | 1
Bridging Video-text Retrieval with Multiple Choice Questions | Code | 1
HANet: Hierarchical Alignment Networks for Video-Text Retrieval | Code | 1
CLIP2Video: Mastering Video-Text Retrieval via Image CLIP | Code | 1
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval | Code | 1
Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning | Code | 1

No leaderboard results yet.