SOTAVerified

Video-Text Retrieval

Video-text retrieval requires understanding video and language jointly, which distinguishes it from the video retrieval task.

Papers

Showing 41-50 of 111 papers

Title | Status | Hype
Video-Text Pre-training with Learned Regions | Code | 1
Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss | Code | 1
HANet: Hierarchical Alignment Networks for Video-Text Retrieval | Code | 1
CLIP2Video: Mastering Video-Text Retrieval via Image CLIP | Code | 1
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval | Code | 1
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval | Code | 1
Learning the Best Pooling Strategy for Visual Semantic Embedding | Code | 1
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning | Code | 1
Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning | Code | 1
Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval | Code | 1

No leaderboard results yet.