SOTAVerified

Video-Text Retrieval

Video-text retrieval requires a joint understanding of video and language, which makes it different from the video retrieval task.

Papers

Showing 101–111 of 111 papers

Title | Status | Hype
HiT: Hierarchical Transformer with Momentum Contrast for Video-Text Retrieval | - | 0
Rudder: A Cross Lingual Video and Text Retrieval Dataset | Code | 0
Learning the Best Pooling Strategy for Visual Semantic Embedding | Code | 1
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning | Code | 1
Exploiting Visual Semantic Reasoning for Video-Text Retrieval | - | 0
Retrieving and Highlighting Action with Spatiotemporal Reference | - | 0
Stacked Convolutional Deep Encoding Network for Video-Text Retrieval | - | 0
Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning | Code | 1
Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval | Code | 1
Deep Semantic Multimodal Hashing Network for Scalable Image-Text and Video-Text Retrievals | - | 0
Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval | Code | 0

No leaderboard results yet.