SOTAVerified

Video-Text Retrieval

Video-text retrieval requires joint understanding of video and language, which distinguishes it from the video retrieval task.

Papers

Showing 76-100 of 111 papers

| Title | Status | Hype |
| --- | --- | --- |
| OmniVL: One Foundation Model for Image-Language and Video-Language Tasks | | 0 |
| CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment | Code | 2 |
| Boosting Video-Text Retrieval with Explicit High-Level Semantics | | 0 |
| X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval | Code | 1 |
| LaT: Latent Translation with Cycle-Consistency for Video-Text Retrieval | | 0 |
| Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs | Code | 2 |
| Egocentric Video-Language Pretraining | Code | 2 |
| Generalizing Multimodal Pre-training into Multilingual via Language Acquisition | | 0 |
| mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections | Code | 1 |
| MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval | Code | 1 |
| X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval | Code | 1 |
| Synopses of Movie Narratives: a Video-Language Dataset for Story Understanding | | 0 |
| Synopses of Movie Narratives: a Video-Language Dataset for Story Understanding | | 0 |
| Bridging Video-text Retrieval with Multiple Choice Questions | Code | 1 |
| Video-Text Pre-training with Learned Regions | Code | 1 |
| CLIP2TV: Align, Match and Distill for Video-Text Retrieval | | 0 |
| ViSeRet: A simple yet effective approach to moment retrieval via fine-grained video segmentation | | 0 |
| CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations | | 0 |
| Learning Context-Adapted Video-Text Retrieval by Attending to User Comments | | 0 |
| Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss | Code | 1 |
| HANet: Hierarchical Alignment Networks for Video-Text Retrieval | Code | 1 |
| CLIP2Video: Mastering Video-Text Retrieval via Image CLIP | Code | 1 |
| CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval | Code | 1 |
| Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval | Code | 1 |
| Memory Enhanced Embedding Learning for Cross-Modal Video-Text Retrieval | | 0 |

No leaderboard results yet.