SOTAVerified

Video-Text Retrieval

Video-text retrieval requires joint understanding of video and language, which makes it different from the video retrieval task.

Papers

Showing 31–40 of 111 papers

| Title | Status | Hype |
| --- | --- | --- |
| MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval | Code | 1 |
| MV-Adapter: Multimodal Video Transfer Learning for Video Text Retrieval | Code | 1 |
| Helping Hands: An Object-Aware Ego-Centric Video Recognition Model | Code | 1 |
| Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval | Code | 1 |
| LoVR: A Benchmark for Long Video Retrieval in Multimodal Contexts | Code | 1 |
| Cross-Modal Retrieval with Partially Mismatched Pairs | Code | 1 |
| mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections | Code | 1 |
| Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss | Code | 1 |
| InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | Code | 1 |
| HANet: Hierarchical Alignment Networks for Video-Text Retrieval | Code | 1 |

No leaderboard results yet.