SOTAVerified

Video-Text Retrieval

Video-text retrieval requires joint understanding of video and language, which distinguishes it from the video retrieval task.

Papers

Showing 21-30 of 111 papers

Title | Status | Hype
Global and Local Semantic Completion Learning for Vision-Language Pre-training | Code | 1
Learning the Best Pooling Strategy for Visual Semantic Embedding | Code | 1
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval | Code | 1
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval | Code | 1
Helping Hands: An Object-Aware Ego-Centric Video Recognition Model | Code | 1
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections | Code | 1
Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning | Code | 1
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval | Code | 1
HANet: Hierarchical Alignment Networks for Video-Text Retrieval | Code | 1
Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss | Code | 1

No leaderboard results yet.