SOTAVerified

Video-Text Retrieval

Video-text retrieval requires a joint understanding of video and language, which distinguishes it from the video retrieval task.

Papers

Showing 21-30 of 111 papers

Title | Status | Hype
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | Code | 1
Helping Hands: An Object-Aware Ego-Centric Video Recognition Model | Code | 1
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval | Code | 1
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval | Code | 1
Multi-event Video-Text Retrieval | Code | 1
Learning the Best Pooling Strategy for Visual Semantic Embedding | Code | 1
MV-Adapter: Multimodal Video Transfer Learning for Video Text Retrieval | Code | 1
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval | Code | 1
Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data | Code | 1
HANet: Hierarchical Alignment Networks for Video-Text Retrieval | Code | 1

No leaderboard results yet.