SOTAVerified

Video-Text Retrieval

Video-text retrieval requires a joint understanding of video and language, which distinguishes it from the plain video retrieval task.

Papers

Showing 21–30 of 111 papers

| Title | Status | Hype |
| --- | --- | --- |
| Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data | Code | 1 |
| Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval | Code | 1 |
| Unified Coarse-to-Fine Alignment for Video-Text Retrieval | Code | 1 |
| UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory | Code | 1 |
| Multi-event Video-Text Retrieval | Code | 1 |
| Helping Hands: An Object-Aware Ego-Centric Video Recognition Model | Code | 1 |
| Global and Local Semantic Completion Learning for Vision-Language Pre-training | Code | 1 |
| SViTT: Temporal Learning of Sparse Video-Text Transformers | Code | 1 |
| Cross-Modal Retrieval with Partially Mismatched Pairs | Code | 1 |
| UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling | Code | 1 |

No leaderboard results yet.