SOTAVerified

Video-Text Retrieval

Video-text retrieval requires understanding video and language jointly, which makes it different from the plain video retrieval task.

Papers

Showing 41-50 of 111 papers

Title | Status | Hype
Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data | Code | 1
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | Code | 4
Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval | Code | 1
Uncertainty-Aware Alignment Network for Cross-Domain Video-Text Retrieval |  | 0
Uncertainty-Aware Alignment Network for Cross-Domain Video-Text Retrieval |  | 0
Unified Coarse-to-Fine Alignment for Video-Text Retrieval | Code | 1
UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory | Code | 1
Multi-event Video-Text Retrieval | Code | 1
Helping Hands: An Object-Aware Ego-Centric Video Recognition Model | Code | 1
TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter | Code | 0
Page 5 of 12

No leaderboard results yet.