
Video-Text Retrieval

Video-text retrieval requires understanding both video and language jointly, which makes it different from the video retrieval task.

Papers

Showing 101–111 of 111 papers

Title | Status | Hype
Videoprompter: an ensemble of foundational models for zero-shot video understanding | | 0
CLIP2TV: Align, Match and Distill for Video-Text Retrieval | | 0
Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception | | 0
CiCo: Domain-Aware Sign Language Retrieval via Cross-Lingual Contrastive Learning | | 0
ViLEM: Visual-Language Error Modeling for Image-Text Retrieval | | 0
ViSeRet: A simple yet effective approach to moment retrieval via fine-grained video segmentation | | 0
Boosting Video-Text Retrieval with Explicit High-Level Semantics | | 0
VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending | | 0
Memory Enhanced Embedding Learning for Cross-Modal Video-Text Retrieval | | 0
Mask to reconstruct: Cooperative Semantics Completion for Video-text Retrieval | | 0
Masked Contrastive Pre-Training for Efficient Video-Text Retrieval | | 0

No leaderboard results yet.