SOTAVerified

Video-Text Retrieval

Video-text retrieval requires joint understanding of video and language, which makes it different from the plain video retrieval task.

Papers

Showing 51–60 of 111 papers

| Title | Status | Hype |
| --- | --- | --- |
| Global and Local Semantic Completion Learning for Vision-Language Pre-training | Code | 1 |
| Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding | Code | 4 |
| VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending | | 0 |
| Mask to reconstruct: Cooperative Semantics Completion for Video-text Retrieval | | 0 |
| Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception | | 0 |
| SViTT: Temporal Learning of Sparse Video-Text Transformers | Code | 1 |
| CiCo: Domain-Aware Sign Language Retrieval via Cross-Lingual Contrastive Learning | Code | 0 |
| Deep Learning for Video-Text Retrieval: a Review | | 0 |
| Cross-Modal Retrieval with Partially Mismatched Pairs | Code | 1 |
| Video-Text Retrieval by Supervised Sparse Multi-Grained Learning | Code | 0 |

No leaderboard results yet.