SOTAVerified

Video-Text Retrieval

Video-text retrieval requires a joint understanding of video and language, which makes it different from the video retrieval task.

Papers

Showing 51-75 of 111 papers

Title | Status | Hype
Global and Local Semantic Completion Learning for Vision-Language Pre-training | Code | 1
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding | Code | 4
VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending | - | 0
Mask to reconstruct: Cooperative Semantics Completion for Video-text Retrieval | - | 0
Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception | - | 0
SViTT: Temporal Learning of Sparse Video-Text Transformers | Code | 1
CiCo: Domain-Aware Sign Language Retrieval via Cross-Lingual Contrastive Learning | Code | 0
Deep Learning for Video-Text Retrieval: a Review | - | 0
Cross-Modal Retrieval with Partially Mismatched Pairs | Code | 1
Video-Text Retrieval by Supervised Sparse Multi-Grained Learning | Code | 0
UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling | Code | 1
Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text Retrieval | - | 0
Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring | Code | 1
MV-Adapter: Multimodal Video Transfer Learning for Video Text Retrieval | Code | 1
Test of Time: Instilling Video-Language Models with a Sense of Time | Code | 1
HiVLP: Hierarchical Interactive Video-Language Pre-Training | - | 0
Dual Alignment Unsupervised Domain Adaptation for Video-Text Retrieval | - | 0
ViLEM: Visual-Language Error Modeling for Image-Text Retrieval | - | 0
Masked Contrastive Pre-Training for Efficient Video-Text Retrieval | - | 0
Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning | Code | 1
VTC: Improving Video-Text Retrieval with User Comments | Code | 1
Vision-Language Pre-training: Basics, Recent Advances, and Future Trends | Code | 3
TokenFlow: Rethinking Fine-grained Cross-modal Alignment in Vision-Language Retrieval | - | 0
Unified Loss of Pair Similarity Optimization for Vision-Language Retrieval | - | 0
Text-Adaptive Multiple Visual Prototype Matching for Video-Text Retrieval | - | 0

No leaderboard results yet.