SOTAVerified

Video-Text Retrieval

Video-text retrieval requires understanding video and language jointly, which distinguishes it from the video retrieval task.

Papers

Showing 101–111 of 111 papers

Title | Status | Hype
VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending |  | 0
Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval | Code | 0
Video-Text Retrieval by Supervised Sparse Multi-Grained Learning | Code | 0
Harvest Video Foundation Models via Efficient Post-Pretraining | Code | 0
Diving Deep into the Motion Representation of Video-Text Models | Code | 0
CiCo: Domain-Aware Sign Language Retrieval via Cross-Lingual Contrastive Learning | Code | 0
Expertized Caption Auto-Enhancement for Video-Text Retrieval | Code | 0
TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter | Code | 0
Rudder: A Cross Lingual Video and Text Retrieval Dataset | Code | 0
Reversed in Time: A Novel Temporal-Emphasized Benchmark for Cross-Modal Video-Text Retrieval | Code | 0
CAREL: Instruction-guided reinforcement learning with cross-modal auxiliary objectives | Code | 0

No leaderboard results yet.