SOTAVerified

Video-Text Retrieval

Video-text retrieval requires jointly understanding video and language, which distinguishes it from purely visual video retrieval.
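A common baseline for this task (studied empirically in CLIP4Clip, listed below) is to embed the text query and the video frames with a shared encoder such as CLIP, aggregate the frame embeddings into a single video embedding, and rank videos by cosine similarity. The sketch below illustrates that retrieval step with mean pooling; the random vectors stand in for real encoder outputs, and all function names are illustrative, not from any specific library.

```python
import numpy as np

def mean_pool_frames(frame_embeddings):
    """Aggregate per-frame embeddings into one video embedding.

    Mean pooling is the simplest aggregation strategy; papers below
    (e.g. CLIP4Clip, X-Pool) study more sophisticated alternatives.
    """
    v = frame_embeddings.mean(axis=0)
    return v / np.linalg.norm(v)

def retrieve(text_embedding, video_embeddings):
    """Rank videos by cosine similarity to the text query (best first)."""
    t = text_embedding / np.linalg.norm(text_embedding)
    sims = video_embeddings @ t          # dot product of unit vectors = cosine
    return np.argsort(-sims)

# Toy stand-ins for encoder outputs: 3 videos, 8 frames each, 16-dim features.
rng = np.random.default_rng(0)
videos = np.stack([mean_pool_frames(rng.normal(size=(8, 16))) for _ in range(3)])
query = videos[1] + 0.05 * rng.normal(size=16)  # a query close to video 1
ranking = retrieve(query, videos)
```

In a real system the embeddings would come from a pretrained vision-language model, and the video embeddings would typically be precomputed and indexed so only the text encoder runs at query time.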

Papers

Showing 41–50 of 111 papers

| Title | Status | Hype |
| --- | --- | --- |
| RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos | Code | 1 |
| Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning | Code | 1 |
| CLIP2Video: Mastering Video-Text Retrieval via Image CLIP | Code | 1 |
| TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding | Code | 1 |
| Learning the Best Pooling Strategy for Visual Semantic Embedding | Code | 1 |
| mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections | Code | 1 |
| Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring | Code | 1 |
| UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory | Code | 1 |
| CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval | Code | 1 |
| X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval | Code | 1 |
Page 5 of 12

No leaderboard results yet.