SOTAVerified

Video-Text Retrieval

Video-text retrieval requires a joint understanding of video and language. It therefore differs from the video retrieval task.
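Many approaches cast video-text retrieval as ranking in a shared embedding space. A text query and each candidate video are mapped to vectors, videos are ranked by similarity to the query, and quality is commonly reported as Recall@K. The sketch below illustrates that pipeline under stated assumptions. The embeddings are toy, hand-written vectors standing in for the output of pretrained text and frame encoders (hypothetical). Frame embeddings are combined with simple mean pooling, which is only one possible temporal aggregation. The helper names `mean_pool`, `cosine`, `rank_videos`, and `recall_at_k` are illustrative and do not come from any specific paper or library.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_pool(frames: Sequence[Vector]) -> Vector:
    """Average per-frame embeddings into one video embedding (assumed pooling choice)."""
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_videos(query: Vector, videos: Sequence[Vector]) -> List[int]:
    """Return video indices sorted from most to least similar to the text query."""
    scores = [cosine(query, v) for v in videos]
    return sorted(range(len(videos)), key=lambda i: scores[i], reverse=True)


def recall_at_k(queries: Sequence[Vector], videos: Sequence[Vector], k: int) -> float:
    """Fraction of queries whose matching video appears in the top k.

    Assumes query i is paired with video i (a common test-set layout).
    """
    hits = sum(1 for i, q in enumerate(queries) if i in rank_videos(q, videos)[:k])
    return hits / len(queries)


# Toy data: two videos with two frames each, and one caption per video.
video_frames = [
    [[1.0, 0.0], [0.8, 0.2]],  # video 0
    [[0.0, 1.0], [0.1, 0.9]],  # video 1
]
videos = [mean_pool(frames) for frames in video_frames]
texts = [[1.0, 0.1], [0.0, 1.0]]  # caption i describes video i

print(rank_videos(texts[0], videos))
print(recall_at_k(texts, videos, k=1))
```

Mean pooling ignores frame order and does not depend on the query. Methods that learn temporal or text-conditioned aggregation replace that single step while keeping the same rank-by-similarity evaluation.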

Papers

Showing 41-50 of 111 papers

Title | Status | Hype
ProS: Prompting-to-simulate Generalized knowledge for Universal Cross-Domain Retrieval | Code | 1
Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval | Code | 1
HANet: Hierarchical Alignment Networks for Video-Text Retrieval | Code | 1
Video-Language Alignment via Spatio-Temporal Graph Transformer | Code | 1
Learning the Best Pooling Strategy for Visual Semantic Embedding | Code | 1
Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring | Code | 1
Learning Video Context as Interleaved Multimodal Sequences | Code | 1
LoVR: A Benchmark for Long Video Retrieval in Multimodal Contexts | Code | 1
UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory | Code | 1
X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval | Code | 1

No leaderboard results yet.