Video-Text Retrieval

Video-text retrieval requires a joint understanding of video and language, which makes it different from the video retrieval task.

Papers

Showing 11–20 of 111 papers

Title | Status | Hype
Egocentric Video-Language Pretraining | Code | 2
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval | Code | 1
LoVR: A Benchmark for Long Video Retrieval in Multimodal Contexts | Code | 1
Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval | Code | 1
Learning Video Context as Interleaved Multimodal Sequences | Code | 1
Video-Language Alignment via Spatio-Temporal Graph Transformer | Code | 1
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | Code | 1
ProS: Prompting-to-simulate Generalized knowledge for Universal Cross-Domain Retrieval | Code | 1
RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos | Code | 1
TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding | Code | 1

No leaderboard results yet.