SOTAVerified

Video-Text Retrieval

Video-text retrieval requires joint understanding of video and language, which distinguishes it from the video retrieval task.

Papers

Showing 11–20 of 111 papers

Title | Status | Hype
Rethinking Noisy Video-Text Retrieval via Relation-aware Alignment | - | 0
CaReBench: A Fine-Grained Benchmark for Video Captioning and Retrieval | - | 0
Reversed in Time: A Novel Temporal-Emphasized Benchmark for Cross-Modal Video-Text Retrieval | Code | 0
CAREL: Instruction-guided reinforcement learning with cross-modal auxiliary objectives | Code | 0
Beyond Coarse-Grained Matching in Video-Text Retrieval | - | 0
Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval | Code | 1
NAVERO: Unlocking Fine-Grained Semantics for Video-Language Compositionality | - | 0
Learning Video Context as Interleaved Multimodal Sequences | Code | 1
Video-Language Alignment via Spatio-Temporal Graph Transformer | Code | 1
EA-VTR: Event-Aware Video-Text Retrieval | - | 0

No leaderboard results yet.