SOTAVerified

Video-Text Retrieval

Video-text retrieval requires understanding video and language jointly, which makes it different from the plain video retrieval task.

Papers

Showing 1-10 of 111 papers

Title | Status | Hype
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval | Code | 1
One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory | Code | 2
LoVR: A Benchmark for Long Video Retrieval in Multimodal Contexts | Code | 1
Towards Understanding Camera Motions in Any Video | - | 0
LV-MAE: Learning Long Video Representations through Masked-Embedding Autoencoders | - | 0
Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval | - | 0
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts | - | 0
Temporal Working Memory: Query-Guided Segment Refinement for Enhanced Multimodal Understanding | Code | 3
Expertized Caption Auto-Enhancement for Video-Text Retrieval | Code | 0
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts | - | 0

No leaderboard results yet.