SOTAVerified

Video-Text Retrieval

Video-text retrieval requires joint understanding of video and language, which distinguishes it from the video retrieval task.

Papers

Showing 51-60 of 111 papers

Title | Status | Hype
Towards Understanding Camera Motions in Any Video | | 0
LV-MAE: Learning Long Video Representations through Masked-Embedding Autoencoders | | 0
Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval | | 0
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts | | 0
Expertized Caption Auto-Enhancement for Video-Text Retrieval | Code | 0
Rethinking Noisy Video-Text Retrieval via Relation-aware Alignment | | 0
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts | | 0
CaReBench: A Fine-Grained Benchmark for Video Captioning and Retrieval | | 0
Reversed in Time: A Novel Temporal-Emphasized Benchmark for Cross-Modal Video-Text Retrieval | Code | 0
CAREL: Instruction-guided reinforcement learning with cross-modal auxiliary objectives | Code | 0

No leaderboard results yet.