SOTAVerified

Video-Text Retrieval

Video-text retrieval requires understanding video and language jointly, which makes it different from the video retrieval task.

Papers

Showing 91–100 of 111 papers

Title | Status | Hype
TokenFlow: Rethinking Fine-grained Cross-modal Alignment in Vision-Language Retrieval | | 0
Towards Understanding Camera Motions in Any Video | | 0
Uncertainty-Aware Alignment Network for Cross-Domain Video-Text Retrieval | | 0
Uncertainty-Aware Alignment Network for Cross-Domain Video-Text Retrieval | | 0
Uncertainty-aware sign language video retrieval with probability distribution modeling | | 0
Unified Loss of Pair Similarity Optimization for Vision-Language Retrieval | | 0
Unifying Latent and Lexicon Representations for Effective Video-Text Retrieval | | 0
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts | | 0
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts | | 0
Videoprompter: an ensemble of foundational models for zero-shot video understanding | | 0

No leaderboard results yet.