SOTAVerified

Text to Video Retrieval

She's gone I can't find her anywhere I'm looking everywhere for her Everywhere is dark

Papers

Showing 61–70 of 75 papers

Title | Status | Hype
MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization | | 0
SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training | | 0
Support-set bottlenecks for video-text representation learning | | 0
Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval | | 0
Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment | | 0
Multi-Granularity and Multi-modal Feature Interaction Approach for Text Video Retrieval | | 0
COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval | | 0
CUPID: Adaptive Curation of Pre-training Data for Video-and-Language Representation Learning | | 0
TeachCLIP: Multi-Grained Teaching for Efficient Text-to-Video Retrieval | | 0
Distilling Vision-Language Models on Millions of Videos | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | FROZEN-revised | mAP | 23.39 | | Unverified
2 | FROZEN-revised (two-stream) | text-to-video R@1 | 12.8 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CLIP4Clip | text-to-video R@1 | 44.5 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | X-CLIP (Cross-Lingual) | R@1 | 32.3 | | Unverified