SOTAVerified

Text to Video Retrieval

Papers

Showing 21-30 of 75 papers

Title | Status | Hype
Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval | Code | 1
Unified Coarse-to-Fine Alignment for Video-Text Retrieval | Code | 1
TeachCLIP: Multi-Grained Teaching for Efficient Text-to-Video Retrieval |  | 0
Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment |  | 0
MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian | Code | 0
MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models | Code | 1
Efficient End-to-End Video Question Answering with Pyramidal Multimodal Transformer | Code | 0
Temporal Perceiving Video-Language Pre-training |  | 0
Learning Trajectory-Word Alignments for Video-Language Tasks |  | 0
Dual Learning with Dynamic Knowledge Distillation for Partially Relevant Video Retrieval | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | FROZEN-revised | mAP | 23.39 |  | Unverified
2 | FROZEN-revised (two-stream) | text-to-video R@1 | 12.8 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CLIP4Clip | text-to-video R@1 | 44.5 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | X-CLIP (Cross-Lingual) | R@1 | 32.3 |  | Unverified