SOTAVerified

Text to Video Retrieval

She's gone. I can't find her anywhere. I'm looking everywhere for her. Everywhere is dark.

Papers

Showing 21-30 of 75 papers

Title | Status | Hype
MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models | Code | 1
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval | Code | 1
Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval | Code | 1
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval | Code | 1
GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval | Code | 1
Holistic Features are almost Sufficient for Text-to-Video Retrieval | Code | 1
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips | Code | 1
LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling | Code | 1
Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data | Code | 1
Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | FROZEN-revised | mAP | 23.39 | | Unverified
2 | FROZEN-revised (two-stream) | text-to-video R@1 | 12.8 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CLIP4Clip | text-to-video R@1 | 44.5 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | X-CLIP (Cross-Lingual) | R@1 | 32.3 | | Unverified