SOTAVerified

Text to Video Retrieval

She's gone. I can't find her anywhere. I'm looking everywhere for her. Everywhere is dark.

Papers

Showing 31–40 of 75 papers

Title | Status | Hype
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | - | 0
VindLU: A Recipe for Effective Video-and-Language Pretraining | Code | 1
X^2-VLM: All-In-One Pre-trained Model For Vision-Language Tasks | Code | 2
Are All Combinations Equal? Combining Textual and Visual Features with Multiple Space Learning for Text-Based Video Retrieval | Code | 0
SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training | - | 0
Fighting FIRe with FIRE: Assessing the Validity of Text-to-Video Retrieval Benchmarks | - | 0
An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling | Code | 1
Partially Relevant Video Retrieval | Code | 1
Clover: Towards A Unified Video-Language Alignment and Fusion Model | Code | 1
Robustness Analysis of Video-Language Models Against Visual and Language Perturbations | Code | 0
Page 4 of 8

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | FROZEN-revised | mAP | 23.39 | - | Unverified
2 | FROZEN-revised (two-stream) | text-to-video R@1 | 12.8 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CLIP4Clip | text-to-video R@1 | 44.5 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | X-CLIP (Cross-Lingual) | R@1 | 32.3 | - | Unverified