SOTAVerified

Text to Video Retrieval


Papers

Showing 41–50 of 75 papers

Title | Status | Hype
RoME: Role-aware Mixture-of-Expert Transformer for Text-to-Video Retrieval | Code | 0
Semantic Role Aware Correlation Transformer for Text to Video Retrieval | Code | 0
LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling | Code | 1
Revealing Single Frame Bias for Video-and-Language Learning | Code | 2
Revisiting the "Video" in Video-Language Understanding | Code | 1
Learning to Retrieve Videos by Asking Questions | Code | 0
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval | Code | 1
COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval | - | 0
ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound | Code | 1
GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval | Code | 1
Page 5 of 8

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | FROZEN-revised | mAP | 23.39 | - | Unverified
2 | FROZEN-revised (two-stream) | text-to-video R@1 | 12.8 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CLIP4Clip | text-to-video R@1 | 44.5 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | X-CLIP (Cross-Lingual) | R@1 | 32.3 | - | Unverified