SOTAVerified

Text-to-Video Retrieval

She's gone. I can't find her anywhere. I'm looking everywhere for her. Everywhere is dark.

Papers

Showing 26–50 of 75 papers

| Title | Status | Hype |
|---|---|---|
| Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval | Code | 1 |
| Reading-strategy Inspired Visual Representation Learning for Text-to-Video Retrieval | Code | 1 |
| Revisiting the "Video" in Video-Language Understanding | Code | 1 |
| Revitalize Region Feature for Democratizing Video-Language Pre-training of Retrieval | Code | 1 |
| StableFusion: Continual Video Retrieval via Frame Adaptation | Code | 1 |
| The End-of-End-to-End: A Video Understanding Pentathlon Challenge (2020) | Code | 1 |
| Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning | Code | 1 |
| Unified Coarse-to-Fine Alignment for Video-Text Retrieval | Code | 1 |
| VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation | Code | 1 |
| VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text | Code | 1 |
| VideoCon: Robust Video-Language Alignment via Contrast Captions | Code | 1 |
| VindLU: A Recipe for Effective Video-and-Language Pretraining | Code | 1 |
| VIOLET: End-to-End Video-Language Transformers with Masked Visual-token Modeling | Code | 1 |
| Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions | Code | 1 |
| Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval | | 0 |
| Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment | | 0 |
| Fighting FIRe with FIRE: Assessing the Validity of Text-to-Video Retrieval Benchmarks | | 0 |
| CaReBench: A Fine-Grained Benchmark for Video Captioning and Retrieval | | 0 |
| Retrieving and Highlighting Action with Spatiotemporal Reference | | 0 |
| VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | | 0 |
| Learning text-to-video retrieval from image captioning | | 0 |
| Learning Trajectory-Word Alignments for Video-Language Tasks | | 0 |
| Sakuga-42M Dataset: Scaling Up Cartoon Research | | 0 |
| Leveraging Auxiliary Information in Text-to-Video Retrieval: A Review | | 0 |
| Leveraging Generative Language Models for Weakly Supervised Sentence Component Analysis in Video-Language Joint Learning | | 0 |

Benchmark Results

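| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | FROZEN-revised | mAP | 23.39 | | Unverified |
| 2 | FROZEN-revised (two-stream) | text-to-video R@1 | 12.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | CLIP4Clip | text-to-video R@1 | 44.5 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | X-CLIP (Cross-Lingual) | R@1 | 32.3 | | Unverified |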
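The R@1 and mAP figures in the benchmark tables are standard retrieval metrics. The following is a minimal NumPy sketch of how they are commonly computed from a text-by-video similarity matrix. It assumes that `sim[i, j]` scores text query `i` against video `j` and, for Recall@K, that query `i`'s ground-truth video is video `i` (one caption per video, aligned by index). The function names and the toy matrix are hypothetical, and individual benchmarks may differ in details such as tie-breaking or multiple captions per video.

```python
import numpy as np


def recall_at_k(sim: np.ndarray, k: int) -> float:
    """Text-to-video Recall@K in percent.

    Assumption: the ground-truth video for query i is video i.
    """
    # Score of each query against its own ground-truth video.
    gt = np.diag(sim)
    # 0-based rank = number of videos scored strictly higher than the
    # ground truth (ties are broken in the query's favor).
    ranks = (sim > gt[:, None]).sum(axis=1)
    return 100.0 * float(np.mean(ranks < k))


def mean_average_precision(sim: np.ndarray, relevant: np.ndarray) -> float:
    """mAP in percent; relevant[i, j] is True if video j matches query i."""
    aps = []
    for scores, rel in zip(sim, relevant):
        order = np.argsort(-scores)            # videos by descending score
        hits = rel[order].astype(float)        # 1.0 where a relevant video sits
        if hits.sum() == 0:
            continue                           # skip queries with no relevant video
        precision = np.cumsum(hits) / np.arange(1, len(hits) + 1)
        # Average precision over the positions of relevant videos only.
        aps.append(float((precision * hits).sum() / hits.sum()))
    return 100.0 * float(np.mean(aps))


# Toy example: 3 text queries x 3 videos (hypothetical scores).
sim = np.array([[0.9, 0.1, 0.3],
                [0.2, 0.4, 0.8],
                [0.1, 0.2, 0.7]])
print(round(recall_at_k(sim, 1), 2))                                # 66.67
print(round(recall_at_k(sim, 2), 2))                                # 100.0
print(round(mean_average_precision(sim, np.eye(3, dtype=bool)), 2)) # 83.33
```

In the toy matrix, query 1 ranks its ground-truth video second, so it misses at K=1 but counts at K=2. With a single relevant video per query, its average precision is the reciprocal of that video's 1-based rank (here 0.5).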