SOTAVerified

Text to Video Retrieval


Papers

Showing 26–50 of 75 papers

| Title | Status | Hype |
| --- | --- | --- |
| VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling | Code | 1 |
| Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions | Code | 1 |
| VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation | Code | 1 |
| DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization | Code | 1 |
| Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos | Code | 1 |
| VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text | Code | 1 |
| CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval | Code | 1 |
| Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval | Code | 1 |
| MDMMT: Multidomain Multimodal Transformer for Video Retrieval | Code | 1 |
| Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling | Code | 1 |
| The End-of-End-to-End: A Video Understanding Pentathlon Challenge (2020) | Code | 1 |
| Condensed Movies: Story Based Retrieval with Contextual Embeddings | Code | 1 |
| End-to-End Learning of Visual Representations from Uncurated Instructional Videos | Code | 1 |
| HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips | Code | 1 |
| Q2E: Query-to-Event Decomposition for Zero-Shot Multilingual Text-to-Video Retrieval | | 0 |
| Leveraging Auxiliary Information in Text-to-Video Retrieval: A Review | | 0 |
| Towards Efficient Partially Relevant Video Retrieval with Active Moment Discovering | Code | 0 |
| TC-MGC: Text-Conditioned Multi-Grained Contrastive Learning for Text-Video Retrieval | Code | 0 |
| Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval | | 0 |
| CaReBench: A Fine-Grained Benchmark for Video Captioning and Retrieval | | 0 |
| ContextIQ: A Multimodal Expert-Based Video Retrieval System for Contextual Advertising | Code | 0 |
| EA-VTR: Event-Aware Video-Text Retrieval | | 0 |
| Multi-Granularity and Multi-modal Feature Interaction Approach for Text Video Retrieval | | 0 |
| Sakuga-42M Dataset: Scaling Up Cartoon Research | | 0 |
| Learning text-to-video retrieval from image captioning | | 0 |

Benchmark Results

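| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | FROZEN-revised | mAP | 23.39 | | Unverified |
| 2 | FROZEN-revised (two-stream) | text-to-video R@1 | 12.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | CLIP4Clip | text-to-video R@1 | 44.5 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | X-CLIP (Cross-Lingual) | R@1 | 32.3 | | Unverified |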