SOTAVerified

Text to Video Retrieval

She's gone I can't find her anywhere I'm looking everywhere for her Everywhere is dark

Papers

Showing 1-50 of 75 papers

| Title | Status | Hype |
|---|---|---|
| Q2E: Query-to-Event Decomposition for Zero-Shot Multilingual Text-to-Video Retrieval | | 0 |
| Leveraging Auxiliary Information in Text-to-Video Retrieval: A Review | | 0 |
| Towards Efficient Partially Relevant Video Retrieval with Active Moment Discovering | Code | 0 |
| TC-MGC: Text-Conditioned Multi-Grained Contrastive Learning for Text-Video Retrieval | Code | 0 |
| Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval | | 0 |
| StableFusion: Continual Video Retrieval via Frame Adaptation | Code | 1 |
| CaReBench: A Fine-Grained Benchmark for Video Captioning and Retrieval | | 0 |
| ContextIQ: A Multimodal Expert-Based Video Retrieval System for Contextual Advertising | Code | 0 |
| EA-VTR: Event-Aware Video-Text Retrieval | | 0 |
| Multi-Granularity and Multi-modal Feature Interaction Approach for Text Video Retrieval | | 0 |
| Sakuga-42M Dataset: Scaling Up Cartoon Research | | 0 |
| Learning text-to-video retrieval from image captioning | | 0 |
| Distilling Vision-Language Models on Millions of Videos | | 0 |
| Holistic Features are almost Sufficient for Text-to-Video Retrieval | Code | 1 |
| Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning | Code | 1 |
| Leveraging Generative Language Models for Weakly Supervised Sentence Component Analysis in Video-Language Joint Learning | | 0 |
| E-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer | | 0 |
| VideoCon: Robust Video-Language Alignment via Contrast Captions | Code | 1 |
| An Empirical Study of Frame Selection for Text-to-Video Retrieval | | 0 |
| Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data | Code | 1 |
| Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval | Code | 1 |
| Unified Coarse-to-Fine Alignment for Video-Text Retrieval | Code | 1 |
| TeachCLIP: Multi-Grained Teaching for Efficient Text-to-Video Retrieval | | 0 |
| Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment | | 0 |
| MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian | Code | 0 |
| MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models | Code | 1 |
| Efficient End-to-End Video Question Answering with Pyramidal Multimodal Transformer | Code | 0 |
| Temporal Perceiving Video-Language Pre-training | | 0 |
| Learning Trajectory-Word Alignments for Video-Language Tasks | | 0 |
| Dual Learning with Dynamic Knowledge Distillation for Partially Relevant Video Retrieval | Code | 1 |
| VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | | 0 |
| VindLU: A Recipe for Effective Video-and-Language Pretraining | Code | 1 |
| X^2-VLM: All-In-One Pre-trained Model For Vision-Language Tasks | Code | 2 |
| Are All Combinations Equal? Combining Textual and Visual Features with Multiple Space Learning for Text-Based Video Retrieval | Code | 0 |
| SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training | | 0 |
| Fighting FIRe with FIRE: Assessing the Validity of Text-to-Video Retrieval Benchmarks | | 0 |
| An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling | Code | 1 |
| Partially Relevant Video Retrieval | Code | 1 |
| Clover: Towards A Unified Video-Language Alignment and Fusion Model | Code | 1 |
| Robustness Analysis of Video-Language Models Against Visual and Language Perturbations | Code | 0 |
| RoME: Role-aware Mixture-of-Expert Transformer for Text-to-Video Retrieval | Code | 0 |
| Semantic Role Aware Correlation Transformer for Text to Video Retrieval | Code | 0 |
| LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling | Code | 1 |
| Revealing Single Frame Bias for Video-and-Language Learning | Code | 2 |
| Revisiting the "Video" in Video-Language Understanding | Code | 1 |
| Learning to Retrieve Videos by Asking Questions | Code | 0 |
| MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval | Code | 1 |
| COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval | | 0 |
| ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound | Code | 1 |
| GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval | Code | 1 |

Benchmark Results

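| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | FROZEN-revised | mAP | 23.39 | | Unverified |
| 2 | FROZEN-revised (two-stream) | text-to-video R@1 | 12.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | CLIP4Clip | text-to-video R@1 | 44.5 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | X-CLIP (Cross-Lingual) | R@1 | 32.3 | | Unverified |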