Text to Video Retrieval

Papers

Showing 51–75 of 75 papers

Title	Date	Tasks	Status
Distilling Vision-Language Models on Millions of Videos	Jan 11, 2024	Language Modeling, Language Modelling	Unverified
Leveraging Generative Language Models for Weakly Supervised Sentence Component Analysis in Video-Language Joint Learning	Dec 10, 2023	Language Modeling, Language Modelling	Unverified
E-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer	Nov 28, 2023	Language Modeling, Language Modelling	Unverified
An Empirical Study of Frame Selection for Text-to-Video Retrieval	Nov 1, 2023	Retrieval, Text to Video Retrieval	Unverified
TeachCLIP: Multi-Grained Teaching for Efficient Text-to-Video Retrieval	Aug 2, 2023	Retrieval, text similarity	Unverified
Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment	Jul 24, 2023	Retrieval, Text to Video Retrieval	Unverified
MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian	Jun 20, 2023	Cross-Lingual Transfer, Retrieval	Code Available
Efficient End-to-End Video Question Answering with Pyramidal Multimodal Transformer	Feb 4, 2023	Computational Efficiency, Question Answering	Code Available
Temporal Perceiving Video-Language Pre-training	Jan 18, 2023	Action Localization, Contrastive Learning	Unverified
Learning Trajectory-Word Alignments for Video-Language Tasks	Jan 5, 2023	Question Answering, Retrieval	Unverified
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners	Dec 9, 2022	Question Answering, Retrieval	Unverified
Are All Combinations Equal? Combining Textual and Visual Features with Multiple Space Learning for Text-Based Video Retrieval	Nov 21, 2022	All, Retrieval	Code Available
SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training	Nov 21, 2022	cross-modal alignment, GPU	Unverified
Fighting FIRe with FIRE: Assessing the Validity of Text-to-Video Retrieval Benchmarks	Oct 10, 2022	Retrieval, Text to Video Retrieval	Unverified
Robustness Analysis of Video-Language Models Against Visual and Language Perturbations	Jul 5, 2022	Language Modeling, Language Modelling	Code Available
Semantic Role Aware Correlation Transformer for Text to Video Retrieval	Jun 26, 2022	Retrieval, Text to Video Retrieval	Code Available
RoME: Role-aware Mixture-of-Expert Transformer for Text-to-Video Retrieval	Jun 26, 2022	Mixture-of-Experts, Retrieval	Code Available
Learning to Retrieve Videos by Asking Questions	May 11, 2022	AI Agent, Retrieval	Code Available
COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval	Apr 15, 2022	Contrastive Learning, Cross-Modal Retrieval	Unverified
FitCLIP: Refining Large-Scale Pretrained Image-Text Models for Zero-Shot Video Understanding Tasks	Mar 24, 2022	Action Recognition, Retrieval	Code Available
MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization	Mar 14, 2022	Retrieval, Text to Video Retrieval	Unverified
CUPID: Adaptive Curation of Pre-training Data for Video-and-Language Representation Learning	Apr 1, 2021	Question Answering, Representation Learning	Unverified
Support-set bottlenecks for video-text representation learning	Oct 6, 2020	Contrastive Learning, Representation Learning	Unverified
Retrieving and Highlighting Action with Spatiotemporal Reference	May 19, 2020	Action Recognition, Cross-Modal Retrieval	Unverified
Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning	Mar 6, 2020	Density Estimation, Noise Estimation	Code Available


Datasets: Kinetics-GEB+, MSR-VTT, MSVD-Indonesian

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	FROZEN-revised	mAP	23.39	—	Unverified
2	FROZEN-revised (two-stream)	text-to-video R@1	12.8	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	CLIP4Clip	text-to-video R@1	44.5	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	X-CLIP (Cross-Lingual)	R@1	32.3	—	Unverified