Video-Text Retrieval

Video-text retrieval requires joint understanding of video and language, which makes it different from the plain video retrieval task.

Papers

Showing 76–100 of 111 papers

Title	Date	Tasks	Status
LV-MAE: Learning Long Video Representations through Masked-Embedding Autoencoders	Apr 4, 2025	Self-Supervised Learning, Text Retrieval	Unverified
Masked Contrastive Pre-Training for Efficient Video-Text Retrieval	Dec 2, 2022	Image-text Retrieval, Retrieval	Unverified
Mask to reconstruct: Cooperative Semantics Completion for Video-text Retrieval	May 13, 2023	Retrieval, Text Retrieval	Unverified
Memory Enhanced Embedding Learning for Cross-Modal Video-Text Retrieval	Mar 29, 2021	Retrieval, Text Retrieval	Unverified
Multi-Scale Temporal Difference Transformer for Video-Text Retrieval	Jun 23, 2024	Retrieval, Text Retrieval	Unverified
NAVERO: Unlocking Fine-Grained Semantics for Video-Language Compositionality	Aug 18, 2024	Retrieval, Text Retrieval	Unverified
OmniVL: One Foundation Model for Image-Language and Video-Language Tasks	Sep 15, 2022	Action Classification, Action Recognition	Unverified
Rethinking Noisy Video-Text Retrieval via Relation-aware Alignment	Jan 1, 2025	Relation, Retrieval	Unverified
RETTA: Retrieval-Enhanced Test-Time Adaptation for Zero-Shot Video Captioning	May 11, 2024	Image-text matching, Retrieval	Unverified
Retrieving and Highlighting Action with Spatiotemporal Reference	May 19, 2020	Action Recognition, Cross-Modal Retrieval	Unverified
Stacked Convolutional Deep Encoding Network for Video-Text Retrieval	Apr 10, 2020	Language Modeling, Language Modelling	Unverified
Synopses of Movie Narratives: a Video-Language Dataset for Story Understanding	Jan 16, 2022	Retrieval, Text Retrieval	Unverified
Synopses of Movie Narratives: a Video-Language Dataset for Story Understanding	Mar 11, 2022	Retrieval, Text Retrieval	Unverified
Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text Retrieval	Jan 30, 2023	Language Modeling, Language Modelling	Unverified
Text-Adaptive Multiple Visual Prototype Matching for Video-Text Retrieval	Sep 27, 2022	Cross-Modal Retrieval, Retrieval	Unverified
TokenFlow: Rethinking Fine-grained Cross-modal Alignment in Vision-Language Retrieval	Sep 28, 2022	cross-modal alignment, Retrieval	Unverified
Towards Understanding Camera Motions in Any Video	Apr 21, 2025	Question Answering, Text Retrieval	Unverified
Uncertainty-Aware Alignment Network for Cross-Domain Video-Text Retrieval	Sep 21, 2023	Domain Adaptation, Retrieval	Unverified
Uncertainty-aware sign language video retrieval with probability distribution modeling	May 30, 2024	Retrieval, Sign Language Retrieval	Unverified
Unified Loss of Pair Similarity Optimization for Vision-Language Retrieval	Sep 28, 2022	Contrastive Learning, Retrieval	Unverified
Unifying Latent and Lexicon Representations for Effective Video-Text Retrieval	Feb 26, 2024	Retrieval, Text Retrieval	Unverified
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts	Mar 3, 2025	Contrastive Learning, Text Retrieval	Unverified
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts	Jan 1, 2025	Contrastive Learning, Text Retrieval	Unverified
Videoprompter: an ensemble of foundational models for zero-shot video understanding	Oct 23, 2023	Action Recognition, Descriptive	Unverified

Page 4 of 5

No leaderboard results yet.