Video-Text Retrieval

Video-text retrieval requires understanding video and language jointly, which distinguishes it from the video retrieval task.

Papers

Showing 26–50 of 111 papers

Title	Date	Tasks	Status	Hype
RETTA: Retrieval-Enhanced Test-Time Adaptation for Zero-Shot Video Captioning	May 11, 2024	Image-text matching, Retrieval	Unverified	0
Learning with Noisy Correspondence	Apr 13, 2024	Cross-Modal Retrieval, Cross-modal retrieval with noisy correspondence	Unverified	0
HaVTR: Improving Video-Text Retrieval Through Augmentation Using Large Foundation Models	Apr 7, 2024	Hallucination, Representation Learning	Unverified	0
vid-TLDR: Training Free Token merging for Light-weight Video Transformer	Mar 20, 2024	Action Recognition, Computational Efficiency	Code Available	2
Unifying Latent and Lexicon Representations for Effective Video-Text Retrieval	Feb 26, 2024	Retrieval, Text Retrieval	Unverified	0
Video Editing for Video Retrieval	Feb 4, 2024	Retrieval, Text Retrieval	Unverified	0
M2-RAAP: A Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards Effective and Efficient Zero-shot Video-text Retrieval	Jan 31, 2024	Retrieval, Text Retrieval	Code Available	2
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks	Dec 21, 2023	Image Retrieval, Image-to-Text Retrieval	Code Available	1
ProS: Prompting-to-simulate Generalized knowledge for Universal Cross-Domain Retrieval	Dec 19, 2023	Few-Shot Learning, Retrieval	Code Available	1
RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos	Dec 11, 2023	Natural Language Moment Retrieval, Natural Language Queries	Code Available	1
Leveraging Generative Language Models for Weakly Supervised Sentence Component Analysis in Video-Language Joint Learning	Dec 10, 2023	Language Modeling, Language Modelling	Unverified	0
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding	Dec 4, 2023	Dense Captioning, Highlight Detection	Code Available	2
Harvest Video Foundation Models via Efficient Post-Pretraining	Oct 30, 2023	Question Answering, Text Retrieval	Unverified	0
TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding	Oct 29, 2023	Form, Language Modelling	Code Available	1
Videoprompter: an ensemble of foundational models for zero-shot video understanding	Oct 23, 2023	Action Recognition, Descriptive	Unverified	0
Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data	Oct 8, 2023	Action Recognition, Continual Learning	Code Available	1
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment	Oct 3, 2023	Audio Classification, Contrastive Learning	Code Available	4
Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval	Sep 29, 2023	Cross-Modal Retrieval, Image-text matching	Code Available	1
Uncertainty-Aware Alignment Network for Cross-Domain Video-Text Retrieval	Sep 21, 2023	Domain Adaptation, Retrieval	Unverified	0
Unified Coarse-to-Fine Alignment for Video-Text Retrieval	Sep 18, 2023	Retrieval, Text Retrieval	Code Available	1
UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory	Aug 28, 2023	Question Answering, Retrieval	Code Available	1
Multi-event Video-Text Retrieval	Aug 22, 2023	Language Modelling, Retrieval	Code Available	1
Helping Hands: An Object-Aware Ego-Centric Video Recognition Model	Aug 15, 2023	Decoder, Object	Code Available	1
TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter	Jun 22, 2023	Question Answering, Retrieval	Code Available	0

No leaderboard results yet.