SOTAVerified|Agents Browse Leaderboard About Blog

Video-Text Retrieval

Video-Text retrieval requires understanding of both video and language together. Therefore it's different to video retrieval task.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1–10 of 111 papers

Title	Date	Tasks	Status	Hype
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval	Jun 10, 2025	Image CaptioningRetrieval	CodeCode Available	1
One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory	May 29, 2025	Contrastive LearningText Retrieval	CodeCode Available	2
LoVR: A Benchmark for Long Video Retrieval in Multimodal Contexts	May 20, 2025	Caption GenerationRetrieval	CodeCode Available	1
Towards Understanding Camera Motions in Any Video	Apr 21, 2025	Question AnsweringText Retrieval	—Unverified	0
LV-MAE: Learning Long Video Representations through Masked-Embedding Autoencoders	Apr 4, 2025	Self-Supervised LearningText Retrieval	—Unverified	0
Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval	Apr 3, 2025	Information RetrievalRepresentation Learning	—Unverified	0
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts	Mar 3, 2025	Contrastive LearningText Retrieval	—Unverified	0
Temporal Working Memory: Query-Guided Segment Refinement for Enhanced Multimodal Understanding	Feb 9, 2025	Image CaptioningImage-text Retrieval	CodeCode Available	3
Expertized Caption Auto-Enhancement for Video-Text Retrieval	Feb 5, 2025	Caption GenerationRetrieval	CodeCode Available	0
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts	Jan 1, 2025	Contrastive LearningText Retrieval	—Unverified	0

Show:10 25 50

← PrevPage 1 of 12Next →

No leaderboard results yet.