SOTAVerified

Video-Text Retrieval

Video-text retrieval requires understanding video and language jointly, which makes it different from the video retrieval task.

Papers

Showing 51-100 of 111 papers

Title | Status | Hype
Reversed in Time: A Novel Temporal-Emphasized Benchmark for Cross-Modal Video-Text Retrieval | Code | 0
CAREL: Instruction-guided reinforcement learning with cross-modal auxiliary objectives | Code | 0
CiCo: Domain-Aware Sign Language Retrieval via Cross-Lingual Contrastive Learning | Code | 0
Diving Deep into the Motion Representation of Video-Text Models | Code | 0
Expertized Caption Auto-Enhancement for Video-Text Retrieval | Code | 0
Harvest Video Foundation Models via Efficient Post-Pretraining | Code | 0
Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval | Code | 0
Rudder: A Cross Lingual Video and Text Retrieval Dataset | Code | 0
TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter | Code | 0
Video-Text Retrieval by Supervised Sparse Multi-Grained Learning | Code | 0
OmniVL: One Foundation Model for Image-Language and Video-Language Tasks | - | 0
Mask to reconstruct: Cooperative Semantics Completion for Video-text Retrieval | - | 0
Masked Contrastive Pre-Training for Efficient Video-Text Retrieval | - | 0
LV-MAE: Learning Long Video Representations through Masked-Embedding Autoencoders | - | 0
Leveraging Generative Language Models for Weakly Supervised Sentence Component Analysis in Video-Language Joint Learning | - | 0
Rethinking Noisy Video-Text Retrieval via Relation-aware Alignment | - | 0
RETTA: Retrieval-Enhanced Test-Time Adaptation for Zero-Shot Video Captioning | - | 0
Retrieving and Highlighting Action with Spatiotemporal Reference | - | 0
Learning with Noisy Correspondence | - | 0
Learning Context-Adapted Video-Text Retrieval by Attending to User Comments | - | 0
Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval | - | 0
Beyond Coarse-Grained Matching in Video-Text Retrieval | - | 0
LaT: Latent Translation with Cycle-Consistency for Video-Text Retrieval | - | 0
Stacked Convolutional Deep Encoding Network for Video-Text Retrieval | - | 0
HiVLP: Hierarchical Interactive Video-Language Pre-Training | - | 0
Synopses of Movie Narratives: a Video-Language Dataset for Story Understanding | - | 0
Synopses of Movie Narratives: a Video-Language Dataset for Story Understanding | - | 0
An Empirical Study of Excitation and Aggregation Design Adaptions in CLIP4Clip for Video-Text Retrieval | - | 0
Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text Retrieval | - | 0
HiT: Hierarchical Transformer with Momentum Contrast for Video-Text Retrieval | - | 0
HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model | - | 0
HaVTR: Improving Video-Text Retrieval Through Augmentation Using Large Foundation Models | - | 0
Text-Adaptive Multiple Visual Prototype Matching for Video-Text Retrieval | - | 0
Generalizing Multimodal Pre-training into Multilingual via Language Acquisition | - | 0
TokenFlow: Rethinking Fine-grained Cross-modal Alignment in Vision-Language Retrieval | - | 0
Towards Understanding Camera Motions in Any Video | - | 0
Uncertainty-Aware Alignment Network for Cross-Domain Video-Text Retrieval | - | 0
Uncertainty-Aware Alignment Network for Cross-Domain Video-Text Retrieval | - | 0
Uncertainty-aware sign language video retrieval with probability distribution modeling | - | 0
CaReBench: A Fine-Grained Benchmark for Video Captioning and Retrieval | - | 0
Exploiting Visual Semantic Reasoning for Video-Text Retrieval | - | 0
Unified Loss of Pair Similarity Optimization for Vision-Language Retrieval | - | 0
Unifying Latent and Lexicon Representations for Effective Video-Text Retrieval | - | 0
EA-VTR: Event-Aware Video-Text Retrieval | - | 0
Dual Alignment Unsupervised Domain Adaptation for Video-Text Retrieval | - | 0
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts | - | 0
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts | - | 0
Video Editing for Video Retrieval | - | 0
Deep Semantic Multimodal Hashing Network for Scalable Image-Text and Video-Text Retrievals | - | 0
Deep Learning for Video-Text Retrieval: a Review | - | 0

No leaderboard results yet.