SOTAVerified

Video-Text Retrieval

Video-text retrieval requires a joint understanding of video and language, which sets it apart from the video retrieval task.

Papers

Showing 51–100 of 111 papers

| Title | Status | Hype |
| --- | --- | --- |
| Video Editing for Video Retrieval | | 0 |
| Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception | | 0 |
| An Empirical Study of Excitation and Aggregation Design Adaptions in CLIP4Clip for Video-Text Retrieval | | 0 |
| Beyond Coarse-Grained Matching in Video-Text Retrieval | | 0 |
| Boosting Video-Text Retrieval with Explicit High-Level Semantics | | 0 |
| CLIP2TV: Align, Match and Distill for Video-Text Retrieval | | 0 |
| CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations | | 0 |
| Deep Learning for Video-Text Retrieval: a Review | | 0 |
| Deep Semantic Multimodal Hashing Network for Scalable Image-Text and Video-Text Retrievals | | 0 |
| Dual Alignment Unsupervised Domain Adaptation for Video-Text Retrieval | | 0 |
| EA-VTR: Event-Aware Video-Text Retrieval | | 0 |
| Exploiting Visual Semantic Reasoning for Video-Text Retrieval | | 0 |
| CaReBench: A Fine-Grained Benchmark for Video Captioning and Retrieval | | 0 |
| Generalizing Multimodal Pre-training into Multilingual via Language Acquisition | | 0 |
| HaVTR: Improving Video-Text Retrieval Through Augmentation Using Large Foundation Models | | 0 |
| HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model | | 0 |
| HiT: Hierarchical Transformer with Momentum Contrast for Video-Text Retrieval | | 0 |
| HiVLP: Hierarchical Interactive Video-Language Pre-Training | | 0 |
| LaT: Latent Translation with Cycle-Consistency for Video-Text Retrieval | | 0 |
| Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval | | 0 |
| Learning Context-Adapted Video-Text Retrieval by Attending to User Comments | | 0 |
| Learning with Noisy Correspondence | | 0 |
| Leveraging Generative Language Models for Weakly Supervised Sentence Component Analysis in Video-Language Joint Learning | | 0 |
| LV-MAE: Learning Long Video Representations through Masked-Embedding Autoencoders | | 0 |
| Masked Contrastive Pre-Training for Efficient Video-Text Retrieval | | 0 |
| Mask to reconstruct: Cooperative Semantics Completion for Video-text Retrieval | | 0 |
| Memory Enhanced Embedding Learning for Cross-Modal Video-Text Retrieval | | 0 |
| Multi-Scale Temporal Difference Transformer for Video-Text Retrieval | | 0 |
| NAVERO: Unlocking Fine-Grained Semantics for Video-Language Compositionality | | 0 |
| OmniVL: One Foundation Model for Image-Language and Video-Language Tasks | | 0 |
| Rethinking Noisy Video-Text Retrieval via Relation-aware Alignment | | 0 |
| RETTA: Retrieval-Enhanced Test-Time Adaptation for Zero-Shot Video Captioning | | 0 |
| Retrieving and Highlighting Action with Spatiotemporal Reference | | 0 |
| Stacked Convolutional Deep Encoding Network for Video-Text Retrieval | | 0 |
| Synopses of Movie Narratives: a Video-Language Dataset for Story Understanding | | 0 |
| Synopses of Movie Narratives: a Video-Language Dataset for Story Understanding | | 0 |
| Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text Retrieval | | 0 |
| Text-Adaptive Multiple Visual Prototype Matching for Video-Text Retrieval | | 0 |
| TokenFlow: Rethinking Fine-grained Cross-modal Alignment in Vision-Language Retrieval | | 0 |
| Towards Understanding Camera Motions in Any Video | | 0 |
| Uncertainty-Aware Alignment Network for Cross-Domain Video-Text Retrieval | | 0 |
| Uncertainty-Aware Alignment Network for Cross-Domain Video-Text Retrieval | | 0 |
| Uncertainty-aware sign language video retrieval with probability distribution modeling | | 0 |
| Unified Loss of Pair Similarity Optimization for Vision-Language Retrieval | | 0 |
| Unifying Latent and Lexicon Representations for Effective Video-Text Retrieval | | 0 |
| V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts | | 0 |
| V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts | | 0 |
| Videoprompter: an ensemble of foundational models for zero-shot video understanding | | 0 |
| ViLEM: Visual-Language Error Modeling for Image-Text Retrieval | | 0 |
| ViSeRet: A simple yet effective approach to moment retrieval via fine-grained video segmentation | | 0 |

No leaderboard results yet.