
Video-Text Retrieval

Video-text retrieval requires understanding video and language jointly, which makes it different from the plain video retrieval task.

Papers

Showing 76-100 of 111 papers

| Title | Status | Hype |
| --- | --- | --- |
| Mask to reconstruct: Cooperative Semantics Completion for Video-text Retrieval | | 0 |
| Memory Enhanced Embedding Learning for Cross-Modal Video-Text Retrieval | | 0 |
| Multi-Scale Temporal Difference Transformer for Video-Text Retrieval | | 0 |
| NAVERO: Unlocking Fine-Grained Semantics for Video-Language Compositionality | | 0 |
| OmniVL: One Foundation Model for Image-Language and Video-Language Tasks | | 0 |
| Rethinking Noisy Video-Text Retrieval via Relation-aware Alignment | | 0 |
| RETTA: Retrieval-Enhanced Test-Time Adaptation for Zero-Shot Video Captioning | | 0 |
| Retrieving and Highlighting Action with Spatiotemporal Reference | | 0 |
| Stacked Convolutional Deep Encoding Network for Video-Text Retrieval | | 0 |
| Synopses of Movie Narratives: a Video-Language Dataset for Story Understanding | | 0 |
| Synopses of Movie Narratives: a Video-Language Dataset for Story Understanding | | 0 |
| Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text Retrieval | | 0 |
| Text-Adaptive Multiple Visual Prototype Matching for Video-Text Retrieval | | 0 |
| TokenFlow: Rethinking Fine-grained Cross-modal Alignment in Vision-Language Retrieval | | 0 |
| Towards Understanding Camera Motions in Any Video | | 0 |
| Uncertainty-Aware Alignment Network for Cross-Domain Video-Text Retrieval | | 0 |
| Uncertainty-Aware Alignment Network for Cross-Domain Video-Text Retrieval | | 0 |
| Uncertainty-aware sign language video retrieval with probability distribution modeling | | 0 |
| Unified Loss of Pair Similarity Optimization for Vision-Language Retrieval | | 0 |
| Unifying Latent and Lexicon Representations for Effective Video-Text Retrieval | | 0 |
| V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts | | 0 |
| V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts | | 0 |
| Videoprompter: an ensemble of foundational models for zero-shot video understanding | | 0 |
| ViLEM: Visual-Language Error Modeling for Image-Text Retrieval | | 0 |
| ViSeRet: A simple yet effective approach to moment retrieval via fine-grained video segmentation | | 0 |

No leaderboard results yet.