SOTAVerified

Video Retrieval

The objective of video retrieval is as follows: given a text query and a pool of candidate videos, select the video that corresponds to the query. Typically, the candidates are returned as a ranked list and scored with document retrieval metrics such as recall at K (R@K).
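As a rough illustration of how a ranked list is scored, the sketch below computes recall at K from a query-by-video similarity matrix. It assumes exactly one correct video per query and reports percentages; the function names, the pessimistic tie handling, and the toy scores are illustrative assumptions, not taken from any listed paper or benchmark.

```python
from typing import Dict, List, Sequence


def rank_of_correct_video(scores: Sequence[float], correct_index: int) -> int:
    """Return the 1-based rank of the correct video for one query.

    Higher scores mean a better match. Ties are resolved pessimistically
    (an assumption): a candidate with the same score as the correct video
    counts as ranked ahead of it.
    """
    target = scores[correct_index]
    ahead = sum(
        1 for i, s in enumerate(scores) if i != correct_index and s >= target
    )
    return ahead + 1


def recall_at_k(
    similarity: Sequence[Sequence[float]],
    correct: Sequence[int],
    ks: Sequence[int] = (1, 5, 10),
) -> Dict[int, float]:
    """Percentage of queries whose correct video appears in the top K."""
    ranks: List[int] = [
        rank_of_correct_video(row, c) for row, c in zip(similarity, correct)
    ]
    return {k: 100.0 * sum(r <= k for r in ranks) / len(ranks) for k in ks}


# Toy example: 4 text queries scored against a pool of 3 videos.
similarity = [
    [0.9, 0.1, 0.3],  # query 0, correct video 0 -> rank 1
    [0.2, 0.4, 0.8],  # query 1, correct video 1 -> rank 2
    [0.5, 0.7, 0.6],  # query 2, correct video 2 -> rank 2
    [0.6, 0.5, 0.1],  # query 3, correct video 2 -> rank 3
]
correct = [0, 1, 2, 2]

results = recall_at_k(similarity, correct, ks=(1, 2, 3))
print(results)  # {1: 25.0, 2: 75.0, 3: 100.0}
```

With a real benchmark the pool holds hundreds or thousands of videos, and K is usually 1, 5, or 10.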

Papers

Showing 201–250 of 486 papers

Title | Status | Hype
Generative Ghost: Investigating Ranking Bias Hidden in AI-Generated Videos | | 0
HORUS: Multimodal Large Language Models Framework for Video Retrieval at VBS 2025 | | 0
CaReBench: A Fine-Grained Benchmark for Video Captioning and Retrieval | | 0
Hierarchical Banzhaf Interaction for General Video-Language Representation Learning | Code | 0
PolySmart @ TRECVid 2024 Medical Video Question Answering | | 0
Query-centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning | | 0
Generative Semantic Communication: Architectures, Technologies, and Applications | | 0
Multimodal Contextualized Support for Enhancing Video Retrieval System | | 0
ContextIQ: A Multimodal Expert-Based Video Retrieval System for Contextual Advertising | Code | 0
Generating Signed Language Instructions in Large-Scale Dialogue Systems | Code | 0
MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval | | 0
VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models | | 0
TokenBinder: Text-Video Retrieval with One-to-Many Alignment Paradigm | Code | 0
Video DataFlywheel: Resolving the Impossible Data Trinity in Video-Language Understanding | | 0
Unfolding Videos Dynamics via Taylor Expansion | | 0
Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets | | 0
NAVERO: Unlocking Fine-Grained Semantics for Video-Language Compositionality | | 0
Bridging Information Asymmetry in Text-video Retrieval: A Data-centric Approach | | 0
Latent-INR: A Flexible Framework for Implicit Representations of Videos with Discriminative Semantics | | 0
Neural Graph Matching for Video Retrieval in Large-Scale Video-driven E-commerce | | 0
ExpertAF: Expert Actionable Feedback from Video | | 0
SEDS: Semantically Enhanced Dual-Stream Encoder for Sign Language Retrieval | Code | 0
Not All Pairs are Equal: Hierarchical Learning for Average-Precision-Oriented Video Retrieval | | 0
MERLIN: Multimodal Embedding Refinement via LLM-based Iterative Navigation for Text-Video Retrieval-Rerank Pipeline | | 0
EA-VTR: Event-Aware Video-Text Retrieval | | 0
MAMA: Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning | Code | 0
ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling | | 0
Multi-Granularity and Multi-modal Feature Interaction Approach for Text Video Retrieval | | 0
Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset | | 0
RNNs, CNNs and Transformers in Human Action Recognition: A Survey and a Hybrid Model | | 0
Uncertainty-aware sign language video retrieval with probability distribution modeling | | 0
RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter | | 0
Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models and Vision Language Models | | 0
Learning text-to-video retrieval from image captioning | | 0
SHE-Net: Syntax-Hierarchy-Enhanced Text-Video Retrieval | | 0
ProTA: Probabilistic Token Aggregation for Text-Video Retrieval | | 0
Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval | | 0
Improving Video Corpus Moment Retrieval with Partial Relevance Enhancement | Code | 0
Event-aware Video Corpus Moment Retrieval | | 0
Video Editing for Video Retrieval | | 0
CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing | | 0
Distilling Vision-Language Models on Millions of Videos | | 0
Text-Video Retrieval via Variational Multi-Modal Hypergraph Networks | | 0
Detours for Navigating Instructional Videos | | 0
No More Shortcuts: Realizing the Potential of Temporal Self-Supervision | | 0
WAVER: Writing-style Agnostic Text-Video Retrieval via Distilling Vision-Language Models Through Open-Vocabulary Knowledge | Code | 0
Leveraging Generative Language Models for Weakly Supervised Sentence Component Analysis in Video-Language Joint Learning | | 0
Vision-Language Models Learn Super Images for Efficient Partially Relevant Video Retrieval | | 0
A Video is Worth 10,000 Words: Training and Benchmarking with Diverse Captions for Better Long Video Retrieval | | 0
Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | OmniVec | text-to-video R@10 | 89.4 | | Unverified
2 | CLIP4Clip | text-to-video R@10 | 81.6 | | Unverified
3 | OmniVec (pretrained) | text-to-video R@10 | 78.6 | | Unverified
4 | HunYuan_tvr (huge) | text-to-video R@1 | 62.9 | | Unverified
5 | CLIP-ViP | text-to-video R@1 | 57.7 | | Unverified
6 | PIDRo | text-to-video R@1 | 55.9 | | Unverified
7 | DMAE (ViT-B/16) | text-to-video R@1 | 55.5 | | Unverified
8 | HunYuan_tvr | text-to-video R@1 | 55 | | Unverified
9 | MuLTI | text-to-video R@1 | 54.7 | | Unverified
10 | EERCF | text-to-video R@1 | 54.1 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Aurora (ours, r=64) | text-to-video R@5 | 77.4 | | Unverified
2 | InternVideo2-6B | text-to-video R@1 | 74.2 | | Unverified
3 | vid-TLDR (UMT-L) | text-to-video R@1 | 72.3 | | Unverified
4 | VAST | text-to-video R@1 | 72 | | Unverified
5 | COSA | text-to-video R@1 | 70.5 | | Unverified
6 | UMT-L (ViT-L/16) | text-to-video R@1 | 70.4 | | Unverified
7 | GRAM | text-to-video R@1 | 67.3 | | Unverified
8 | VALOR | text-to-video R@1 | 61.5 | | Unverified
9 | TESTA (ViT-B/16) | text-to-video R@1 | 61.2 | | Unverified
10 | VindLU | text-to-video R@1 | 61.2 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GRAM | text-to-video R@1 | 64 | | Unverified
2 | VAST | text-to-video R@1 | 63.9 | | Unverified
3 | InternVideo2-6B | text-to-video R@1 | 62.8 | | Unverified
4 | VALOR | text-to-video R@1 | 59.9 | | Unverified
5 | UMT-L (ViT-L/16) | text-to-video R@1 | 58.8 | | Unverified
6 | vid-TLDR (UMT-L) | text-to-video R@1 | 58.1 | | Unverified
7 | COSA | text-to-video R@1 | 57.9 | | Unverified
8 | InternVideo2-6B | text-to-video R@1 | 55.9 | | Unverified
9 | InternVideo | text-to-video R@1 | 55.2 | | Unverified
10 | VLAB | text-to-video R@1 | 55.1 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | EMCL-Net (Ours)++ LSMDC Rohrbach et al. (2015) | text-to-video R@10 | 53.7 | | Unverified
2 | InternVideo2-6B | text-to-video R@1 | 46.4 | | Unverified
3 | vid-TLDR (UMT-L) | text-to-video R@1 | 43.1 | | Unverified
4 | UMT-L (ViT-L/16) | text-to-video R@1 | 43 | | Unverified
5 | HunYuan_tvr (huge) | text-to-video R@1 | 40.4 | | Unverified
6 | COSA | text-to-video R@1 | 39.4 | | Unverified
7 | mPLUG-2 | text-to-video R@1 | 34.4 | | Unverified
8 | VALOR | text-to-video R@1 | 34.2 | | Unverified
9 | InternVideo | text-to-video R@1 | 34 | | Unverified
10 | InternVideo2-6B | text-to-video R@1 | 33.8 | | Unverified