SOTAVerified

Video Retrieval

Given a text query and a pool of candidate videos, the objective of video retrieval is to select the video that corresponds to the query. Typically, the candidates are returned as a ranked list and scored with document retrieval metrics such as Recall@K (R@K).
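To make the scoring step concrete, here is a minimal sketch of how a text-to-video R@K figure like the ones in the benchmark tables on this page could be computed from a query-by-video similarity matrix. The function names, the toy scores, and the convention that query i matches video i are assumptions made for illustration; they are not taken from any listed paper or benchmark toolkit.

```python
from typing import Dict, List, Sequence


def rank_of_match(scores: Sequence[float], target: int) -> int:
    """Return the 1-based rank of scores[target] when sorted descending.

    Ties are resolved pessimistically: candidates with an equal score
    count as ranked ahead of the target.
    """
    target_score = scores[target]
    better_or_equal = sum(
        1 for i, s in enumerate(scores) if i != target and s >= target_score
    )
    return better_or_equal + 1


def recall_at_k(
    similarity: Sequence[Sequence[float]],
    ks: Sequence[int] = (1, 5, 10),
) -> Dict[int, float]:
    """Compute text-to-video Recall@K in percent.

    similarity[i][j] is the score of query i against video j.
    Assumption (hypothetical setup): the correct video for query i is video i.
    """
    ranks: List[int] = [rank_of_match(row, i) for i, row in enumerate(similarity)]
    return {k: 100.0 * sum(r <= k for r in ranks) / len(ranks) for k in ks}


# Toy example: 3 queries, 3 videos (scores are made up).
toy_similarity = [
    [0.9, 0.1, 0.3],  # query 0: correct video ranked 1st
    [0.8, 0.5, 0.2],  # query 1: correct video ranked 2nd
    [0.4, 0.6, 0.1],  # query 2: correct video ranked 3rd
]
toy_result = recall_at_k(toy_similarity, ks=(1, 2, 3))
```

In this toy case one of three queries has its video at rank 1, two of three within rank 2, and all three within rank 3, so R@K rises with K. Real benchmarks may handle ties and multiple correct videos per query differently.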

Papers

Showing 301–350 of 486 papers

| Title | Status | Hype |
| --- | --- | --- |
| MuLTI: Efficient Video-and-Language Understanding with Text-Guided MultiWay-Sampler and Multiple Choice Modeling |  | 0 |
| Multi-Granularity and Multi-modal Feature Interaction Approach for Text Video Retrieval |  | 0 |
| Multi-Granularity Graph Pooling for Video-based Person Re-Identification |  | 0 |
| Multimodal Approach for Video Surveillance Indexing and Retrieval |  | 0 |
| Multimodal Contextualized Support for Enhancing Video Retrieval System |  | 0 |
| Multimodal Skip-gram Using Convolutional Pseudowords |  | 0 |
| Multiple Visual-Semantic Embedding for Video Retrieval from Query Sentence |  | 0 |
| MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval |  | 0 |
| MultiVENT: Multilingual Videos of Events with Aligned Natural Text |  | 0 |
| Narrating the Video: Boosting Text-Video Retrieval via Comprehensive Utilization of Frame-Level Captions |  | 0 |
| NAVERO: Unlocking Fine-Grained Semantics for Video-Language Compositionality |  | 0 |
| Near-duplicate video detection featuring coupled temporal and perceptual visual structures and logical inference based matching |  | 0 |
| Neighborhood Preserving Hashing for Scalable Video Retrieval |  | 0 |
| Neural Graph Matching for Video Retrieval in Large-Scale Video-driven E-commerce |  | 0 |
| NEWSKVQA: Knowledge-Aware News Video Question Answering |  | 0 |
| No More Shortcuts: Realizing the Potential of Temporal Self-Supervision |  | 0 |
| Not All Pairs are Equal: Hierarchical Learning for Average-Precision-Oriented Video Retrieval |  | 0 |
| OmniVL: One Foundation Model for Image-Language and Video-Language Tasks |  | 0 |
| Perfect Match in Video Retrieval |  | 0 |
| PIDRo: Parallel Isomeric Attention with Dynamic Routing for Text-Video Retrieval |  | 0 |
| PolySmart @ TRECVid 2024 Medical Video Question Answering |  | 0 |
| Pose-Aided Video-based Person Re-Identification via Recurrent Graph Convolutional Network |  | 0 |
| Probabilistic Representations for Video Contrastive Learning |  | 0 |
| ProTA: Probabilistic Token Aggregation for Text-Video Retrieval |  | 0 |
| Prototypes are Balanced Units for Efficient and Effective Partially Relevant Video Retrieval |  | 0 |
| Q2E: Query-to-Event Decomposition for Zero-Shot Multilingual Text-to-Video Retrieval |  | 0 |
| QSAM-Net: Rain streak removal by quaternion neural network with self-attention module |  | 0 |
| Quality Over Quantity? LLM-Based Curation for a Data-Efficient Audio-Video Foundation Model |  | 0 |
| Query by Semantic Sketch |  | 0 |
| Query-centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning |  | 0 |
| RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter |  | 0 |
| Real-time analysis of cataract surgery videos using statistical models |  | 0 |
| Renmin University of China at TRECVID 2022: Improving Video Search by Feature Fusion and Negation Understanding |  | 0 |
| RNNs, CNNs and Transformers in Human Action Recognition: A Survey and a Hybrid Model |  | 0 |
| Self-supervised Spatiotemporal Representation Learning by Exploiting Video Continuity |  | 0 |
| Self-supervised Temporal Learning |  | 0 |
| Self-Supervised Video Hashing with Hierarchical Binary Auto-encoder |  | 0 |
| Self-Supervised Video Representation Learning with Meta-Contrastive Network |  | 0 |
| Self-Supervised Video Representation Learning by Video Incoherence Detection |  | 0 |
| Self-supervised Video Retrieval Transformer Network |  | 0 |
| Semantic Image Retrieval by Uniting Deep Neural Networks and Cognitive Architectures |  | 0 |
| Semantic Video Entity Linking Based on Visual Content and Metadata |  | 0 |
| Semantic Video Moments Retrieval at Scale: A New Task and a Baseline |  | 0 |
| Semi-automatic Data Annotation System for Multi-Target Multi-Camera Vehicle Tracking |  | 0 |
| Sharing Hash Codes for Multiple Purposes |  | 0 |
| SHE-Net: Syntax-Hierarchy-Enhanced Text-Video Retrieval |  | 0 |
| Sign Language Video Retrieval with Free-Form Textual Queries |  | 0 |
| Sinkhorn Transformations for Single-Query Postprocessing in Text-Video Retrieval |  | 0 |
| SketchGAN: Joint Sketch Completion and Recognition With Generative Adversarial Network |  | 0 |
| SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training |  | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | OmniVec | text-to-video R@10 | 89.4 |  | Unverified |
| 2 | CLIP4Clip | text-to-video R@10 | 81.6 |  | Unverified |
| 3 | OmniVec (pretrained) | text-to-video R@10 | 78.6 |  | Unverified |
| 4 | HunYuan_tvr (huge) | text-to-video R@1 | 62.9 |  | Unverified |
| 5 | CLIP-ViP | text-to-video R@1 | 57.7 |  | Unverified |
| 6 | PIDRo | text-to-video R@1 | 55.9 |  | Unverified |
| 7 | DMAE (ViT-B/16) | text-to-video R@1 | 55.5 |  | Unverified |
| 8 | HunYuan_tvr | text-to-video R@1 | 55 |  | Unverified |
| 9 | MuLTI | text-to-video R@1 | 54.7 |  | Unverified |
| 10 | EERCF | text-to-video R@1 | 54.1 |  | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Aurora (ours, r=64) | text-to-video R@5 | 77.4 |  | Unverified |
| 2 | InternVideo2-6B | text-to-video R@1 | 74.2 |  | Unverified |
| 3 | vid-TLDR (UMT-L) | text-to-video R@1 | 72.3 |  | Unverified |
| 4 | VAST | text-to-video R@1 | 72 |  | Unverified |
| 5 | COSA | text-to-video R@1 | 70.5 |  | Unverified |
| 6 | UMT-L (ViT-L/16) | text-to-video R@1 | 70.4 |  | Unverified |
| 7 | GRAM | text-to-video R@1 | 67.3 |  | Unverified |
| 8 | VALOR | text-to-video R@1 | 61.5 |  | Unverified |
| 9 | TESTA (ViT-B/16) | text-to-video R@1 | 61.2 |  | Unverified |
| 10 | VindLU | text-to-video R@1 | 61.2 |  | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GRAM | text-to-video R@1 | 64 |  | Unverified |
| 2 | VAST | text-to-video R@1 | 63.9 |  | Unverified |
| 3 | InternVideo2-6B | text-to-video R@1 | 62.8 |  | Unverified |
| 4 | VALOR | text-to-video R@1 | 59.9 |  | Unverified |
| 5 | UMT-L (ViT-L/16) | text-to-video R@1 | 58.8 |  | Unverified |
| 6 | vid-TLDR (UMT-L) | text-to-video R@1 | 58.1 |  | Unverified |
| 7 | COSA | text-to-video R@1 | 57.9 |  | Unverified |
| 8 | InternVideo2-6B | text-to-video R@1 | 55.9 |  | Unverified |
| 9 | InternVideo | text-to-video R@1 | 55.2 |  | Unverified |
| 10 | VLAB | text-to-video R@1 | 55.1 |  | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | EMCL-Net (Ours)++ LSMDC Rohrbach et al. (2015) | text-to-video R@10 | 53.7 |  | Unverified |
| 2 | InternVideo2-6B | text-to-video R@1 | 46.4 |  | Unverified |
| 3 | vid-TLDR (UMT-L) | text-to-video R@1 | 43.1 |  | Unverified |
| 4 | UMT-L (ViT-L/16) | text-to-video R@1 | 43 |  | Unverified |
| 5 | HunYuan_tvr (huge) | text-to-video R@1 | 40.4 |  | Unverified |
| 6 | COSA | text-to-video R@1 | 39.4 |  | Unverified |
| 7 | mPLUG-2 | text-to-video R@1 | 34.4 |  | Unverified |
| 8 | VALOR | text-to-video R@1 | 34.2 |  | Unverified |
| 9 | InternVideo | text-to-video R@1 | 34 |  | Unverified |
| 10 | InternVideo2-6B | text-to-video R@1 | 33.8 |  | Unverified |