SOTAVerified

Video Retrieval

The goal of video retrieval is as follows: given a text query and a pool of candidate videos, find the video that matches the query. In practice, systems return the candidates as a ranked list, which is scored with standard document retrieval metrics such as Recall@K (R@K): the percentage of queries whose matching video appears among the top K results.
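As a rough illustration of how the text-to-video R@K numbers in the benchmark tables below are typically computed, here is a minimal sketch in plain Python. It assumes a precomputed similarity matrix (from some text and video encoder, not shown) and, as a simplifying assumption, that text query i matches video i. Ties are counted pessimistically. Individual benchmarks may differ, for example when a video has several captions, and the function names here are hypothetical.

```python
from typing import Sequence


def rank_of_ground_truth(scores: Sequence[float], gt_index: int) -> int:
    """Return the 1-based rank of the ground-truth video when scores are sorted descending.

    Ties are broken pessimistically: every other video scoring >= the
    ground truth is counted as ranked ahead of it.
    """
    gt_score = scores[gt_index]
    ahead = sum(1 for j, s in enumerate(scores) if j != gt_index and s >= gt_score)
    return ahead + 1


def recall_at_k(similarity: Sequence[Sequence[float]], k: int) -> float:
    """Text-to-video Recall@K as a percentage.

    similarity[i][j] is the score between text query i and candidate video j.
    Assumes (hypothetically) that query i's matching video is video i.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    hits = sum(
        1 for i, row in enumerate(similarity) if rank_of_ground_truth(row, i) <= k
    )
    return 100.0 * hits / len(similarity)


if __name__ == "__main__":
    # Toy 3x3 example: rows are text queries, columns are candidate videos.
    sim = [
        [0.9, 0.1, 0.3],  # query 0: its video (0.9) is ranked 1st
        [0.8, 0.5, 0.2],  # query 1: its video (0.5) is ranked 2nd
        [0.4, 0.6, 0.1],  # query 2: its video (0.1) is ranked 3rd
    ]
    for k in (1, 2, 3):
        print(f"R@{k} = {recall_at_k(sim, k):.1f}")
    # prints: R@1 = 33.3, R@2 = 66.7, R@3 = 100.0 (one per line)
```

Because R@K only checks whether the correct video lands in the top K, it is non-decreasing in K. This is why R@10 figures in a leaderboard are generally much higher than R@1 figures for the same model.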

Papers

Showing 301–325 of 486 papers

Title | Status | Hype
Efficient video indexing for monitoring disease activity and progression in the upper gastrointestinal tract | | 0
Ego-Surfing: Person Localization in First-Person Videos Using Ego-Motion Signatures | | 0
Empowering Agentic Video Analytics Systems with Video Language Models | | 0
Encode the Unseen: Predictive Video Hashing for Scalable Mid-Stream Retrieval | | 0
End-to-end Concept Word Detection for Video Captioning, Retrieval, and Question Answering | | 0
End-to-end Generative Pretraining for Multimodal Video Captioning | | 0
Enhanced Multimodal Representation Learning with Cross-modal KD | | 0
Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models and Vision Language Models | | 0
Event-aware Video Corpus Moment Retrieval | | 0
Event Extraction in Video Transcripts | | 0
E-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer | | 0
ExpertAF: Expert Actionable Feedback from Video | | 0
Exploiting Visual Semantic Reasoning for Video-Text Retrieval | | 0
Exploring Relations in Untrimmed Videos for Self-Supervised Learning | | 0
Face Video Retrieval With Image Query via Hashing Across Euclidean Space and Riemannian Manifold | | 0
Fighting FIRe with FIRE: Assessing the Validity of Text-to-Video Retrieval Benchmarks | | 0
Find and Focus: Retrieve and Localize Video Events with Natural Language Queries | | 0
Fine-Grained Action Retrieval Through Multiple Parts-of-Speech Embeddings | | 0
Fine-Grained Instance-Level Sketch-Based Video Retrieval | | 0
Fine-grained Text-Video Retrieval with Frozen Image Encoders | | 0
CaReBench: A Fine-Grained Benchmark for Video Captioning and Retrieval | | 0
FMM-X3D: FPGA-based modeling and mapping of X3D for Human Action Recognition | | 0
fpgaHART: A toolflow for throughput-oriented acceleration of 3D CNNs for HAR onto FPGAs | | 0
Free-Form Multi-Modal Multimedia Retrieval (4MR) | | 0
Generalizable Multi-linear Attention Network | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | OmniVec | text-to-video R@10 | 89.4 | | Unverified
2 | CLIP4Clip | text-to-video R@10 | 81.6 | | Unverified
3 | OmniVec (pretrained) | text-to-video R@10 | 78.6 | | Unverified
4 | HunYuan_tvr (huge) | text-to-video R@1 | 62.9 | | Unverified
5 | CLIP-ViP | text-to-video R@1 | 57.7 | | Unverified
6 | PIDRo | text-to-video R@1 | 55.9 | | Unverified
7 | DMAE (ViT-B/16) | text-to-video R@1 | 55.5 | | Unverified
8 | HunYuan_tvr | text-to-video R@1 | 55 | | Unverified
9 | MuLTI | text-to-video R@1 | 54.7 | | Unverified
10 | EERCF | text-to-video R@1 | 54.1 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Aurora (ours, r=64) | text-to-video R@5 | 77.4 | | Unverified
2 | InternVideo2-6B | text-to-video R@1 | 74.2 | | Unverified
3 | vid-TLDR (UMT-L) | text-to-video R@1 | 72.3 | | Unverified
4 | VAST | text-to-video R@1 | 72 | | Unverified
5 | COSA | text-to-video R@1 | 70.5 | | Unverified
6 | UMT-L (ViT-L/16) | text-to-video R@1 | 70.4 | | Unverified
7 | GRAM | text-to-video R@1 | 67.3 | | Unverified
8 | VALOR | text-to-video R@1 | 61.5 | | Unverified
9 | TESTA (ViT-B/16) | text-to-video R@1 | 61.2 | | Unverified
10 | VindLU | text-to-video R@1 | 61.2 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GRAM | text-to-video R@1 | 64 | | Unverified
2 | VAST | text-to-video R@1 | 63.9 | | Unverified
3 | InternVideo2-6B | text-to-video R@1 | 62.8 | | Unverified
4 | VALOR | text-to-video R@1 | 59.9 | | Unverified
5 | UMT-L (ViT-L/16) | text-to-video R@1 | 58.8 | | Unverified
6 | vid-TLDR (UMT-L) | text-to-video R@1 | 58.1 | | Unverified
7 | COSA | text-to-video R@1 | 57.9 | | Unverified
8 | InternVideo2-6B | text-to-video R@1 | 55.9 | | Unverified
9 | InternVideo | text-to-video R@1 | 55.2 | | Unverified
10 | VLAB | text-to-video R@1 | 55.1 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | EMCL-Net (Ours)++ LSMDC Rohrbach et al. (2015) | text-to-video R@10 | 53.7 | | Unverified
2 | InternVideo2-6B | text-to-video R@1 | 46.4 | | Unverified
3 | vid-TLDR (UMT-L) | text-to-video R@1 | 43.1 | | Unverified
4 | UMT-L (ViT-L/16) | text-to-video R@1 | 43 | | Unverified
5 | HunYuan_tvr (huge) | text-to-video R@1 | 40.4 | | Unverified
6 | COSA | text-to-video R@1 | 39.4 | | Unverified
7 | mPLUG-2 | text-to-video R@1 | 34.4 | | Unverified
8 | VALOR | text-to-video R@1 | 34.2 | | Unverified
9 | InternVideo | text-to-video R@1 | 34 | | Unverified
10 | InternVideo2-6B | text-to-video R@1 | 33.8 | | Unverified