SOTAVerified

Video Retrieval

The objective of video retrieval is as follows: given a text query and a pool of candidate videos, select the video that corresponds to the text query. Typically, the candidate videos are returned as a ranked list and scored with document retrieval metrics such as recall at rank K (R@K).
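As a rough illustration of how a ranked list can be scored, the sketch below computes recall at rank K, the kind of metric reported as R@1, R@5, and R@10 in the benchmark results on this page. It is a minimal sketch under stated assumptions, not the evaluation code of any listed paper. It assumes each query has exactly one correct video, that query i matches video i, and that ties are resolved in favour of the correct video. The function names and the toy similarity scores are hypothetical.

```python
from typing import Sequence


def rank_of_correct_video(scores: Sequence[float], correct_index: int) -> int:
    """Return the 1-based rank of the correct video under descending score order."""
    correct_score = scores[correct_index]
    # Count candidates that score strictly higher than the correct video.
    # Assumption: ties are resolved in favour of the correct video.
    return 1 + sum(1 for s in scores if s > correct_score)


def recall_at_k(similarity: Sequence[Sequence[float]], k: int) -> float:
    """Percentage of queries whose correct video appears in the top k.

    Assumption: similarity[i][j] scores query i against video j, and
    video i is the single correct match for query i.
    """
    hits = sum(
        1
        for i, row in enumerate(similarity)
        if rank_of_correct_video(row, i) <= k
    )
    return 100.0 * hits / len(similarity)


# Toy similarity matrix: 3 text queries x 3 candidate videos (made-up numbers).
toy_similarity = [
    [0.9, 0.1, 0.3],  # query 0: correct video is ranked 1st
    [0.8, 0.5, 0.6],  # query 1: correct video is ranked 3rd
    [0.2, 0.7, 0.4],  # query 2: correct video is ranked 2nd
]

r_at_1 = recall_at_k(toy_similarity, 1)  # 1 of 3 queries hit
r_at_2 = recall_at_k(toy_similarity, 2)  # 2 of 3 queries hit
r_at_3 = recall_at_k(toy_similarity, 3)  # all queries hit
```

On the toy matrix, one of three queries places its correct video first, so R@1 is about 33.3, while R@3 is 100 because every correct video falls within the top three.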

Papers

Showing 101-150 of 486 papers

Title | Status | Hype
E-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer | - | 0
Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning | Code | 1
VideoCon: Robust Video-Language Alignment via Contrast Captions | Code | 1
Sinkhorn Transformations for Single-Query Postprocessing in Text-Video Retrieval | - | 0
Lost Your Style? Navigating with Semantic-Level Approach for Text-to-Outfit Retrieval | - | 0
An Empirical Study of Frame Selection for Text-to-Video Retrieval | - | 0
CHAIN: Exploring Global-Local Spatio-Temporal Information for Improved Self-Supervised Video Hashing | - | 0
TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding | Code | 1
Joint Searching and Grounding: Multi-Granularity Video Content Retrieval | Code | 0
Videoprompter: an ensemble of foundational models for zero-shot video understanding | - | 0
Dual-Stream Knowledge-Preserving Hashing for Unsupervised Video Retrieval | - | 0
Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data | Code | 1
GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval | Code | 1
Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks | Code | 0
HowToCaption: Prompting LLMs to Transform Video Annotations at Scale | Code | 1
Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval | Code | 1
Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning | Code | 1
Learning Segment Similarity and Alignment in Large-Scale Content Based Video Retrieval | - | 0
Unified Coarse-to-Fine Alignment for Video-Text Retrieval | Code | 1
Towards Debiasing Frame Length Bias in Text-Video Retrieval via Causal Intervention | - | 0
In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval | Code | 1
Differentiable Resolution Compression and Alignment for Efficient Video Classification and Retrieval | Code | 0
Language-Conditioned Change-point Detection to Identify Sub-Tasks in Robotics Domains | Code | 0
CoVR-2: Automatic Data Construction for Composed Video Retrieval | Code | 1
Simple Baselines for Interactive Video Retrieval with Questions and Answers | Code | 1
Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval | Code | 1
TeachCLIP: Multi-Grained Teaching for Efficient Text-to-Video Retrieval | - | 0
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures | Code | 1
Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment | - | 0
Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Model | Code | 1
Fine-grained Text-Video Retrieval with Frozen Image Encoders | - | 0
Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation | Code | 2
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation | Code | 0
MultiVENT: Multilingual Videos of Events with Aligned Natural Text | - | 0
ICSVR: Investigating Compositional and Syntactic Understanding in Video Retrieval Models | Code | 0
An overview on the evaluated video retrieval tasks at TRECVID 2022 | Code | 1
Key Frame Extraction with Attention Based Deep Neural Networks | - | 0
MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian | Code | 0
COSA: Concatenated Sample Pretrained Vision-Language Foundation Model | Code | 1
Enhanced Multimodal Representation Learning with Cross-modal KD | - | 0
MarineVRS: Marine Video Retrieval System with Explainability via Semantic Understanding | - | 0
An Overview of Challenges in Egocentric Text-Video Retrieval | - | 0
fpgaHART: A toolflow for throughput-oriented acceleration of 3D CNNs for HAR onto FPGAs | - | 0
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | Code | 2
FMM-X3D: FPGA-based modeling and mapping of X3D for Human Action Recognition | - | 0
VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending | - | 0
Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment | Code | 1
Mask to reconstruct: Cooperative Semantics Completion for Video-text Retrieval | - | 0
A Large Cross-Modal Video Retrieval Dataset with Reading Comprehension | Code | 1
A Review of Deep Learning for Video Captioning | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | OmniVec | text-to-video R@10 | 89.4 | - | Unverified
2 | CLIP4Clip | text-to-video R@10 | 81.6 | - | Unverified
3 | OmniVec (pretrained) | text-to-video R@10 | 78.6 | - | Unverified
4 | HunYuan_tvr (huge) | text-to-video R@1 | 62.9 | - | Unverified
5 | CLIP-ViP | text-to-video R@1 | 57.7 | - | Unverified
6 | PIDRo | text-to-video R@1 | 55.9 | - | Unverified
7 | DMAE (ViT-B/16) | text-to-video R@1 | 55.5 | - | Unverified
8 | HunYuan_tvr | text-to-video R@1 | 55 | - | Unverified
9 | MuLTI | text-to-video R@1 | 54.7 | - | Unverified
10 | STAN | text-to-video R@1 | 54.1 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Aurora (ours, r=64) | text-to-video R@5 | 77.4 | - | Unverified
2 | InternVideo2-6B | text-to-video R@1 | 74.2 | - | Unverified
3 | vid-TLDR (UMT-L) | text-to-video R@1 | 72.3 | - | Unverified
4 | VAST | text-to-video R@1 | 72 | - | Unverified
5 | COSA | text-to-video R@1 | 70.5 | - | Unverified
6 | UMT-L (ViT-L/16) | text-to-video R@1 | 70.4 | - | Unverified
7 | GRAM | text-to-video R@1 | 67.3 | - | Unverified
8 | VALOR | text-to-video R@1 | 61.5 | - | Unverified
9 | TESTA (ViT-B/16) | text-to-video R@1 | 61.2 | - | Unverified
10 | VindLU | text-to-video R@1 | 61.2 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GRAM | text-to-video R@1 | 64 | - | Unverified
2 | VAST | text-to-video R@1 | 63.9 | - | Unverified
3 | InternVideo2-6B | text-to-video R@1 | 62.8 | - | Unverified
4 | VALOR | text-to-video R@1 | 59.9 | - | Unverified
5 | UMT-L (ViT-L/16) | text-to-video R@1 | 58.8 | - | Unverified
6 | vid-TLDR (UMT-L) | text-to-video R@1 | 58.1 | - | Unverified
7 | COSA | text-to-video R@1 | 57.9 | - | Unverified
8 | InternVideo2-6B | text-to-video R@1 | 55.9 | - | Unverified
9 | InternVideo | text-to-video R@1 | 55.2 | - | Unverified
10 | VLAB | text-to-video R@1 | 55.1 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | EMCL-Net (Ours)++ LSMDC Rohrbach et al. (2015) | text-to-video R@10 | 53.7 | - | Unverified
2 | InternVideo2-6B | text-to-video R@1 | 46.4 | - | Unverified
3 | vid-TLDR (UMT-L) | text-to-video R@1 | 43.1 | - | Unverified
4 | UMT-L (ViT-L/16) | text-to-video R@1 | 43 | - | Unverified
5 | HunYuan_tvr (huge) | text-to-video R@1 | 40.4 | - | Unverified
6 | COSA | text-to-video R@1 | 39.4 | - | Unverified
7 | mPLUG-2 | text-to-video R@1 | 34.4 | - | Unverified
8 | VALOR | text-to-video R@1 | 34.2 | - | Unverified
9 | InternVideo | text-to-video R@1 | 34 | - | Unverified
10 | InternVideo2-6B | text-to-video R@1 | 33.8 | - | Unverified