SOTAVerified

Video Retrieval

The objective of video retrieval is to select, from a pool of candidate videos, the video that corresponds to a given text query. Systems typically return the candidates as a ranked list, which is scored with document retrieval metrics such as recall at rank K (R@K).
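
As a rough illustration of how a ranked list might be scored, the sketch below computes recall at rank K (R@K) as a percentage. It assumes exactly one correct video per query; the function names `rank_videos` and `recall_at_k` and the similarity scores are hypothetical and are not taken from any paper listed on this page.

```python
from typing import Sequence


def rank_videos(scores: Sequence[float]) -> list[int]:
    """Return candidate indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def recall_at_k(similarity: Sequence[Sequence[float]],
                ground_truth: Sequence[int],
                k: int) -> float:
    """Percentage of queries whose correct video appears in the top-k results.

    similarity[q][v] is a (hypothetical) model score between text query q
    and candidate video v; ground_truth[q] is the index of the one video
    assumed to be correct for query q.
    """
    hits = 0
    for scores, target in zip(similarity, ground_truth):
        if target in rank_videos(scores)[:k]:
            hits += 1
    return 100.0 * hits / len(ground_truth)


# Toy data: 3 queries, 4 candidate videos, made-up scores.
similarity = [
    [0.9, 0.1, 0.3, 0.2],  # query 0: correct video 0 is ranked 1st
    [0.2, 0.4, 0.8, 0.5],  # query 1: correct video 1 is ranked 3rd
    [0.1, 0.2, 0.3, 0.7],  # query 2: correct video 3 is ranked 1st
]
ground_truth = [0, 1, 3]

for k in (1, 3):
    print(f"R@{k} = {recall_at_k(similarity, ground_truth, k):.1f}")
```

On this toy data, two of the three queries have the correct video at rank 1, and all three have it within the top 3, so R@3 is higher than R@1. Numbers reported at different K are therefore not directly comparable.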

Papers

Showing 101-150 of 486 papers

Title | Status | Hype
Self-supervised Video Representation Learning by Pace Prediction | Code | 1
DnS: Distill-and-Select for Efficient and Accurate Video Indexing and Retrieval | Code | 1
An overview on the evaluated video retrieval tasks at TRECVID 2022 | Code | 1
Self-Supervised Video Similarity Learning | Code | 1
Dual Learning with Dynamic Knowledge Distillation for Partially Relevant Video Retrieval | Code | 1
Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning | Code | 1
CLIP2Video: Mastering Video-Text Retrieval via Image CLIP | Code | 1
Holistic Features are almost Sufficient for Text-to-Video Retrieval | Code | 1
ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound | Code | 1
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval | Code | 1
Learning video retrieval models with relevance-aware online mining | Code | 1
HowToCaption: Prompting LLMs to Transform Video Annotations at Scale | Code | 1
A Large Cross-Modal Video Retrieval Dataset with Reading Comprehension | Code | 1
T2VIndexer: A Generative Video Indexer for Efficient Text-Video Retrieval | Code | 1
EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval | Code | 1
LocVTP: Video-Text Pre-training for Temporal Localization | Code | 1
TEACHTEXT: CrossModal Generalized Distillation for Text-Video Retrieval | Code | 1
TempCLR: Temporal Alignment Representation with Contrastive Learning | Code | 1
Align and Prompt: Video-and-Language Pre-training with Entity Prompts | Code | 1
Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment | Code | 1
The End-of-End-to-End: A Video Understanding Pentathlon Challenge (2020) | Code | 1
End-to-End Learning of Visual Representations from Uncurated Instructional Videos | Code | 1
AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant | Code | 1
TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition | Code | 1
CoCa: Contrastive Captioners are Image-Text Foundation Models | Code | 1
TRECVID 2020: A comprehensive campaign for evaluating video retrieval tasks across multiple application domains | Code | 1
LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling | Code | 1
CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval | Code | 1
Learning a Text-Video Embedding from Incomplete and Heterogeneous Data | Code | 1
Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss | Code | 1
Condensed Movies: Story Based Retrieval with Contextual Embeddings | Code | 1
In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval | Code | 1
Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations | Code | 1
3D-CSL: self-supervised 3D context similarity learning for Near-Duplicate Video Retrieval | Code | 1
Everything at Once - Multi-Modal Fusion Transformer for Video Retrieval | Code | 1
Hysia: Serving DNN-Based Video-to-Retail Applications in Cloud | Code | 1
Temporal Context Aggregation for Video Retrieval with Contrastive Learning | Code | 1
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval | Code | 1
Audio-based Near-Duplicate Video Retrieval with Audio Similarity Learning | Code | 1
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures | Code | 1
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling | Code | 1
Let All be Whitened: Multi-teacher Distillation for Efficient Visual Retrieval | Code | 1
Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions | Code | 1
GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval | Code | 1
Contrastive Masked Autoencoders for Self-Supervised Video Hashing | Code | 1
Marine Video Kit: A New Marine Video Dataset for Content-based Analysis and Retrieval | Code | 1
Memory-augmented Dense Predictive Coding for Video Representation Learning | Code | 1
Florence: A New Foundation Model for Computer Vision | Code | 1
Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval | Code | 1
A Straightforward Framework For Video Retrieval Using CLIP | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | OmniVec | text-to-video R@10 | 89.4 | | Unverified
2 | CLIP4Clip | text-to-video R@10 | 81.6 | | Unverified
3 | OmniVec (pretrained) | text-to-video R@10 | 78.6 | | Unverified
4 | HunYuan_tvr (huge) | text-to-video R@1 | 62.9 | | Unverified
5 | CLIP-ViP | text-to-video R@1 | 57.7 | | Unverified
6 | PIDRo | text-to-video R@1 | 55.9 | | Unverified
7 | DMAE (ViT-B/16) | text-to-video R@1 | 55.5 | | Unverified
8 | HunYuan_tvr | text-to-video R@1 | 55 | | Unverified
9 | MuLTI | text-to-video R@1 | 54.7 | | Unverified
10 | EERCF | text-to-video R@1 | 54.1 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Aurora (ours, r=64) | text-to-video R@5 | 77.4 | | Unverified
2 | InternVideo2-6B | text-to-video R@1 | 74.2 | | Unverified
3 | vid-TLDR (UMT-L) | text-to-video R@1 | 72.3 | | Unverified
4 | VAST | text-to-video R@1 | 72 | | Unverified
5 | COSA | text-to-video R@1 | 70.5 | | Unverified
6 | UMT-L (ViT-L/16) | text-to-video R@1 | 70.4 | | Unverified
7 | GRAM | text-to-video R@1 | 67.3 | | Unverified
8 | VALOR | text-to-video R@1 | 61.5 | | Unverified
9 | TESTA (ViT-B/16) | text-to-video R@1 | 61.2 | | Unverified
10 | VindLU | text-to-video R@1 | 61.2 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GRAM | text-to-video R@1 | 64 | | Unverified
2 | VAST | text-to-video R@1 | 63.9 | | Unverified
3 | InternVideo2-6B | text-to-video R@1 | 62.8 | | Unverified
4 | VALOR | text-to-video R@1 | 59.9 | | Unverified
5 | UMT-L (ViT-L/16) | text-to-video R@1 | 58.8 | | Unverified
6 | vid-TLDR (UMT-L) | text-to-video R@1 | 58.1 | | Unverified
7 | COSA | text-to-video R@1 | 57.9 | | Unverified
8 | InternVideo2-6B | text-to-video R@1 | 55.9 | | Unverified
9 | InternVideo | text-to-video R@1 | 55.2 | | Unverified
10 | VLAB | text-to-video R@1 | 55.1 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | EMCL-Net (Ours)++ LSMDC Rohrbach et al. (2015) | text-to-video R@10 | 53.7 | | Unverified
2 | InternVideo2-6B | text-to-video R@1 | 46.4 | | Unverified
3 | vid-TLDR (UMT-L) | text-to-video R@1 | 43.1 | | Unverified
4 | UMT-L (ViT-L/16) | text-to-video R@1 | 43 | | Unverified
5 | HunYuan_tvr (huge) | text-to-video R@1 | 40.4 | | Unverified
6 | COSA | text-to-video R@1 | 39.4 | | Unverified
7 | mPLUG-2 | text-to-video R@1 | 34.4 | | Unverified
8 | VALOR | text-to-video R@1 | 34.2 | | Unverified
9 | InternVideo | text-to-video R@1 | 34 | | Unverified
10 | InternVideo2-6B | text-to-video R@1 | 33.8 | | Unverified