SOTAVerified

Video Retrieval

Given a text query and a pool of candidate videos, the objective of video retrieval is to select the video that corresponds to the query. Systems typically return the candidates as a ranked list, which is scored with document retrieval metrics such as recall at K (R@K).
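The ranking-and-scoring setup described above can be sketched in a few lines of Python. This is a minimal illustration, not any benchmark's official evaluation script. It assumes a precomputed similarity matrix in which entry `[i][j]` scores text query `i` against video `j`, and that video `i` is the single correct match for query `i`. The function names `rank_videos` and `recall_at_k` and the toy numbers are made up for this example.

```python
from typing import List, Sequence


def rank_videos(scores: Sequence[float]) -> List[int]:
    """Return candidate video indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def recall_at_k(similarity: Sequence[Sequence[float]], k: int) -> float:
    """Percentage of queries whose correct video appears in the top k results.

    Assumption: similarity[i][j] scores text query i against video j, and
    video i is the single correct match for query i.
    """
    hits = 0
    for i, scores in enumerate(similarity):
        if i in rank_videos(scores)[:k]:
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy similarity matrix: 3 text queries x 3 candidate videos (made-up numbers).
toy_similarity = [
    [0.9, 0.1, 0.3],  # query 0: correct video 0 is ranked 1st
    [0.8, 0.5, 0.2],  # query 1: correct video 1 is ranked 2nd
    [0.4, 0.6, 0.1],  # query 2: correct video 2 is ranked 3rd
]

for k in (1, 2, 3):
    print(f"R@{k} = {recall_at_k(toy_similarity, k):.1f}")
```

On the toy matrix, one of the three queries has its correct video at rank 1, two have it within the top 2, and all three within the top 3, so R@K rises with K. The "text-to-video R@1", "R@5" and "R@10" figures in leaderboards of this kind are read the same way, as the percentage of queries answered within the top K.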

Papers

Showing 201-250 of 486 papers

Title | Status | Hype
SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training | - | 0
Contrastive Masked Autoencoders for Self-Supervised Video Hashing | Code | 1
A Unified Model for Video Understanding and Knowledge Embedding with Heterogeneous Knowledge Graph Dataset | - | 0
Cross-Modal Adapter for Text-Video Retrieval | Code | 1
3D-CSL: self-supervised 3D context similarity learning for Near-Duplicate Video Retrieval | Code | 1
CLOP: Video-and-Language Pre-Training with Knowledge Regularizations | - | 0
LiteVL: Efficient Video-Language Learning with Enhanced Spatial-Temporal Modeling | - | 0
Efficient Cross-Modal Video Retrieval with Meta-Optimized Frames | Code | 0
Semantic Video Moments Retrieval at Scale: A New Task and a Baseline | - | 0
RaP: Redundancy-aware Video-language Pre-training for Text-Video Retrieval | Code | 0
Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning | Code | 2
Learning to Locate Visual Answer in Video Corpus Using Question | Code | 0
Contrastive Video-Language Learning with Fine-grained Frame Sampling | - | 0
Fighting FIRe with FIRE: Assessing the Validity of Text-to-Video Retrieval Benchmarks | - | 0
ConTra: (Con)text (Tra)nsformer for Cross-Modal Video Retrieval | Code | 0
C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval | Code | 1
Event Extraction in Video Transcripts | - | 0
TVLT: Textless Vision-Language Transformer | Code | 1
Text-Adaptive Multiple Visual Prototype Matching for Video-Text Retrieval | - | 0
Multi-Granularity Graph Pooling for Video-based Person Re-Identification | - | 0
Pose-Aided Video-based Person Re-Identification via Recurrent Graph Convolutional Network | - | 0
Marine Video Kit: A New Marine Video Dataset for Content-based Analysis and Retrieval | Code | 1
Semi-automatic Data Annotation System for Multi-Target Multi-Camera Vehicle Tracking | - | 0
Tree-based Text-Vision BERT for Video Search in Baidu Video Advertising | - | 0
OmniVL:One Foundation Model for Image-Language and Video-Language Tasks | - | 0
CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment | Code | 2
An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling | Code | 1
Temporal Contrastive Learning with Curriculum | - | 0
Partially Relevant Video Retrieval | Code | 1
MuMUR : Multilingual Multimodal Universal Retrieval | - | 0
STAR-GNN: Spatial-Temporal Video Representation for Content-based Retrieval | - | 0
Motion Sensitive Contrastive Learning for Self-supervised Video Representation | - | 0
QSAM-Net: Rain streak removal by quaternion neural network with self-attention module | - | 0
A Feature-space Multimodal Data Augmentation Technique for Text-video Retrieval | Code | 1
LocVTP: Video-Text Pre-training for Temporal Localization | Code | 1
GOCA: Guided Online Cluster Assignment for Self-Supervised Video Representation Learning | Code | 0
TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval | Code | 1
Clover: Towards A Unified Video-Language Alignment and Fusion Model | Code | 1
X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval | Code | 1
LaT: Latent Translation with Cycle-Consistency for Video-Text Retrieval | - | 0
Robustness Analysis of Video-Language Models Against Visual and Language Perturbations | Code | 0
Exploiting Semantic Role Contextualized Video Features for Multi-Instance Text-Video Retrieval EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2022 | Code | 0
Semantic Role Aware Correlation Transformer for Text to Video Retrieval | Code | 0
RoME: Role-aware Mixture-of-Expert Transformer for Text-to-Video Retrieval | Code | 0
SLIC: Self-Supervised Learning with Iterative Clustering for Human Action Videos | Code | 1
LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling | Code | 1
Revealing Single Frame Bias for Video-and-Language Learning | Code | 2
Revisiting the "Video" in Video-Language Understanding | Code | 1
Cross-Architecture Self-supervised Video Representation Learning | Code | 1
VRAG: Region Attention Graphs for Content-Based Video Retrieval | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | OmniVec | text-to-video R@10 | 89.4 | - | Unverified
2 | CLIP4Clip | text-to-video R@10 | 81.6 | - | Unverified
3 | OmniVec (pretrained) | text-to-video R@10 | 78.6 | - | Unverified
4 | HunYuan_tvr (huge) | text-to-video R@1 | 62.9 | - | Unverified
5 | CLIP-ViP | text-to-video R@1 | 57.7 | - | Unverified
6 | PIDRo | text-to-video R@1 | 55.9 | - | Unverified
7 | DMAE (ViT-B/16) | text-to-video R@1 | 55.5 | - | Unverified
8 | HunYuan_tvr | text-to-video R@1 | 55 | - | Unverified
9 | MuLTI | text-to-video R@1 | 54.7 | - | Unverified
10 | STAN | text-to-video R@1 | 54.1 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Aurora (ours, r=64) | text-to-video R@5 | 77.4 | - | Unverified
2 | InternVideo2-6B | text-to-video R@1 | 74.2 | - | Unverified
3 | vid-TLDR (UMT-L) | text-to-video R@1 | 72.3 | - | Unverified
4 | VAST | text-to-video R@1 | 72 | - | Unverified
5 | COSA | text-to-video R@1 | 70.5 | - | Unverified
6 | UMT-L (ViT-L/16) | text-to-video R@1 | 70.4 | - | Unverified
7 | GRAM | text-to-video R@1 | 67.3 | - | Unverified
8 | VALOR | text-to-video R@1 | 61.5 | - | Unverified
9 | TESTA (ViT-B/16) | text-to-video R@1 | 61.2 | - | Unverified
10 | VindLU | text-to-video R@1 | 61.2 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GRAM | text-to-video R@1 | 64 | - | Unverified
2 | VAST | text-to-video R@1 | 63.9 | - | Unverified
3 | InternVideo2-6B | text-to-video R@1 | 62.8 | - | Unverified
4 | VALOR | text-to-video R@1 | 59.9 | - | Unverified
5 | UMT-L (ViT-L/16) | text-to-video R@1 | 58.8 | - | Unverified
6 | vid-TLDR (UMT-L) | text-to-video R@1 | 58.1 | - | Unverified
7 | COSA | text-to-video R@1 | 57.9 | - | Unverified
8 | InternVideo2-6B | text-to-video R@1 | 55.9 | - | Unverified
9 | InternVideo | text-to-video R@1 | 55.2 | - | Unverified
10 | VLAB | text-to-video R@1 | 55.1 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | EMCL-Net (Ours)++ LSMDC Rohrbach et al. (2015) | text-to-video R@10 | 53.7 | - | Unverified
2 | InternVideo2-6B | text-to-video R@1 | 46.4 | - | Unverified
3 | vid-TLDR (UMT-L) | text-to-video R@1 | 43.1 | - | Unverified
4 | UMT-L (ViT-L/16) | text-to-video R@1 | 43 | - | Unverified
5 | HunYuan_tvr (huge) | text-to-video R@1 | 40.4 | - | Unverified
6 | COSA | text-to-video R@1 | 39.4 | - | Unverified
7 | mPLUG-2 | text-to-video R@1 | 34.4 | - | Unverified
8 | VALOR | text-to-video R@1 | 34.2 | - | Unverified
9 | InternVideo | text-to-video R@1 | 34 | - | Unverified
10 | InternVideo2-6B | text-to-video R@1 | 33.8 | - | Unverified