SOTAVerified

Video Retrieval

Given a text query and a pool of candidate videos, the objective of video retrieval is to select the video that corresponds to the query. Typically, the candidate videos are returned as a ranked list, and the ranking is scored with document retrieval metrics such as recall at rank K (R@K).
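
As a rough illustration of how a ranked list can be scored, the sketch below computes recall at rank K from a matrix of query-video similarity scores. It is a minimal example under stated assumptions, not the evaluation code of any listed benchmark: the function name `recall_at_k`, the convention that query `i` has exactly one correct video at index `i`, and the toy scores are all hypothetical.

```python
from typing import Sequence


def recall_at_k(scores: Sequence[Sequence[float]], k: int) -> float:
    """Fraction of text queries whose correct video is ranked in the top k.

    scores[i][j] is the similarity of text query i to candidate video j.
    Assumption: the correct video for query i is the video at index i.
    """
    hits = 0
    for i, row in enumerate(scores):
        # 0-based rank of the correct video: how many candidates score strictly higher.
        rank = sum(1 for s in row if s > row[i])
        if rank < k:
            hits += 1
    return hits / len(scores)


# Toy similarity matrix: 3 text queries (rows) against 3 candidate videos (columns).
toy_scores = [
    [0.9, 0.1, 0.3],  # query 0: correct video 0 is ranked first
    [0.2, 0.4, 0.8],  # query 1: correct video 1 is ranked second
    [0.5, 0.6, 0.7],  # query 2: correct video 2 is ranked first
]

r_at_1 = recall_at_k(toy_scores, k=1)
r_at_2 = recall_at_k(toy_scores, k=2)
print(f"R@1 = {100 * r_at_1:.1f}, R@2 = {100 * r_at_2:.1f}")
```

Leaderboards usually report this fraction as a percentage, and a larger K can only raise the score, so R@1, R@5 and R@10 values are not directly comparable with one another.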

Papers

Showing 301–350 of 486 papers

| Title | Status | Hype |
| --- | --- | --- |
| HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training | | 0 |
| You were saying? - Spoken Language in the V3C Dataset | Code | 0 |
| Contextual Explainable Video Representation: Human Perception-based Understanding | Code | 0 |
| VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | | 0 |
| Masked Contrastive Pre-Training for Efficient Video-Text Retrieval | | 0 |
| Renmin University of China at TRECVID 2022: Improving Video Search by Feature Fusion and Negation Understanding | | 0 |
| Are All Combinations Equal? Combining Textual and Visual Features with Multiple Space Learning for Text-Based Video Retrieval | Code | 0 |
| SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training | | 0 |
| A Unified Model for Video Understanding and Knowledge Embedding with Heterogeneous Knowledge Graph Dataset | | 0 |
| CLOP: Video-and-Language Pre-Training with Knowledge Regularizations | | 0 |
| LiteVL: Efficient Video-Language Learning with Enhanced Spatial-Temporal Modeling | | 0 |
| Efficient Cross-Modal Video Retrieval with Meta-Optimized Frames | Code | 0 |
| Semantic Video Moments Retrieval at Scale: A New Task and a Baseline | | 0 |
| RaP: Redundancy-aware Video-language Pre-training for Text-Video Retrieval | Code | 0 |
| Learning to Locate Visual Answer in Video Corpus Using Question | Code | 0 |
| Contrastive Video-Language Learning with Fine-grained Frame Sampling | | 0 |
| Fighting FIRe with FIRE: Assessing the Validity of Text-to-Video Retrieval Benchmarks | | 0 |
| ConTra: (Con)text (Tra)nsformer for Cross-Modal Video Retrieval | Code | 0 |
| Event Extraction in Video Transcripts | | 0 |
| Text-Adaptive Multiple Visual Prototype Matching for Video-Text Retrieval | | 0 |
| Multi-Granularity Graph Pooling for Video-based Person Re-Identification | | 0 |
| Pose-Aided Video-based Person Re-Identification via Recurrent Graph Convolutional Network | | 0 |
| Semi-automatic Data Annotation System for Multi-Target Multi-Camera Vehicle Tracking | | 0 |
| Tree-based Text-Vision BERT for Video Search in Baidu Video Advertising | | 0 |
| OmniVL:One Foundation Model for Image-Language and Video-Language Tasks | | 0 |
| Temporal Contrastive Learning with Curriculum | | 0 |
| MuMUR : Multilingual Multimodal Universal Retrieval | | 0 |
| STAR-GNN: Spatial-Temporal Video Representation for Content-based Retrieval | | 0 |
| Motion Sensitive Contrastive Learning for Self-supervised Video Representation | | 0 |
| QSAM-Net: Rain streak removal by quaternion neural network with self-attention module | | 0 |
| GOCA: Guided Online Cluster Assignment for Self-Supervised Video Representation Learning | Code | 0 |
| LaT: Latent Translation with Cycle-Consistency for Video-Text Retrieval | | 0 |
| Robustness Analysis of Video-Language Models Against Visual and Language Perturbations | Code | 0 |
| Exploiting Semantic Role Contextualized Video Features for Multi-Instance Text-Video Retrieval EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2022 | Code | 0 |
| RoME: Role-aware Mixture-of-Expert Transformer for Text-to-Video Retrieval | Code | 0 |
| Semantic Role Aware Correlation Transformer for Text to Video Retrieval | Code | 0 |
| VRAG: Region Attention Graphs for Content-Based Video Retrieval | | 0 |
| Learning to Retrieve Videos by Asking Questions | Code | 0 |
| Learn to Understand Negation in Video Retrieval | Code | 0 |
| Relevance-based Margin for Contrastively-trained Video Retrieval Models | Code | 0 |
| A Survey of Video-based Action Quality Assessment | | 0 |
| Modality-Balanced Embedding for Video Retrieval | | 0 |
| COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval | | 0 |
| Exploring the Temporal Cues to Enhance Video Retrieval on Standardized CDVA | Code | 0 |
| Probabilistic Representations for Video Contrastive Learning | | 0 |
| Tencent Text-Video Retrieval: Hierarchical Cross-Modal Interactions with Multi-Level Representations | | 0 |
| Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language | Code | 0 |
| Learning Audio-Video Modalities from Image Captions | | 0 |
| CREATE: A Benchmark for Chinese Short Video Retrieval and Title Generation | | 0 |
| Controllable Augmentations for Video Representation Learning | | 0 |

Benchmark Results

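| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | OmniVec | text-to-video R@10 | 89.4 | | Unverified |
| 2 | CLIP4Clip | text-to-video R@10 | 81.6 | | Unverified |
| 3 | OmniVec (pretrained) | text-to-video R@10 | 78.6 | | Unverified |
| 4 | HunYuan_tvr (huge) | text-to-video R@1 | 62.9 | | Unverified |
| 5 | CLIP-ViP | text-to-video R@1 | 57.7 | | Unverified |
| 6 | PIDRo | text-to-video R@1 | 55.9 | | Unverified |
| 7 | DMAE (ViT-B/16) | text-to-video R@1 | 55.5 | | Unverified |
| 8 | HunYuan_tvr | text-to-video R@1 | 55 | | Unverified |
| 9 | MuLTI | text-to-video R@1 | 54.7 | | Unverified |
| 10 | EERCF | text-to-video R@1 | 54.1 | | Unverified |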
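
| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Aurora (ours, r=64) | text-to-video R@5 | 77.4 | | Unverified |
| 2 | InternVideo2-6B | text-to-video R@1 | 74.2 | | Unverified |
| 3 | vid-TLDR (UMT-L) | text-to-video R@1 | 72.3 | | Unverified |
| 4 | VAST | text-to-video R@1 | 72 | | Unverified |
| 5 | COSA | text-to-video R@1 | 70.5 | | Unverified |
| 6 | UMT-L (ViT-L/16) | text-to-video R@1 | 70.4 | | Unverified |
| 7 | GRAM | text-to-video R@1 | 67.3 | | Unverified |
| 8 | VALOR | text-to-video R@1 | 61.5 | | Unverified |
| 9 | TESTA (ViT-B/16) | text-to-video R@1 | 61.2 | | Unverified |
| 10 | VindLU | text-to-video R@1 | 61.2 | | Unverified |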
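
| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GRAM | text-to-video R@1 | 64 | | Unverified |
| 2 | VAST | text-to-video R@1 | 63.9 | | Unverified |
| 3 | InternVideo2-6B | text-to-video R@1 | 62.8 | | Unverified |
| 4 | VALOR | text-to-video R@1 | 59.9 | | Unverified |
| 5 | UMT-L (ViT-L/16) | text-to-video R@1 | 58.8 | | Unverified |
| 6 | vid-TLDR (UMT-L) | text-to-video R@1 | 58.1 | | Unverified |
| 7 | COSA | text-to-video R@1 | 57.9 | | Unverified |
| 8 | InternVideo2-6B | text-to-video R@1 | 55.9 | | Unverified |
| 9 | InternVideo | text-to-video R@1 | 55.2 | | Unverified |
| 10 | VLAB | text-to-video R@1 | 55.1 | | Unverified |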

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | EMCL-Net (Ours)++ LSMDC Rohrbach et al. (2015) | text-to-video R@10 | 53.7 | | Unverified |
| 2 | InternVideo2-6B | text-to-video R@1 | 46.4 | | Unverified |
| 3 | vid-TLDR (UMT-L) | text-to-video R@1 | 43.1 | | Unverified |
| 4 | UMT-L (ViT-L/16) | text-to-video R@1 | 43 | | Unverified |
| 5 | HunYuan_tvr (huge) | text-to-video R@1 | 40.4 | | Unverified |
| 6 | COSA | text-to-video R@1 | 39.4 | | Unverified |
| 7 | mPLUG-2 | text-to-video R@1 | 34.4 | | Unverified |
| 8 | VALOR | text-to-video R@1 | 34.2 | | Unverified |
| 9 | InternVideo | text-to-video R@1 | 34 | | Unverified |
| 10 | InternVideo2-6B | text-to-video R@1 | 33.8 | | Unverified |