SOTAVerified

Video Retrieval

Given a text query and a pool of candidate videos, the objective of video retrieval is to select the video that corresponds to the query. Systems typically return the candidates as a ranked list, which is scored with document retrieval metrics such as recall at K (R@K).
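As a rough illustration of how a ranked list can be scored, the sketch below computes recall at K (R@K), the metric family reported in the benchmark tables on this page. It is a minimal example, not any listed model's evaluation code. The function names `rank_videos` and `recall_at_k`, the toy similarity scores, and the assumption that query i matches video i are all hypothetical.

```python
from typing import List, Sequence


def rank_videos(scores: Sequence[float]) -> List[int]:
    """Return candidate video indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def recall_at_k(score_matrix: Sequence[Sequence[float]], k: int) -> float:
    """Fraction of queries whose correct video appears in the top k results.

    Assumption for illustration: the correct video for query i is video i.
    """
    hits = 0
    for query_idx, scores in enumerate(score_matrix):
        top_k = rank_videos(scores)[:k]
        if query_idx in top_k:
            hits += 1
    return hits / len(score_matrix)


# Hypothetical text-to-video similarity scores:
# each row is one text query, each column is one candidate video.
scores = [
    [0.9, 0.1, 0.3],  # query 0: correct video 0 is ranked 1st
    [0.8, 0.5, 0.6],  # query 1: correct video 1 is ranked 3rd
    [0.2, 0.7, 0.4],  # query 2: correct video 2 is ranked 2nd
]

for k in (1, 2, 3):
    # Multiply by 100 to express the value as a percentage.
    print(f"R@{k} = {100 * recall_at_k(scores, k):.1f}")
```

With these toy scores, only query 0 has its correct video at rank 1, so R@1 is one third, while R@3 is 1.0 because every correct video is within the top three.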

Papers

Showing 401–450 of 486 papers

| Title | Status | Hype |
| --- | --- | --- |
| Self-supervised Temporal Learning | | 0 |
| SEA: Sentence Encoder Assembly for Video Retrieval by Textual Queries | Code | 0 |
| A Hierarchical Multi-Modal Encoder for Moment Localization in Video Corpus | | 0 |
| Graph Based Temporal Aggregation for Video Retrieval | Code | 0 |
| Support-set bottlenecks for video-text representation learning | | 0 |
| Encode the Unseen: Predictive Video Hashing for Scalable Mid-Stream Retrieval | | 0 |
| TRECVID 2019: An Evaluation Campaign to Benchmark Video Activity Detection, Video Captioning and Matching, and Video Search & Retrieval | | 0 |
| Discriminative Residual Analysis for Image Set Classification with Posture and Age Variations | Code | 0 |
| Exploring Relations in Untrimmed Videos for Self-Supervised Learning | | 0 |
| The VISIONE Video Search System: Exploiting Off-the-Shelf Text Search Engines for Large-Scale Video Retrieval | | 0 |
| Tree-Augmented Cross-Modal Encoding for Complex-Query Video Retrieval | | 0 |
| Exploiting Visual Semantic Reasoning for Video-Text Retrieval | | 0 |
| Large Scale Video Representation Learning via Relational Graph Clustering | | 0 |
| Screencast Tutorial Video Understanding | Code | 0 |
| Near-duplicate video detection featuring coupled temporal and perceptual visual structures and logical inference based matching | | 0 |
| Multiple Visual-Semantic Embedding for Video Retrieval from Query Sentence | | 0 |
| AMIL: Adversarial Multi Instance Learning for Human Pose Estimation | Code | 0 |
| Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning | Code | 0 |
| Self-Supervised Visual Learning by Variable Playback Speeds Prediction of a Video | Code | 0 |
| Fine-Grained Instance-Level Sketch-Based Video Retrieval | | 0 |
| A Proposal-based Approach for Activity Image-to-Video Retrieval | | 0 |
| Deep Heterogeneous Hashing for Face Video Retrieval | | 0 |
| SVD: A Large-Scale Short Video Dataset for Near-Duplicate Video Retrieval | | 0 |
| Neighborhood Preserving Hashing for Scalable Video Retrieval | | 0 |
| Query by Semantic Sketch | | 0 |
| Audio-Visual Embedding for Cross-Modal MusicVideo Retrieval through Supervised Deep CCA | | 0 |
| Fine-Grained Action Retrieval Through Multiple Parts-of-Speech Embeddings | | 0 |
| Central Similarity Quantization for Efficient Image and Video Retrieval | Code | 0 |
| SketchGAN: Joint Sketch Completion and Recognition With Generative Adversarial Network | | 0 |
| Spatio-temporal Video Re-localization by Warp LSTM | | 0 |
| Efficient video indexing for monitoring disease activity and progression in the upper gastrointestinal tract | | 0 |
| Interactive Video Retrieval with Dialog | | 0 |
| Unsupervised Data Uncertainty Learning in Visual Retrieval Systems | | 0 |
| V3C - a Research Video Collection | | 0 |
| Dual Encoding for Zero-Example Video Retrieval | Code | 0 |
| FIVR: Fine-grained Incident Video Retrieval | Code | 0 |
| Find and Focus: Retrieve and Localize Video Events with Natural Language Queries | | 0 |
| Temporal Modular Networks for Retrieving Complex Compositional Activities in Videos | | 0 |
| Video Logo Retrieval based on local Features | Code | 0 |
| A Joint Sequence Fusion Model for Video Question Answering and Retrieval | Code | 0 |
| Person Search in Videos with One Portrait Through Visual and Temporal Links | Code | 0 |
| Talking Face Generation by Adversarially Disentangled Audio-Visual Representation | Code | 0 |
| Human Action Recognition and Prediction: A Survey | | 0 |
| Semantic Image Retrieval by Uniting Deep Neural Networks and Cognitive Architectures | | 0 |
| Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval | Code | 0 |
| LAMV: Learning to Align and Match Videos With Kernelized Temporal Layers | Code | 0 |
| ECO: Efficient Convolutional Network for Online Video Understanding | Code | 0 |
| Learning and Recognizing Human Action from Skeleton Movement with Deep Residual Neural Networks | | 0 |
| Hashing with Mutual Information | Code | 0 |
| Self-Supervised Video Hashing with Hierarchical Binary Auto-encoder | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | OmniVec | text-to-video R@10 | 89.4 | | Unverified |
| 2 | CLIP4Clip | text-to-video R@10 | 81.6 | | Unverified |
| 3 | OmniVec (pretrained) | text-to-video R@10 | 78.6 | | Unverified |
| 4 | HunYuan_tvr (huge) | text-to-video R@1 | 62.9 | | Unverified |
| 5 | CLIP-ViP | text-to-video R@1 | 57.7 | | Unverified |
| 6 | PIDRo | text-to-video R@1 | 55.9 | | Unverified |
| 7 | DMAE (ViT-B/16) | text-to-video R@1 | 55.5 | | Unverified |
| 8 | HunYuan_tvr | text-to-video R@1 | 55 | | Unverified |
| 9 | MuLTI | text-to-video R@1 | 54.7 | | Unverified |
| 10 | EERCF | text-to-video R@1 | 54.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Aurora (ours, r=64) | text-to-video R@5 | 77.4 | | Unverified |
| 2 | InternVideo2-6B | text-to-video R@1 | 74.2 | | Unverified |
| 3 | vid-TLDR (UMT-L) | text-to-video R@1 | 72.3 | | Unverified |
| 4 | VAST | text-to-video R@1 | 72 | | Unverified |
| 5 | COSA | text-to-video R@1 | 70.5 | | Unverified |
| 6 | UMT-L (ViT-L/16) | text-to-video R@1 | 70.4 | | Unverified |
| 7 | GRAM | text-to-video R@1 | 67.3 | | Unverified |
| 8 | VALOR | text-to-video R@1 | 61.5 | | Unverified |
| 9 | TESTA (ViT-B/16) | text-to-video R@1 | 61.2 | | Unverified |
| 10 | VindLU | text-to-video R@1 | 61.2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GRAM | text-to-video R@1 | 64 | | Unverified |
| 2 | VAST | text-to-video R@1 | 63.9 | | Unverified |
| 3 | InternVideo2-6B | text-to-video R@1 | 62.8 | | Unverified |
| 4 | VALOR | text-to-video R@1 | 59.9 | | Unverified |
| 5 | UMT-L (ViT-L/16) | text-to-video R@1 | 58.8 | | Unverified |
| 6 | vid-TLDR (UMT-L) | text-to-video R@1 | 58.1 | | Unverified |
| 7 | COSA | text-to-video R@1 | 57.9 | | Unverified |
| 8 | InternVideo2-6B | text-to-video R@1 | 55.9 | | Unverified |
| 9 | InternVideo | text-to-video R@1 | 55.2 | | Unverified |
| 10 | VLAB | text-to-video R@1 | 55.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | EMCL-Net (Ours)++ LSMDC Rohrbach et al. (2015) | text-to-video R@10 | 53.7 | | Unverified |
| 2 | InternVideo2-6B | text-to-video R@1 | 46.4 | | Unverified |
| 3 | vid-TLDR (UMT-L) | text-to-video R@1 | 43.1 | | Unverified |
| 4 | UMT-L (ViT-L/16) | text-to-video R@1 | 43 | | Unverified |
| 5 | HunYuan_tvr (huge) | text-to-video R@1 | 40.4 | | Unverified |
| 6 | COSA | text-to-video R@1 | 39.4 | | Unverified |
| 7 | mPLUG-2 | text-to-video R@1 | 34.4 | | Unverified |
| 8 | VALOR | text-to-video R@1 | 34.2 | | Unverified |
| 9 | InternVideo | text-to-video R@1 | 34 | | Unverified |
| 10 | InternVideo2-6B | text-to-video R@1 | 33.8 | | Unverified |