SOTAVerified

Video Retrieval

Video retrieval works as follows: given a text query and a pool of candidate videos, the system selects the video that corresponds to the query. Candidates are typically returned as a ranked list and scored with document retrieval metrics such as Recall@K (R@K), the percentage of queries whose correct video appears among the top K results.
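The sketch below illustrates the ranking and R@K scoring reported in the benchmark tables. It is a minimal example built on two assumptions. First, a query-by-video similarity matrix has already been computed; how those scores are produced depends on the model and is not shown. Second, video i is the single correct match for query i. The helper names `ranked_list` and `recall_at_k` are hypothetical and do not come from any of the listed papers.

```python
from typing import List, Sequence


def ranked_list(scores: Sequence[float]) -> List[int]:
    """Candidate video indices ordered from best to worst similarity score.

    sorted() is stable (also with reverse=True), so tied videos keep their
    original candidate order.
    """
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def recall_at_k(sim: Sequence[Sequence[float]], k: int) -> float:
    """Text-to-video R@K in percent: share of queries whose correct video is in the top K.

    Assumption: sim[i][j] is the similarity of text query i to video j, and
    video i is the only correct match for query i.
    """
    hits = 0
    for i, row in enumerate(sim):
        rank = ranked_list(row).index(i) + 1  # 1-based rank of the correct video
        if rank <= k:
            hits += 1
    return 100.0 * hits / len(sim)


# Toy 3x3 similarity matrix (made-up numbers for illustration only).
sim = [
    [0.9, 0.1, 0.3],  # query 0: correct video 0 is ranked 1st
    [0.8, 0.5, 0.2],  # query 1: correct video 1 is ranked 2nd
    [0.7, 0.6, 0.1],  # query 2: correct video 2 is ranked 3rd
]
print(ranked_list(sim[1]))            # [0, 1, 2]
print(round(recall_at_k(sim, 1), 1))  # 33.3
print(round(recall_at_k(sim, 2), 1))  # 66.7
print(recall_at_k(sim, 10))           # 100.0
```

R@K never decreases as K grows. This is why the R@10 rows in the tables below sit higher than the R@1 rows, and why scores under different K values are not directly comparable.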

Papers

Showing 251–300 of 486 papers

Title | Status | Hype
FitCLIP: Refining Large-Scale Pretrained Image-Text Models for Zero-Shot Video Understanding Tasks | Code | 0
Video Logo Retrieval based on local Features | Code | 0
Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning | Code | 0
TC-MGC: Text-Conditioned Multi-Grained Contrastive Learning for Text-Video Retrieval | Code | 0
A Joint Sequence Fusion Model for Video Question Answering and Retrieval | Code | 0
Circulant temporal encoding for video retrieval and temporal alignment | Code | 0
Dialogue-to-Video Retrieval | Code | 0
RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter |  | 0
ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling |  | 0
Action in Mind: A Neural Network Approach to Action Recognition and Segmentation |  | 0
Advances in Human Action Recognition: A Survey |  | 0
A Faster Method for Tracking and Scoring Videos Corresponding to Sentences |  | 0
A Hierarchical Multi-Modal Encoder for Moment Localization in Video Corpus |  | 0
Analysis of Gait Pattern to Recognize the Human Activities |  | 0
An Empirical Study of Frame Selection for Text-to-Video Retrieval |  | 0
An Improved Video Analysis using Context based Extension of LSH |  | 0
An Overview of Challenges in Egocentric Text-Video Retrieval |  | 0
A Proposal-based Approach for Activity Image-to-Video Retrieval |  | 0
A Review of Deep Learning for Video Captioning |  | 0
ASCNet: Self-supervised Video Representation Learning with Appearance-Speed Consistency |  | 0
A Survey of Video-based Action Quality Assessment |  | 0
Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment |  | 0
Audio-Visual Embedding for Cross-Modal Music Video Retrieval through Supervised Deep CCA |  | 0
A Unified Model for Video Understanding and Knowledge Embedding with Heterogeneous Knowledge Graph Dataset |  | 0
A Video is Worth 10,000 Words: Training and Benchmarking with Diverse Captions for Better Long Video Retrieval |  | 0
Bag of Genres for Video Retrieval |  | 0
Binary Subspace Coding for Query-by-Image Video Retrieval |  | 0
Boosting Video Captioning with Dynamic Loss Network |  | 0
CHAIN: Exploring Global-Local Spatio-Temporal Information for Improved Self-Supervised Video Hashing |  | 0
Clarification of Video Retrieval Query Results by the Automated Insertion of Supporting Shots |  | 0
Classroom Video Assessment and Retrieval via Multiple Instance Learning |  | 0
CLIP2TV: Align, Match and Distill for Video-Text Retrieval |  | 0
CLOP: Video-and-Language Pre-Training with Knowledge Regularizations |  | 0
CMAWRNet: Multiple Adverse Weather Removal via a Unified Quaternion Neural Architecture |  | 0
CNN Retrieval based Unsupervised Metric Learning for Near-Duplicated Video Retrieval |  | 0
Coarse to Fine: Video Retrieval before Moment Localization |  | 0
CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing |  | 0
Colo-SCRL: Self-Supervised Contrastive Representation Learning for Colonoscopic Video Retrieval |  | 0
Contrastive Video-Language Learning with Fine-grained Frame Sampling |  | 0
Controllable Augmentations for Video Representation Learning |  | 0
COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval |  | 0
CREATE: A Benchmark for Chinese Short Video Retrieval and Title Generation |  | 0
CUPID: Adaptive Curation of Pre-training Data for Video-and-Language Representation Learning |  | 0
Deep Heterogeneous Hashing for Face Video Retrieval |  | 0
Deep Learning Based Semantic Video Indexing and Retrieval |  | 0
De-Hashing: Server-Side Context-Aware Feature Reconstruction for Mobile Visual Search |  | 0
Detours for Navigating Instructional Videos |  | 0
Discrete Wavelet Transform and Gradient Difference based approach for text localization in videos |  | 0
Distilling Vision-Language Models on Millions of Videos |  | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | OmniVec | text-to-video R@10 | 89.4 |  | Unverified
2 | CLIP4Clip | text-to-video R@10 | 81.6 |  | Unverified
3 | OmniVec (pretrained) | text-to-video R@10 | 78.6 |  | Unverified
4 | HunYuan_tvr (huge) | text-to-video R@1 | 62.9 |  | Unverified
5 | CLIP-ViP | text-to-video R@1 | 57.7 |  | Unverified
6 | PIDRo | text-to-video R@1 | 55.9 |  | Unverified
7 | DMAE (ViT-B/16) | text-to-video R@1 | 55.5 |  | Unverified
8 | HunYuan_tvr | text-to-video R@1 | 55 |  | Unverified
9 | MuLTI | text-to-video R@1 | 54.7 |  | Unverified
10 | EERCF | text-to-video R@1 | 54.1 |  | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Aurora (ours, r=64) | text-to-video R@5 | 77.4 |  | Unverified
2 | InternVideo2-6B | text-to-video R@1 | 74.2 |  | Unverified
3 | vid-TLDR (UMT-L) | text-to-video R@1 | 72.3 |  | Unverified
4 | VAST | text-to-video R@1 | 72 |  | Unverified
5 | COSA | text-to-video R@1 | 70.5 |  | Unverified
6 | UMT-L (ViT-L/16) | text-to-video R@1 | 70.4 |  | Unverified
7 | GRAM | text-to-video R@1 | 67.3 |  | Unverified
8 | VALOR | text-to-video R@1 | 61.5 |  | Unverified
9 | TESTA (ViT-B/16) | text-to-video R@1 | 61.2 |  | Unverified
10 | VindLU | text-to-video R@1 | 61.2 |  | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | GRAM | text-to-video R@1 | 64 |  | Unverified
2 | VAST | text-to-video R@1 | 63.9 |  | Unverified
3 | InternVideo2-6B | text-to-video R@1 | 62.8 |  | Unverified
4 | VALOR | text-to-video R@1 | 59.9 |  | Unverified
5 | UMT-L (ViT-L/16) | text-to-video R@1 | 58.8 |  | Unverified
6 | vid-TLDR (UMT-L) | text-to-video R@1 | 58.1 |  | Unverified
7 | COSA | text-to-video R@1 | 57.9 |  | Unverified
8 | InternVideo2-6B | text-to-video R@1 | 55.9 |  | Unverified
9 | InternVideo | text-to-video R@1 | 55.2 |  | Unverified
10 | VLAB | text-to-video R@1 | 55.1 |  | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | EMCL-Net (Ours)++ LSMDC Rohrbach et al. (2015) | text-to-video R@10 | 53.7 |  | Unverified
2 | InternVideo2-6B | text-to-video R@1 | 46.4 |  | Unverified
3 | vid-TLDR (UMT-L) | text-to-video R@1 | 43.1 |  | Unverified
4 | UMT-L (ViT-L/16) | text-to-video R@1 | 43 |  | Unverified
5 | HunYuan_tvr (huge) | text-to-video R@1 | 40.4 |  | Unverified
6 | COSA | text-to-video R@1 | 39.4 |  | Unverified
7 | mPLUG-2 | text-to-video R@1 | 34.4 |  | Unverified
8 | VALOR | text-to-video R@1 | 34.2 |  | Unverified
9 | InternVideo | text-to-video R@1 | 34 |  | Unverified
10 | InternVideo2-6B | text-to-video R@1 | 33.8 |  | Unverified