SOTAVerified

Video Retrieval

Given a text query and a pool of candidate videos, video retrieval selects the video that corresponds to the query. Systems typically return the candidates as a ranked list, which is scored with document retrieval metrics such as recall at rank K (R@K).
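As a rough illustration of how a ranked list can be scored, the sketch below computes recall at rank K for text-to-video retrieval. It rests on several assumptions that do not come from this page: the function names `rank_of_correct` and `recall_at_k` are hypothetical, `sim[i][j]` is taken to be the similarity score between query `i` and video `j`, each query has exactly one correct video (video `i`), and ties are broken in favor of the correct video. Individual benchmarks may define the metric differently.

```python
from typing import Sequence


def rank_of_correct(scores: Sequence[float], correct_idx: int) -> int:
    """Return the 1-based rank of the correct video under descending score order."""
    target = scores[correct_idx]
    # Assumption: ties are resolved in favor of the correct video,
    # so only strictly higher scores push it down the list.
    return 1 + sum(1 for s in scores if s > target)


def recall_at_k(sim: Sequence[Sequence[float]], k: int) -> float:
    """Percentage of queries whose correct video appears in the top k.

    Assumption: sim[i][j] scores query i against video j, and the
    correct video for query i is video i.
    """
    hits = sum(1 for i, row in enumerate(sim) if rank_of_correct(row, i) <= k)
    return 100.0 * hits / len(sim)


# Toy similarity matrix: 3 text queries x 3 candidate videos.
sim = [
    [0.9, 0.1, 0.3],  # query 0: correct video is ranked 1st
    [0.8, 0.5, 0.2],  # query 1: correct video is ranked 2nd
    [0.4, 0.6, 0.1],  # query 2: correct video is ranked 3rd
]

for k in (1, 2, 3):
    print(f"R@{k} = {recall_at_k(sim, k):.1f}")
# R@1 = 33.3
# R@2 = 66.7
# R@3 = 100.0
```

Under these assumptions R@K can only grow with K, so R@1, R@5 and R@10 values for the same model are not directly comparable to one another.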

Papers

Showing 351-400 of 486 papers

Title | Status | Hype
Learning and Recognizing Human Action from Skeleton Movement with Deep Residual Neural Networks | - | 0
Learning Audio-Video Modalities from Image Captions | - | 0
Learning Joint Representations of Videos and Sentences with Web Image Search | - | 0
Learning Language-Visual Embedding for Movie Understanding with Natural-Language | - | 0
Learning Locally-Adaptive Decision Functions for Person Verification | - | 0
Learning Segment Similarity and Alignment in Large-Scale Content Based Video Retrieval | - | 0
Learning text-to-video retrieval from image captioning | - | 0
Learning to Generate Long-term Future Narrations Describing Activities of Daily Living | - | 0
Learning Trajectory-Word Alignments for Video-Language Tasks | - | 0
Learning World Models for Interactive Video Generation | - | 0
Leveraging Auxiliary Information in Text-to-Video Retrieval: A Review | - | 0
Leveraging Generative Language Models for Weakly Supervised Sentence Component Analysis in Video-Language Joint Learning | - | 0
Leveraging Modality Tags for Enhanced Cross-Modal Video Retrieval | - | 0
LiteVL: Efficient Video-Language Learning with Enhanced Spatial-Temporal Modeling | - | 0
Live Laparoscopic Video Retrieval with Compressed Uncertainty | - | 0
LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning | - | 0
Long-VMNet: Accelerating Long-Form Video Understanding via Fixed Memory | - | 0
Lost Your Style? Navigating with Semantic-Level Approach for Text-to-Outfit Retrieval | - | 0
MAGMaR Shared Task System Description: Video Retrieval with OmniEmbed | - | 0
MarineVRS: Marine Video Retrieval System with Explainability via Semantic Understanding | - | 0
Masked Contrastive Pre-Training for Efficient Video-Text Retrieval | - | 0
Masking Modalities for Cross-modal Video Retrieval | - | 0
Mask to reconstruct: Cooperative Semantics Completion for Video-text Retrieval | - | 0
MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization | - | 0
MERLIN: Multimodal Embedding Refinement via LLM-based Iterative Navigation for Text-Video Retrieval-Rerank Pipeline | - | 0
Modality-Balanced Embedding for Video Retrieval | - | 0
Motion Sensitive Contrastive Learning for Self-supervised Video Representation | - | 0
MuLTI: Efficient Video-and-Language Understanding with Text-Guided MultiWay-Sampler and Multiple Choice Modeling | - | 0
Multi-Granularity and Multi-modal Feature Interaction Approach for Text Video Retrieval | - | 0
Multi-Granularity Graph Pooling for Video-based Person Re-Identification | - | 0
Multimodal Approach for Video Surveillance Indexing and Retrieval | - | 0
Multimodal Contextualized Support for Enhancing Video Retrieval System | - | 0
Multimodal Skip-gram Using Convolutional Pseudowords | - | 0
Multiple Visual-Semantic Embedding for Video Retrieval from Query Sentence | - | 0
MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval | - | 0
MultiVENT: Multilingual Videos of Events with Aligned Natural Text | - | 0
Narrating the Video: Boosting Text-Video Retrieval via Comprehensive Utilization of Frame-Level Captions | - | 0
NAVERO: Unlocking Fine-Grained Semantics for Video-Language Compositionality | - | 0
Near-duplicate video detection featuring coupled temporal and perceptual visual structures and logical inference based matching | - | 0
Neighborhood Preserving Hashing for Scalable Video Retrieval | - | 0
Neural Graph Matching for Video Retrieval in Large-Scale Video-driven E-commerce | - | 0
NEWSKVQA: Knowledge-Aware News Video Question Answering | - | 0
No More Shortcuts: Realizing the Potential of Temporal Self-Supervision | - | 0
Not All Pairs are Equal: Hierarchical Learning for Average-Precision-Oriented Video Retrieval | - | 0
OmniVL:One Foundation Model for Image-Language and Video-Language Tasks | - | 0
Perfect Match in Video Retrieval | - | 0
PIDRo: Parallel Isomeric Attention with Dynamic Routing for Text-Video Retrieval | - | 0
PolySmart @ TRECVid 2024 Medical Video Question Answering | - | 0
Pose-Aided Video-based Person Re-Identification via Recurrent Graph Convolutional Network | - | 0
Probabilistic Representations for Video Contrastive Learning | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | OmniVec | text-to-video R@10 | 89.4 | - | Unverified
2 | CLIP4Clip | text-to-video R@10 | 81.6 | - | Unverified
3 | OmniVec (pretrained) | text-to-video R@10 | 78.6 | - | Unverified
4 | HunYuan_tvr (huge) | text-to-video R@1 | 62.9 | - | Unverified
5 | CLIP-ViP | text-to-video R@1 | 57.7 | - | Unverified
6 | PIDRo | text-to-video R@1 | 55.9 | - | Unverified
7 | DMAE (ViT-B/16) | text-to-video R@1 | 55.5 | - | Unverified
8 | HunYuan_tvr | text-to-video R@1 | 55 | - | Unverified
9 | MuLTI | text-to-video R@1 | 54.7 | - | Unverified
10 | EERCF | text-to-video R@1 | 54.1 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Aurora (ours, r=64) | text-to-video R@5 | 77.4 | - | Unverified
2 | InternVideo2-6B | text-to-video R@1 | 74.2 | - | Unverified
3 | vid-TLDR (UMT-L) | text-to-video R@1 | 72.3 | - | Unverified
4 | VAST | text-to-video R@1 | 72 | - | Unverified
5 | COSA | text-to-video R@1 | 70.5 | - | Unverified
6 | UMT-L (ViT-L/16) | text-to-video R@1 | 70.4 | - | Unverified
7 | GRAM | text-to-video R@1 | 67.3 | - | Unverified
8 | VALOR | text-to-video R@1 | 61.5 | - | Unverified
9 | TESTA (ViT-B/16) | text-to-video R@1 | 61.2 | - | Unverified
10 | VindLU | text-to-video R@1 | 61.2 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GRAM | text-to-video R@1 | 64 | - | Unverified
2 | VAST | text-to-video R@1 | 63.9 | - | Unverified
3 | InternVideo2-6B | text-to-video R@1 | 62.8 | - | Unverified
4 | VALOR | text-to-video R@1 | 59.9 | - | Unverified
5 | UMT-L (ViT-L/16) | text-to-video R@1 | 58.8 | - | Unverified
6 | vid-TLDR (UMT-L) | text-to-video R@1 | 58.1 | - | Unverified
7 | COSA | text-to-video R@1 | 57.9 | - | Unverified
8 | InternVideo2-6B | text-to-video R@1 | 55.9 | - | Unverified
9 | InternVideo | text-to-video R@1 | 55.2 | - | Unverified
10 | VLAB | text-to-video R@1 | 55.1 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | EMCL-Net (Ours)++ LSMDC Rohrbach et al. (2015) | text-to-video R@10 | 53.7 | - | Unverified
2 | InternVideo2-6B | text-to-video R@1 | 46.4 | - | Unverified
3 | vid-TLDR (UMT-L) | text-to-video R@1 | 43.1 | - | Unverified
4 | UMT-L (ViT-L/16) | text-to-video R@1 | 43 | - | Unverified
5 | HunYuan_tvr (huge) | text-to-video R@1 | 40.4 | - | Unverified
6 | COSA | text-to-video R@1 | 39.4 | - | Unverified
7 | mPLUG-2 | text-to-video R@1 | 34.4 | - | Unverified
8 | VALOR | text-to-video R@1 | 34.2 | - | Unverified
9 | InternVideo | text-to-video R@1 | 34 | - | Unverified
10 | InternVideo2-6B | text-to-video R@1 | 33.8 | - | Unverified