SOTAVerified

Video Retrieval

Video retrieval takes a text query and a pool of candidate videos and selects the video that matches the query. Systems typically return the candidates as a ranked list, which is scored with document retrieval metrics such as Recall@K (R@K).
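To illustrate how a ranked list of candidates might be scored, the sketch below computes Recall@K (R@K), the metric family reported in the benchmark tables on this page. It is a minimal example under stated assumptions, not the evaluation code of any listed paper: the function names `rank_of_correct_video` and `recall_at_k`, the toy similarity matrix, the convention that query i is paired with video i, and the pessimistic tie handling are all illustrative choices.

```python
from typing import Sequence


def rank_of_correct_video(scores: Sequence[float], correct_index: int) -> int:
    """Return the 1-based rank of the correct video for one text query.

    Assumption: ties are counted pessimistically, so a video with the same
    score as the correct one is treated as ranked ahead of it.
    """
    target = scores[correct_index]
    higher_or_tied = sum(
        1 for i, s in enumerate(scores) if i != correct_index and s >= target
    )
    return higher_or_tied + 1


def recall_at_k(similarity: Sequence[Sequence[float]], k: int) -> float:
    """Percentage of queries whose correct video appears in the top k.

    Assumption: query i is paired with video i (the matrix diagonal).
    """
    ranks = [rank_of_correct_video(row, i) for i, row in enumerate(similarity)]
    hits = sum(1 for r in ranks if r <= k)
    return 100.0 * hits / len(ranks)


# Toy similarity matrix: 4 text queries (rows) x 4 candidate videos (columns).
similarity = [
    [0.9, 0.1, 0.3, 0.2],  # correct video 0 is ranked 1st
    [0.8, 0.6, 0.1, 0.7],  # correct video 1 is ranked 3rd
    [0.2, 0.4, 0.5, 0.1],  # correct video 2 is ranked 1st
    [0.9, 0.8, 0.7, 0.3],  # correct video 3 is ranked 4th
]

for k in (1, 2, 3, 4):
    print(f"R@{k}: {recall_at_k(similarity, k):.1f}")
```

On this toy matrix, two of the four queries rank their correct video first, so R@1 is 50.0; R@3 rises to 75.0 and R@4 reaches 100.0. Real benchmarks apply the same idea to similarity scores produced by a trained text-video model over a much larger candidate pool.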

Papers

Showing 251-300 of 486 papers

Title | Status | Hype
Generative Semantic Communication: Architectures, Technologies, and Applications | | 0
Bridging Information Asymmetry in Text-video Retrieval: A Data-centric Approach | | 0
Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning | | 0
Grounding Physical Object and Event Concepts Through Dynamic Visual Reasoning | | 0
HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training | | 0
HiVLP: Hierarchical Interactive Video-Language Pre-Training | | 0
HORUS: Multimodal Large Language Models Framework for Video Retrieval at VBS 2025 | | 0
Human Action Recognition and Prediction: A Survey | | 0
Tencent Text-Video Retrieval: Hierarchical Cross-Modal Interactions with Multi-Level Representations | | 0
Improving Video Retrieval by Adaptive Margin | | 0
MuMUR: Multilingual Multimodal Universal Retrieval | | 0
Induce, Edit, Retrieve: Language Grounded Multimodal Schema for Instructional Video Retrieval | | 0
Interactive Video Retrieval with Dialog | | 0
Key Frame Extraction with Attention Based Deep Neural Networks | | 0
KPCA Spatio-temporal trajectory point cloud classifier for recognizing human actions in a CBVR system | | 0
Large-Scale Query-by-Image Video Retrieval Using Bloom Filters | | 0
Large Scale Video Representation Learning via Relational Graph Clustering | | 0
Vision-Language Models Learn Super Images for Efficient Partially Relevant Video Retrieval | | 0
LASER: A Neuro-Symbolic Framework for Learning Spatial-Temporal Scene Graphs with Weak Supervision | | 0
LaT: Latent Translation with Cycle-Consistency for Video-Text Retrieval | | 0
Learning and Recognizing Human Action from Skeleton Movement with Deep Residual Neural Networks | | 0
Learning Audio-Video Modalities from Image Captions | | 0
Learning Joint Representations of Videos and Sentences with Web Image Search | | 0
Learning Language-Visual Embedding for Movie Understanding with Natural-Language | | 0
Learning Locally-Adaptive Decision Functions for Person Verification | | 0
Learning Segment Similarity and Alignment in Large-Scale Content Based Video Retrieval | | 0
Learning text-to-video retrieval from image captioning | | 0
Learning to Generate Long-term Future Narrations Describing Activities of Daily Living | | 0
Learning Trajectory-Word Alignments for Video-Language Tasks | | 0
Learning World Models for Interactive Video Generation | | 0
Leveraging Auxiliary Information in Text-to-Video Retrieval: A Review | | 0
Leveraging Generative Language Models for Weakly Supervised Sentence Component Analysis in Video-Language Joint Learning | | 0
Leveraging Modality Tags for Enhanced Cross-Modal Video Retrieval | | 0
LiteVL: Efficient Video-Language Learning with Enhanced Spatial-Temporal Modeling | | 0
Live Laparoscopic Video Retrieval with Compressed Uncertainty | | 0
LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning | | 0
Long-VMNet: Accelerating Long-Form Video Understanding via Fixed Memory | | 0
Lost Your Style? Navigating with Semantic-Level Approach for Text-to-Outfit Retrieval | | 0
MAGMaR Shared Task System Description: Video Retrieval with OmniEmbed | | 0
MarineVRS: Marine Video Retrieval System with Explainability via Semantic Understanding | | 0
Masked Contrastive Pre-Training for Efficient Video-Text Retrieval | | 0
Masking Modalities for Cross-modal Video Retrieval | | 0
Mask to reconstruct: Cooperative Semantics Completion for Video-text Retrieval | | 0
MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization | | 0
MERLIN: Multimodal Embedding Refinement via LLM-based Iterative Navigation for Text-Video Retrieval-Rerank Pipeline | | 0
Modality-Balanced Embedding for Video Retrieval | | 0
Motion Sensitive Contrastive Learning for Self-supervised Video Representation | | 0
MuLTI: Efficient Video-and-Language Understanding with Text-Guided MultiWay-Sampler and Multiple Choice Modeling | | 0
Multi-Granularity and Multi-modal Feature Interaction Approach for Text Video Retrieval | | 0
Multi-Granularity Graph Pooling for Video-based Person Re-Identification | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | OmniVec | text-to-video R@10 | 89.4 | | Unverified
2 | CLIP4Clip | text-to-video R@10 | 81.6 | | Unverified
3 | OmniVec (pretrained) | text-to-video R@10 | 78.6 | | Unverified
4 | HunYuan_tvr (huge) | text-to-video R@1 | 62.9 | | Unverified
5 | CLIP-ViP | text-to-video R@1 | 57.7 | | Unverified
6 | PIDRo | text-to-video R@1 | 55.9 | | Unverified
7 | DMAE (ViT-B/16) | text-to-video R@1 | 55.5 | | Unverified
8 | HunYuan_tvr | text-to-video R@1 | 55 | | Unverified
9 | MuLTI | text-to-video R@1 | 54.7 | | Unverified
10 | EERCF | text-to-video R@1 | 54.1 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Aurora (ours, r=64) | text-to-video R@5 | 77.4 | | Unverified
2 | InternVideo2-6B | text-to-video R@1 | 74.2 | | Unverified
3 | vid-TLDR (UMT-L) | text-to-video R@1 | 72.3 | | Unverified
4 | VAST | text-to-video R@1 | 72 | | Unverified
5 | COSA | text-to-video R@1 | 70.5 | | Unverified
6 | UMT-L (ViT-L/16) | text-to-video R@1 | 70.4 | | Unverified
7 | GRAM | text-to-video R@1 | 67.3 | | Unverified
8 | VALOR | text-to-video R@1 | 61.5 | | Unverified
9 | TESTA (ViT-B/16) | text-to-video R@1 | 61.2 | | Unverified
10 | VindLU | text-to-video R@1 | 61.2 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | GRAM | text-to-video R@1 | 64 | | Unverified
2 | VAST | text-to-video R@1 | 63.9 | | Unverified
3 | InternVideo2-6B | text-to-video R@1 | 62.8 | | Unverified
4 | VALOR | text-to-video R@1 | 59.9 | | Unverified
5 | UMT-L (ViT-L/16) | text-to-video R@1 | 58.8 | | Unverified
6 | vid-TLDR (UMT-L) | text-to-video R@1 | 58.1 | | Unverified
7 | COSA | text-to-video R@1 | 57.9 | | Unverified
8 | InternVideo2-6B | text-to-video R@1 | 55.9 | | Unverified
9 | InternVideo | text-to-video R@1 | 55.2 | | Unverified
10 | VLAB | text-to-video R@1 | 55.1 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | EMCL-Net (Ours)++ LSMDC Rohrbach et al. (2015) | text-to-video R@10 | 53.7 | | Unverified
2 | InternVideo2-6B | text-to-video R@1 | 46.4 | | Unverified
3 | vid-TLDR (UMT-L) | text-to-video R@1 | 43.1 | | Unverified
4 | UMT-L (ViT-L/16) | text-to-video R@1 | 43 | | Unverified
5 | HunYuan_tvr (huge) | text-to-video R@1 | 40.4 | | Unverified
6 | COSA | text-to-video R@1 | 39.4 | | Unverified
7 | mPLUG-2 | text-to-video R@1 | 34.4 | | Unverified
8 | VALOR | text-to-video R@1 | 34.2 | | Unverified
9 | InternVideo | text-to-video R@1 | 34 | | Unverified
10 | InternVideo2-6B | text-to-video R@1 | 33.8 | | Unverified