SOTAVerified

Video Retrieval

The objective of video retrieval is as follows: given a text query and a pool of candidate videos, select the video that corresponds to the text query. Typically, the candidate videos are returned as a ranked list and scored with document retrieval metrics such as Recall@K (R@K), the metric reported in the benchmark results below.
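The ranking-and-scoring setup described above can be illustrated with a minimal sketch. It assumes that queries and videos have already been embedded into a shared vector space and that cosine similarity is the scoring function; neither is specified on this page. The function names (`cosine`, `rank_videos`, `recall_at_k`) and the toy 2-D embeddings are hypothetical and chosen only for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_videos(query_emb, video_embs):
    """Return candidate video indices ordered from most to least similar."""
    scores = [cosine(query_emb, v) for v in video_embs]
    return sorted(range(len(video_embs)), key=lambda i: scores[i], reverse=True)


def recall_at_k(ranked_lists, targets, k):
    """Percentage of queries whose ground-truth video is among the top k."""
    hits = sum(1 for ranked, t in zip(ranked_lists, targets) if t in ranked[:k])
    return 100.0 * hits / len(targets)


# Toy data (made up): four candidate videos and three text queries.
video_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
query_embs = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.2]]
targets = [0, 1, 2]  # index of the correct video for each query

ranked = [rank_videos(q, video_embs) for q in query_embs]
for k in (1, 2):
    print(f"R@{k} = {recall_at_k(ranked, targets, k):.1f}")
```

In this toy setup the third query ranks its correct video second, so it counts as a hit for R@2 but not for R@1. Real systems typically compute the similarity matrix in batch on learned embeddings, but the metric is the same.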

Papers

Showing 151–200 of 486 papers

| Title | Status | Hype |
| --- | --- | --- |
| COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning | Code | 1 |
| Self-supervised Co-training for Video Representation Learning | Code | 1 |
| Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval | Code | 1 |
| COSA: Concatenated Sample Pretrained Vision-Language Foundation Model | Code | 1 |
| Generalized Few-Shot Video Classification with Video Retrieval and Feature Generation | Code | 1 |
| AVLnet: Learning Audio-Visual Language Representations from Instructional Videos | Code | 1 |
| CoVR-2: Automatic Data Construction for Composed Video Retrieval | Code | 1 |
| Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting | Code | 1 |
| Show Me More Details: Discovering Hierarchies of Procedures from Semi-structured Web Data | Code | 1 |
| GMMFormer v2: An Uncertainty-aware Framework for Partially Relevant Video Retrieval | Code | 1 |
| Let All be Whitened: Multi-teacher Distillation for Efficient Visual Retrieval | Code | 1 |
| Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning | Code | 1 |
| VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation | Code | 1 |
| Cross-Architecture Self-supervised Video Representation Learning | Code | 1 |
| Cross-Modal Adapter for Text-Video Retrieval | Code | 1 |
| StableFusion: Continual Video Retrieval via Frame Adaptation | Code | 1 |
| Cross Modal Retrieval with Querybank Normalisation | Code | 1 |
| HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training | Code | 1 |
| An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling | Code | 1 |
| Hierarchical Video-Moment Retrieval and Step-Captioning | Code | 1 |
| DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization | Code | 1 |
| Bridging Video-text Retrieval with Multiple Choice Questions | Code | 1 |
| Holistic Features are almost Sufficient for Text-to-Video Retrieval | Code | 1 |
| Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval | Code | 1 |
| Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data | Code | 1 |
| HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips | Code | 1 |
| HowToCaption: Prompting LLMs to Transform Video Annotations at Scale | Code | 1 |
| Event-aware Video Corpus Moment Retrieval | | 0 |
| CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing | | 0 |
| Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models and Vision Language Models | | 0 |
| Enhanced Multimodal Representation Learning with Cross-modal KD | | 0 |
| ASCNet: Self-supervised Video Representation Learning with Appearance-Speed Consistency | | 0 |
| End-to-end Generative Pretraining for Multimodal Video Captioning | | 0 |
| Coarse to Fine: Video Retrieval before Moment Localization | | 0 |
| End-to-end Concept Word Detection for Video Captioning, Retrieval, and Question Answering | | 0 |
| Encode the Unseen: Predictive Video Hashing for Scalable Mid-Stream Retrieval | | 0 |
| CNN Retrieval based Unsupervised Metric Learning for Near-Duplicated Video Retrieval | | 0 |
| MarineVRS: Marine Video Retrieval System with Explainability via Semantic Understanding | | 0 |
| Empowering Agentic Video Analytics Systems with Video Language Models | | 0 |
| Ego-Surfing: Person Localization in First-Person Videos Using Ego-Motion Signatures | | 0 |
| CMAWRNet: Multiple Adverse Weather Removal via a Unified Quaternion Neural Architecture | | 0 |
| A Review of Deep Learning for Video Captioning | | 0 |
| Efficient video indexing for monitoring disease activity and progression in the upper gastrointestinal tract | | 0 |
| Action in Mind: A Neural Network Approach to Action Recognition and Segmentation | | 0 |
| Efficient Action Detection in Untrimmed Videos via Multi-Task Learning | | 0 |
| CLOP: Video-and-Language Pre-Training with Knowledge Regularizations | | 0 |
| MAGMaR Shared Task System Description: Video Retrieval with OmniEmbed | | 0 |
| Masked Contrastive Pre-Training for Efficient Video-Text Retrieval | | 0 |
| A Proposal-based Approach for Activity Image-to-Video Retrieval | | 0 |
| EA-VTR: Event-Aware Video-Text Retrieval | | 0 |

Benchmark Results

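| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | OmniVec | text-to-video R@10 | 89.4 | | Unverified |
| 2 | CLIP4Clip | text-to-video R@10 | 81.6 | | Unverified |
| 3 | OmniVec (pretrained) | text-to-video R@10 | 78.6 | | Unverified |
| 4 | HunYuan_tvr (huge) | text-to-video R@1 | 62.9 | | Unverified |
| 5 | CLIP-ViP | text-to-video R@1 | 57.7 | | Unverified |
| 6 | PIDRo | text-to-video R@1 | 55.9 | | Unverified |
| 7 | DMAE (ViT-B/16) | text-to-video R@1 | 55.5 | | Unverified |
| 8 | HunYuan_tvr | text-to-video R@1 | 55 | | Unverified |
| 9 | MuLTI | text-to-video R@1 | 54.7 | | Unverified |
| 10 | EERCF | text-to-video R@1 | 54.1 | | Unverified |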
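
| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Aurora (ours, r=64) | text-to-video R@5 | 77.4 | | Unverified |
| 2 | InternVideo2-6B | text-to-video R@1 | 74.2 | | Unverified |
| 3 | vid-TLDR (UMT-L) | text-to-video R@1 | 72.3 | | Unverified |
| 4 | VAST | text-to-video R@1 | 72 | | Unverified |
| 5 | COSA | text-to-video R@1 | 70.5 | | Unverified |
| 6 | UMT-L (ViT-L/16) | text-to-video R@1 | 70.4 | | Unverified |
| 7 | GRAM | text-to-video R@1 | 67.3 | | Unverified |
| 8 | VALOR | text-to-video R@1 | 61.5 | | Unverified |
| 9 | TESTA (ViT-B/16) | text-to-video R@1 | 61.2 | | Unverified |
| 10 | VindLU | text-to-video R@1 | 61.2 | | Unverified |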
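
| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GRAM | text-to-video R@1 | 64 | | Unverified |
| 2 | VAST | text-to-video R@1 | 63.9 | | Unverified |
| 3 | InternVideo2-6B | text-to-video R@1 | 62.8 | | Unverified |
| 4 | VALOR | text-to-video R@1 | 59.9 | | Unverified |
| 5 | UMT-L (ViT-L/16) | text-to-video R@1 | 58.8 | | Unverified |
| 6 | vid-TLDR (UMT-L) | text-to-video R@1 | 58.1 | | Unverified |
| 7 | COSA | text-to-video R@1 | 57.9 | | Unverified |
| 8 | InternVideo2-6B | text-to-video R@1 | 55.9 | | Unverified |
| 9 | InternVideo | text-to-video R@1 | 55.2 | | Unverified |
| 10 | VLAB | text-to-video R@1 | 55.1 | | Unverified |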

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | EMCL-Net (Ours)++ LSMDC Rohrbach et al. (2015) | text-to-video R@10 | 53.7 | | Unverified |
| 2 | InternVideo2-6B | text-to-video R@1 | 46.4 | | Unverified |
| 3 | vid-TLDR (UMT-L) | text-to-video R@1 | 43.1 | | Unverified |
| 4 | UMT-L (ViT-L/16) | text-to-video R@1 | 43 | | Unverified |
| 5 | HunYuan_tvr (huge) | text-to-video R@1 | 40.4 | | Unverified |
| 6 | COSA | text-to-video R@1 | 39.4 | | Unverified |
| 7 | mPLUG-2 | text-to-video R@1 | 34.4 | | Unverified |
| 8 | VALOR | text-to-video R@1 | 34.2 | | Unverified |
| 9 | InternVideo | text-to-video R@1 | 34 | | Unverified |
| 10 | InternVideo2-6B | text-to-video R@1 | 33.8 | | Unverified |