SOTAVerified

Video Retrieval

Video retrieval takes a text query and a pool of candidate videos and selects the video that matches the query. Systems typically return a ranked list of candidates, which is scored with document retrieval metrics such as recall at rank K (R@K).
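As an illustration of how a ranked list might be scored, here is a minimal sketch of recall at rank K (R@K), the metric reported in the benchmark tables on this page. It assumes a query-by-video similarity matrix in which query i matches video i. The function names, the toy scores, and the optimistic tie handling are all assumptions made for this example, not any benchmark's official evaluation code.

```python
from typing import Sequence


def rank_of_correct(scores: Sequence[float], correct_index: int) -> int:
    """Return the 1-based rank of the correct video for one query.

    Candidates are ranked by descending score. Ties are resolved in favour
    of the correct video (an assumption made for this sketch).
    """
    target = scores[correct_index]
    # Every candidate that scores strictly higher pushes the correct video down one place.
    return 1 + sum(1 for s in scores if s > target)


def recall_at_k(similarity: Sequence[Sequence[float]], k: int) -> float:
    """Percentage of queries whose matching video appears in the top k.

    Assumes similarity[i][j] scores query i against video j and that
    query i matches video i (hypothetical layout for this example).
    """
    ranks = [rank_of_correct(row, i) for i, row in enumerate(similarity)]
    return 100.0 * sum(1 for r in ranks if r <= k) / len(ranks)


# Toy similarity matrix: 3 text queries against 3 candidate videos.
similarity = [
    [0.9, 0.1, 0.3],  # query 0: correct video ranked 1st
    [0.2, 0.4, 0.8],  # query 1: correct video ranked 2nd
    [0.5, 0.6, 0.7],  # query 2: correct video ranked 1st
]

r_at_1 = recall_at_k(similarity, 1)
r_at_2 = recall_at_k(similarity, 2)
```

In this toy case two of the three queries rank their matching video first, so R@1 is about 66.7 and R@2 is 100. Published evaluations may handle ties or multiple captions per video differently.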

Papers

Showing 451-486 of 486 papers

Title | Status | Hype
Time-Equivariant Contrastive Video Representation Learning |  | 0
Towards Debiasing Frame Length Bias in Text-Video Retrieval via Causal Intervention |  | 0
Towards Efficient and Robust Moment Retrieval System: A Unified Framework for Multi-Granularity Models and Temporal Reranking |  | 0
Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset |  | 0
TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba |  | 0
TRECVID 2019: An Evaluation Campaign to Benchmark Video Activity Detection, Video Captioning and Matching, and Video Search & Retrieval |  | 0
Tree-Augmented Cross-Modal Encoding for Complex-Query Video Retrieval |  | 0
Tree-based Text-Vision BERT for Video Search in Baidu Video Advertising |  | 0
Two-person interaction detection using body-pose features and multiple instance learning |  | 0
Uncertainty-aware sign language video retrieval with probability distribution modeling |  | 0
Unfolding Videos Dynamics via Taylor Expansion |  | 0
Unified Embedding and Metric Learning for Zero-Exemplar Event Detection |  | 0
Universal Adversarial Head: Practical Protection against Video Data Leakage |  | 0
Unsupervised Data Uncertainty Learning in Visual Retrieval Systems |  | 0
Unsupervised Segmentation of Action Segments in Egocentric Videos using Gaze |  | 0
Use of Affective Visual Information for Summarization of Human-Centric Videos |  | 0
V3C - a Research Video Collection |  | 0
Video 3D Sampling for Self-supervised Representation Learning |  | 0
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding |  | 0
VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models |  | 0
Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval |  | 0
Video DataFlywheel: Resolving the Impossible Data Trinity in Video-Language Understanding |  | 0
Video Editing for Video Retrieval |  | 0
Videoprompter: an ensemble of foundational models for zero-shot video understanding |  | 0
Video retrieval based on deep convolutional neural network |  | 0
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners |  | 0
Vi-MIX FOR SELF-SUPERVISED VIDEO REPRESENTATION |  | 0
ViSeRet: A simple yet effective approach to moment retrieval via fine-grained video segmentation |  | 0
Visual Information Retrieval in Endoscopic Video Archives |  | 0
Visual Semantic Search: Retrieving Videos via Complex Textual Queries |  | 0
VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending |  | 0
VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding |  | 0
VRAG: Region Attention Graphs for Content-Based Video Retrieval |  | 0
VRFP: On-the-fly Video Retrieval using Web Images and Fast Fisher Vector Products |  | 0
VScript: Controllable Script Generation with Visual Presentation |  | 0
Watch Less and Uncover More: Could Navigation Tools Help Users Search and Explore Videos? |  | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | OmniVec | text-to-video R@10 | 89.4 |  | Unverified
2 | CLIP4Clip | text-to-video R@10 | 81.6 |  | Unverified
3 | OmniVec (pretrained) | text-to-video R@10 | 78.6 |  | Unverified
4 | HunYuan_tvr (huge) | text-to-video R@1 | 62.9 |  | Unverified
5 | CLIP-ViP | text-to-video R@1 | 57.7 |  | Unverified
6 | PIDRo | text-to-video R@1 | 55.9 |  | Unverified
7 | DMAE (ViT-B/16) | text-to-video R@1 | 55.5 |  | Unverified
8 | HunYuan_tvr | text-to-video R@1 | 55 |  | Unverified
9 | MuLTI | text-to-video R@1 | 54.7 |  | Unverified
10 | STAN | text-to-video R@1 | 54.1 |  | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Aurora (ours, r=64) | text-to-video R@5 | 77.4 |  | Unverified
2 | InternVideo2-6B | text-to-video R@1 | 74.2 |  | Unverified
3 | vid-TLDR (UMT-L) | text-to-video R@1 | 72.3 |  | Unverified
4 | VAST | text-to-video R@1 | 72 |  | Unverified
5 | COSA | text-to-video R@1 | 70.5 |  | Unverified
6 | UMT-L (ViT-L/16) | text-to-video R@1 | 70.4 |  | Unverified
7 | GRAM | text-to-video R@1 | 67.3 |  | Unverified
8 | VALOR | text-to-video R@1 | 61.5 |  | Unverified
9 | TESTA (ViT-B/16) | text-to-video R@1 | 61.2 |  | Unverified
10 | VindLU | text-to-video R@1 | 61.2 |  | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | GRAM | text-to-video R@1 | 64 |  | Unverified
2 | VAST | text-to-video R@1 | 63.9 |  | Unverified
3 | InternVideo2-6B | text-to-video R@1 | 62.8 |  | Unverified
4 | VALOR | text-to-video R@1 | 59.9 |  | Unverified
5 | UMT-L (ViT-L/16) | text-to-video R@1 | 58.8 |  | Unverified
6 | vid-TLDR (UMT-L) | text-to-video R@1 | 58.1 |  | Unverified
7 | COSA | text-to-video R@1 | 57.9 |  | Unverified
8 | InternVideo2-6B | text-to-video R@1 | 55.9 |  | Unverified
9 | InternVideo | text-to-video R@1 | 55.2 |  | Unverified
10 | VLAB | text-to-video R@1 | 55.1 |  | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | EMCL-Net (Ours)++ LSMDC Rohrbach et al. (2015) | text-to-video R@10 | 53.7 |  | Unverified
2 | InternVideo2-6B | text-to-video R@1 | 46.4 |  | Unverified
3 | vid-TLDR (UMT-L) | text-to-video R@1 | 43.1 |  | Unverified
4 | UMT-L (ViT-L/16) | text-to-video R@1 | 43 |  | Unverified
5 | HunYuan_tvr (huge) | text-to-video R@1 | 40.4 |  | Unverified
6 | COSA | text-to-video R@1 | 39.4 |  | Unverified
7 | mPLUG-2 | text-to-video R@1 | 34.4 |  | Unverified
8 | VALOR | text-to-video R@1 | 34.2 |  | Unverified
9 | InternVideo | text-to-video R@1 | 34 |  | Unverified
10 | InternVideo2-6B | text-to-video R@1 | 33.8 |  | Unverified