SOTAVerified

Video Retrieval

The objective of video retrieval is as follows: given a text query and a pool of candidate videos, select the video that best matches the query. Typically, systems return the candidates as a ranked list, which is scored with document retrieval metrics such as Recall@K (the R@1, R@5, and R@10 figures in the benchmark results).
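The ranking and scoring described above can be illustrated with a minimal sketch. It assumes a square text-to-video similarity matrix in which the correct video for query `i` sits at index `i`. That convention, the function names `rank_videos` and `recall_at_k`, and the toy scores are all illustrative assumptions and are not taken from any listed paper.

```python
import numpy as np


def rank_videos(scores):
    """Return candidate video indices ordered from best to worst match."""
    return np.argsort(-np.asarray(scores), kind="stable")


def recall_at_k(sim, k):
    """Percentage of queries whose correct video appears in the top k.

    sim[i, j] is the similarity of text query i to video j.
    Assumption: the ground-truth video for query i is video i.
    """
    sim = np.asarray(sim, dtype=float)
    correct = np.diag(sim)[:, None]
    # Zero-based rank = number of videos scored strictly higher than the correct one.
    ranks = (sim > correct).sum(axis=1)
    return float((ranks < k).mean() * 100.0)


# Toy similarity scores (made up for illustration): 3 queries x 3 videos.
sim = np.array([
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.7, 0.6, 0.5],
])

ranking_for_query_1 = rank_videos(sim[1])
r_at_1 = recall_at_k(sim, 1)
r_at_2 = recall_at_k(sim, 2)
r_at_3 = recall_at_k(sim, 3)
```

In this toy case only the first query ranks its correct video first, so R@1 is one third of the queries, while R@3 covers all of them because the pool holds only three videos.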

Papers

Showing 401–450 of 486 papers

| Title | Status | Hype |
|---|---|---|
| VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | | 0 |
| Vi-MIX FOR SELF-SUPERVISED VIDEO REPRESENTATION | | 0 |
| ViSeRet: A simple yet effective approach to moment retrieval via fine-grained video segmentation | | 0 |
| Visual Information Retrieval in Endoscopic Video Archives | | 0 |
| Visual Semantic Search: Retrieving Videos via Complex Textual Queries | | 0 |
| VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending | | 0 |
| VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding | | 0 |
| VRAG: Region Attention Graphs for Content-Based Video Retrieval | | 0 |
| VRFP: On-the-fly Video Retrieval using Web Images and Fast Fisher Vector Products | | 0 |
| VScript: Controllable Script Generation with Visual Presentation | | 0 |
| Watch Less and Uncover More: Could Navigation Tools Help Users Search and Explore Videos? | | 0 |
| AMIL: Adversarial Multi Instance Learning for Human Pose Estimation | Code | 0 |
| Self-supervised Video Representation Learning by Context and Motion Decoupling | Code | 0 |
| LAMV: Learning to Align and Match Videos With Kernelized Temporal Layers | Code | 0 |
| Joint Searching and Grounding: Multi-Granularity Video Content Retrieval | Code | 0 |
| Self-supervised Video Representation Learning with Cascade Positive Retrieval | Code | 0 |
| Dialogue-to-Video Retrieval | Code | 0 |
| Self-Supervised Visual Learning by Variable Playback Speeds Prediction of a Video | Code | 0 |
| Is Multimodal Vision Supervision Beneficial to Language? | Code | 0 |
| Semantic Role Aware Correlation Transformer for Text to Video Retrieval | Code | 0 |
| A Challenge to Build Neuro-Symbolic Video Agents | Code | 0 |
| Deep Hashing with Category Mask for Fast Video Retrieval | Code | 0 |
| Improving Video Corpus Moment Retrieval with Partial Relevance Enhancement | Code | 0 |
| SEDS: Semantically Enhanced Dual-Stream Encoder for Sign Language Retrieval | Code | 0 |
| ICSVR: Investigating Compositional and Syntactic Understanding in Video Retrieval Models | Code | 0 |
| Inter-intra Variant Dual Representations for Self-supervised Video Recognition | Code | 0 |
| SEA: Sentence Encoder Assembly for Video Retrieval by Textual Queries | Code | 0 |
| Screencast Tutorial Video Understanding | Code | 0 |
| Rudder: A Cross Lingual Video and Text Retrieval Dataset | Code | 0 |
| ConTra: (Con)text (Tra)nsformer for Cross-Modal Video Retrieval | Code | 0 |
| RoME: Role-aware Mixture-of-Expert Transformer for Text-to-Video Retrieval | Code | 0 |
| Hashing with Mutual Information | Code | 0 |
| Accommodating Audio Modality in CLIP for Multimodal Processing | Code | 0 |
| ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video-Text Data Streams | Code | 0 |
| Video-Text Retrieval by Supervised Sparse Multi-Grained Learning | Code | 0 |
| Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language | Code | 0 |
| Graph Based Temporal Aggregation for Video Retrieval | Code | 0 |
| Contextual Explainable Video Representation: Human Perception-based Understanding | Code | 0 |
| You were saying? - Spoken Language in the V3C Dataset | Code | 0 |
| GOCA: Guided Online Cluster Assignment for Self-Supervised Video Representation Learning | Code | 0 |
| Unmasked Teacher: Towards Training-Efficient Video Foundation Models | Code | 0 |
| Relevance-based Margin for Contrastively-trained Video Retrieval Models | Code | 0 |
| ContextIQ: A Multimodal Expert-Based Video Retrieval System for Contextual Advertising | Code | 0 |
| Contrastive Alignment with Semantic Gap-Aware Corrections in Text-Video Retrieval | Code | 0 |
| Circulant temporal encoding for video retrieval and temporal alignment | Code | 0 |
| Aligning Step-by-Step Instructional Diagrams to Video Demonstrations | Code | 0 |
| Generating Signed Language Instructions in Large-Scale Dialogue Systems | Code | 0 |
| Central Similarity Quantization for Efficient Image and Video Retrieval | Code | 0 |
| From Play to Replay: Composed Video Retrieval for Temporally Fine-Grained Videos | Code | 0 |
| FIVR: Fine-grained Incident Video Retrieval | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | OmniVec | text-to-video R@10 | 89.4 | | Unverified |
| 2 | CLIP4Clip | text-to-video R@10 | 81.6 | | Unverified |
| 3 | OmniVec (pretrained) | text-to-video R@10 | 78.6 | | Unverified |
| 4 | HunYuan_tvr (huge) | text-to-video R@1 | 62.9 | | Unverified |
| 5 | CLIP-ViP | text-to-video R@1 | 57.7 | | Unverified |
| 6 | PIDRo | text-to-video R@1 | 55.9 | | Unverified |
| 7 | DMAE (ViT-B/16) | text-to-video R@1 | 55.5 | | Unverified |
| 8 | HunYuan_tvr | text-to-video R@1 | 55 | | Unverified |
| 9 | MuLTI | text-to-video R@1 | 54.7 | | Unverified |
| 10 | EERCF | text-to-video R@1 | 54.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Aurora (ours, r=64) | text-to-video R@5 | 77.4 | | Unverified |
| 2 | InternVideo2-6B | text-to-video R@1 | 74.2 | | Unverified |
| 3 | vid-TLDR (UMT-L) | text-to-video R@1 | 72.3 | | Unverified |
| 4 | VAST | text-to-video R@1 | 72 | | Unverified |
| 5 | COSA | text-to-video R@1 | 70.5 | | Unverified |
| 6 | UMT-L (ViT-L/16) | text-to-video R@1 | 70.4 | | Unverified |
| 7 | GRAM | text-to-video R@1 | 67.3 | | Unverified |
| 8 | VALOR | text-to-video R@1 | 61.5 | | Unverified |
| 9 | TESTA (ViT-B/16) | text-to-video R@1 | 61.2 | | Unverified |
| 10 | VindLU | text-to-video R@1 | 61.2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GRAM | text-to-video R@1 | 64 | | Unverified |
| 2 | VAST | text-to-video R@1 | 63.9 | | Unverified |
| 3 | InternVideo2-6B | text-to-video R@1 | 62.8 | | Unverified |
| 4 | VALOR | text-to-video R@1 | 59.9 | | Unverified |
| 5 | UMT-L (ViT-L/16) | text-to-video R@1 | 58.8 | | Unverified |
| 6 | vid-TLDR (UMT-L) | text-to-video R@1 | 58.1 | | Unverified |
| 7 | COSA | text-to-video R@1 | 57.9 | | Unverified |
| 8 | InternVideo2-6B | text-to-video R@1 | 55.9 | | Unverified |
| 9 | InternVideo | text-to-video R@1 | 55.2 | | Unverified |
| 10 | VLAB | text-to-video R@1 | 55.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | EMCL-Net (Ours)++ LSMDC Rohrbach et al. (2015) | text-to-video R@10 | 53.7 | | Unverified |
| 2 | InternVideo2-6B | text-to-video R@1 | 46.4 | | Unverified |
| 3 | vid-TLDR (UMT-L) | text-to-video R@1 | 43.1 | | Unverified |
| 4 | UMT-L (ViT-L/16) | text-to-video R@1 | 43 | | Unverified |
| 5 | HunYuan_tvr (huge) | text-to-video R@1 | 40.4 | | Unverified |
| 6 | COSA | text-to-video R@1 | 39.4 | | Unverified |
| 7 | mPLUG-2 | text-to-video R@1 | 34.4 | | Unverified |
| 8 | VALOR | text-to-video R@1 | 34.2 | | Unverified |
| 9 | InternVideo | text-to-video R@1 | 34 | | Unverified |
| 10 | InternVideo2-6B | text-to-video R@1 | 33.8 | | Unverified |