SOTAVerified

Moment Retrieval

Moment retrieval can be defined as the task of "localizing moments in a video given a user query".

Description from: QVHIGHLIGHTS: Detecting Moments and Highlights in Videos via Natural Language Queries

Image credit: QVHIGHLIGHTS: Detecting Moments and Highlights in Videos via Natural Language Queries

Papers

Showing 101–125 of 132 papers

Title | Status | Hype
MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions | Code | 1
Detecting Moments and Highlights in Videos via Natural Language Queries | Code | 1
Multi-scale 2D Representation Learning for weakly-supervised moment retrieval | - | 0
Coarse to Fine: Video Retrieval before Moment Localization | - | 0
ViSeRet: A simple yet effective approach to moment retrieval via fine-grained video segmentation | - | 0
CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval | Code | 1
MTVR: Multilingual Moment Retrieval in Videos | Code | 1
QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries | Code | 1
Video Moment Retrieval with Text Query Considering Many-to-Many Correspondence Using Potentially Relevant Pair | - | 0
Multi-Modal Relational Graph for Cross-Modal Video Moment Retrieval | - | 0
Deconfounded Video Moment Retrieval with Causal Intervention | Code | 1
Video Corpus Moment Retrieval with Contrastive Learning | Code | 1
Fast Video Moment Retrieval | - | 0
VLG-Net: Video-Language Graph Matching Network for Video Grounding | Code | 1
Frame-wise Cross-modal Matching for Video Moment Retrieval | Code | 1
Video Moment Retrieval via Natural Language Queries | - | 0
Uncovering Hidden Challenges in Query-Based Video Moment Retrieval | Code | 1
VLANet: Video-Language Alignment Network for Weakly-Supervised Video Moment Retrieval | Code | 1
Text-based Localization of Moments in a Video Corpus | - | 0
Generating Adjacency Matrix for Video Relocalization | - | 0
Regularized Two-Branch Proposal Networks for Weakly-Supervised Moment Retrieval in Videos | Code | 1
Graph Neural Network for Video Relocalization | - | 0
Language Guided Networks for Cross-modal Moment Retrieval | - | 0
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training | Code | 1
TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | UnLoc-L | R@1 IoU=0.5 | 66.1 | - | Unverified
2 | UnLoc-B | R@1 IoU=0.5 | 64.5 | - | Unverified
3 | DenoiseLoc | R@1 IoU=0.5 | 59.27 | - | Unverified
4 | SG-DETR (w/ PT) | mAP | 58.8 | - | Unverified
5 | SG-DETR | mAP | 54.1 | - | Unverified
6 | LLaVA-MR | mAP | 52.73 | - | Unverified
7 | FlashVTG | mAP | 52 | - | Unverified
8 | InternVideo2-6B | mAP | 49.24 | - | Unverified
9 | CG-DETR (w/ PT) | mAP | 47.97 | - | Unverified
10 | VideoLights-B-pt | mAP | 47.94 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SG-DETR (w/ PT) | R@1 IoU=0.5 | 71.1 | - | Unverified
2 | LLaVA-MR | R@1 IoU=0.5 | 70.65 | - | Unverified
3 | FlashVTG | R@1 IoU=0.5 | 70.32 | - | Unverified
4 | SG-DETR | R@1 IoU=0.5 | 70.2 | - | Unverified
5 | InternVideo2-6B | R@1 IoU=0.5 | 70.03 | - | Unverified
6 | InternVideo2-1B | R@1 IoU=0.5 | 68.36 | - | Unverified
7 | VideoChat-T (FT) | R@1 IoU=0.5 | 67.1 | - | Unverified
8 | UniMD+Sync. | R@1 IoU=0.5 | 63.98 | - | Unverified
9 | LD-DETR | R@1 IoU=0.5 | 62.58 | - | Unverified
10 | VideoLights-B-pt | R@1 IoU=0.5 | 61.96 | - | Unverified
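
For readers unfamiliar with the "R@1 IoU=0.5" metric listed above, the following is a minimal sketch of how it is commonly computed: the percentage of queries whose top-ranked predicted moment overlaps the ground-truth moment with a temporal intersection-over-union of at least 0.5. The function names (`temporal_iou`, `recall_at_1`), the `(start, end)` span representation in seconds, and the assumption of exactly one ground-truth moment per query are illustrative choices, not taken from any listed paper; individual benchmarks may handle multiple ground-truth moments or ties differently.

```python
from typing import List, Tuple

# A moment is represented as (start_sec, end_sec); this layout is an assumption.
Span = Tuple[float, float]


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection-over-union of two 1D time intervals."""
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0


def recall_at_1(top1_preds: List[Span], gts: List[Span],
                iou_threshold: float = 0.5) -> float:
    """Percentage of queries whose top-1 prediction reaches the IoU threshold.

    Assumes one ground-truth span per query (a simplification).
    """
    if len(top1_preds) != len(gts):
        raise ValueError("predictions and ground truths must align one-to-one")
    if not gts:
        return 0.0
    hits = sum(temporal_iou(p, g) >= iou_threshold
               for p, g in zip(top1_preds, gts))
    return 100.0 * hits / len(gts)


# Toy data, made up purely for illustration.
preds = [(10.0, 20.0), (0.0, 5.0), (30.0, 40.0)]
truths = [(12.0, 22.0), (10.0, 15.0), (30.0, 50.0)]
score = recall_at_1(preds, truths)  # two of the three queries count as hits
```

In this toy example the first and third predictions reach the 0.5 threshold and the second does not overlap its ground truth at all, so two of three queries are counted as hits.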