SOTAVerified

Moment Retrieval

Moment retrieval can be defined as the task of "localizing moments in a video given a user query".
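As a rough illustration of the task, the sketch below is a toy sliding-window baseline. It is not the method of any paper listed on this page. It assumes the video has already been cut into fixed-length clips, and that each clip and the text query have been embedded into a shared vector space by some pretrained encoder (not shown). Every candidate window is scored by the mean cosine similarity of its clips to the query, and the best span is returned in seconds. The names `localize_moment`, `clip_len`, and `window_sizes` are hypothetical.

```python
import numpy as np


def localize_moment(clip_feats, query_feat, clip_len=2.0, window_sizes=(8, 4, 2)):
    """Toy sliding-window moment localizer (hypothetical helper).

    clip_feats: (num_clips, dim) array, one embedding per fixed-length clip.
    query_feat: (dim,) embedding of the text query.
    Assumes no zero-norm vectors and that clips and query share one embedding space.
    Returns (start_sec, end_sec, score) for the best-scoring window.
    """
    # L2-normalize so that dot products are cosine similarities.
    clips = clip_feats / np.linalg.norm(clip_feats, axis=1, keepdims=True)
    query = query_feat / np.linalg.norm(query_feat)
    sims = clips @ query  # per-clip similarity to the query

    best = (0.0, 0.0, -np.inf)
    num_clips = len(sims)
    # Longest windows are tried first, so ties are kept by the longer window.
    for w in window_sizes:
        for start in range(num_clips - w + 1):
            score = sims[start:start + w].mean()
            if score > best[2]:
                best = (start * clip_len, (start + w) * clip_len, float(score))
    return best
```

For example, with ten 2-second clips where clips 3 to 6 match the query, the function returns the span from 6.0 s to 14.0 s. Averaging similarities is the simplest scoring choice. Real systems learn the scoring and boundaries end to end.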

Description from: QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries

Image credit: QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries

Papers

Showing 26-50 of 132 papers

Title | Status | Hype
--- | --- | ---
Video Moment Retrieval from Text Queries via Single Frame Annotation | Code | 1
TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos | Code | 1
Finding Moments in Video Collections Using Natural Language | Code | 1
Detecting Moments and Highlights in Videos via Natural Language Queries | Code | 1
Video Corpus Moment Retrieval with Contrastive Learning | Code | 1
Selective Query-guided Debiasing for Video Corpus Moment Retrieval | Code | 1
Deconfounded Video Moment Retrieval with Causal Intervention | Code | 1
Saliency-Guided DETR for Moment Retrieval and Highlight Detection | Code | 1
Task-Driven Exploration: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection | Code | 1
Partially Relevant Video Retrieval | Code | 1
Background-aware Moment Detection for Video Moment Retrieval | Code | 1
LD-DETR: Loop Decoder DEtection TRansformer for Video Moment Retrieval and Highlight Detection | Code | 1
BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos | Code | 1
Frame-wise Cross-modal Matching for Video Moment Retrieval | Code | 1
Regularized Two-Branch Proposal Networks for Weakly-Supervised Moment Retrieval in Videos | Code | 1
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training | Code | 1
Hierarchical Video-Moment Retrieval and Step-Captioning | Code | 1
FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding | Code | 1
MomentDiff: Generative Video Moment Retrieval from Random to Real | Code | 1
Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection | Code | 1
Length-Aware DETR for Robust Moment Retrieval | Code | 1
MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions | Code | 1
MTVR: Multilingual Moment Retrieval in Videos | Code | 1
QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries | Code | 1
Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval | Code | 1

Benchmark Results

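# | Model | Metric | Claimed | Verified | Status
--- | --- | --- | --- | --- | ---
1 | UnLoc-L | R@1 IoU=0.5 | 66.1 | | Unverified
2 | UnLoc-B | R@1 IoU=0.5 | 64.5 | | Unverified
3 | DenoiseLoc | R@1 IoU=0.5 | 59.27 | | Unverified
4 | SG-DETR (w/ PT) | mAP | 58.8 | | Unverified
5 | SG-DETR | mAP | 54.1 | | Unverified
6 | LLaVA-MR | mAP | 52.73 | | Unverified
7 | FlashVTG | mAP | 52 | | Unverified
8 | InternVideo2-6B | mAP | 49.24 | | Unverified
9 | CG-DETR (w/ PT) | mAP | 47.97 | | Unverified
10 | VideoLights-B-pt | mAP | 47.94 | | Unverified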

# | Model | Metric | Claimed | Verified | Status
--- | --- | --- | --- | --- | ---
1 | SG-DETR (w/ PT) | R@1 IoU=0.5 | 71.1 | | Unverified
2 | LLaVA-MR | R@1 IoU=0.5 | 70.65 | | Unverified
3 | FlashVTG | R@1 IoU=0.5 | 70.32 | | Unverified
4 | SG-DETR | R@1 IoU=0.5 | 70.2 | | Unverified
5 | InternVideo2-6B | R@1 IoU=0.5 | 70.03 | | Unverified
6 | InternVideo2-1B | R@1 IoU=0.5 | 68.36 | | Unverified
7 | VideoChat-T (FT) | R@1 IoU=0.5 | 67.1 | | Unverified
8 | UniMD+Sync. | R@1 IoU=0.5 | 63.98 | | Unverified
9 | LD-DETR | R@1 IoU=0.5 | 62.58 | | Unverified
10 | VideoLights-B-pt | R@1 IoU=0.5 | 61.96 | | Unverified
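Many of the leaderboard rows report R@1 IoU=0.5. This metric is the percentage of queries whose top-ranked predicted moment overlaps the ground-truth moment with a temporal intersection-over-union of at least 0.5. The sketch below computes it under a simplifying assumption of exactly one ground-truth span per query. Some datasets annotate several spans per query, and official evaluation scripts may differ in such details. The function names `temporal_iou` and `recall_at_1` are hypothetical.

```python
def temporal_iou(pred, gt):
    """IoU of two time intervals given as (start, end), e.g. in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # Union = sum of lengths minus overlap (exact for disjoint intervals too).
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_preds, gts, iou_threshold=0.5):
    """Percentage of queries whose top-1 moment reaches the IoU threshold.

    Assumes one ground-truth (start, end) span per query, aligned with top1_preds.
    """
    hits = sum(temporal_iou(p, g) >= iou_threshold for p, g in zip(top1_preds, gts))
    return 100.0 * hits / len(gts)
```

For example, the predictions `[(0, 10), (5, 15), (20, 30)]` against a ground truth of `(0, 10)` for each query have IoUs of 1.0, about 0.33, and 0.0. Only the first is a hit, giving R@1 IoU=0.5 of about 33.3. Reporting the value as a percentage matches the scale of the numbers in the tables above.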