SOTAVerified

Moment Retrieval

Moment retrieval can be defined as the task of "localizing moments in a video given a user query".

Description from: QVHIGHLIGHTS: Detecting Moments and Highlights in Videos via Natural Language Queries

Image credit: QVHIGHLIGHTS: Detecting Moments and Highlights in Videos via Natural Language Queries

Papers

Showing 1-25 of 132 papers

Title | Status | Hype
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | Code | 7
Language-based Audio Moment Retrieval | Code | 3
Lighthouse: A User-Friendly Library for Reproducible Video Moment Retrieval and Highlight Detection | Code | 3
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding | Code | 3
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection | Code | 2
The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval | Code | 2
Query-Dependent Video Representation for Moment Retrieval and Highlight Detection | Code | 2
TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection | Code | 2
UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection | Code | 2
VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding | Code | 2
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning | Code | 2
UniVTG: Towards Unified Video-Language Temporal Grounding | Code | 2
Number it: Temporal Grounding Videos like Flipping Manga | Code | 2
Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding | Code | 2
Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval | Code | 2
TMR: Text-to-Motion Retrieval Using Contrastive 3D Human Motion Synthesis | Code | 2
A Flexible and Scalable Framework for Video Moment Search | Code | 1
Joint Moment Retrieval and Highlight Detection Via Natural Language Queries | Code | 1
CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval | Code | 1
Hierarchical Video-Moment Retrieval and Step-Captioning | Code | 1
Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-Based Active Learning | Code | 1
Background-aware Moment Detection for Video Moment Retrieval | Code | 1
Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection | Code | 1
Detecting Moments and Highlights in Videos via Natural Language Queries | Code | 1
Frame-wise Cross-modal Matching for Video Moment Retrieval | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | UnLoc-L | R@1 IoU=0.5 | 66.1 | - | Unverified
2 | UnLoc-B | R@1 IoU=0.5 | 64.5 | - | Unverified
3 | DenoiseLoc | R@1 IoU=0.5 | 59.27 | - | Unverified
4 | SG-DETR (w/ PT) | mAP | 58.8 | - | Unverified
5 | SG-DETR | mAP | 54.1 | - | Unverified
6 | LLaVA-MR | mAP | 52.73 | - | Unverified
7 | FlashVTG | mAP | 52 | - | Unverified
8 | InternVideo2-6B | mAP | 49.24 | - | Unverified
9 | CG-DETR (w/ PT) | mAP | 47.97 | - | Unverified
10 | VideoLights-B-pt | mAP | 47.94 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SG-DETR (w/ PT) | R@1 IoU=0.5 | 71.1 | - | Unverified
2 | LLaVA-MR | R@1 IoU=0.5 | 70.65 | - | Unverified
3 | FlashVTG | R@1 IoU=0.5 | 70.32 | - | Unverified
4 | SG-DETR | R@1 IoU=0.5 | 70.2 | - | Unverified
5 | InternVideo2-6B | R@1 IoU=0.5 | 70.03 | - | Unverified
6 | InternVideo2-1B | R@1 IoU=0.5 | 68.36 | - | Unverified
7 | VideoChat-T (FT) | R@1 IoU=0.5 | 67.1 | - | Unverified
8 | UniMD+Sync. | R@1 IoU=0.5 | 63.98 | - | Unverified
9 | LD-DETR | R@1 IoU=0.5 | 62.58 | - | Unverified
10 | VideoLights-B-pt | R@1 IoU=0.5 | 61.96 | - | Unverified
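
For readers unfamiliar with the R@1 IoU=0.5 metric used in the results above: it is commonly computed as the percentage of queries whose top-ranked predicted moment overlaps the ground-truth moment with a temporal intersection-over-union of at least 0.5. The sketch below illustrates that computation. It is a minimal illustration, not the evaluation code of any listed paper or benchmark. The function names `temporal_iou` and `recall_at_1`, the (start, end) seconds representation, the assumption of one ground-truth moment per query, and the inclusive threshold are all choices made here for illustration; individual benchmarks may differ in those details.

```python
from typing import List, Tuple

# A moment is a (start_sec, end_sec) span. This representation is an assumption.
Span = Tuple[float, float]


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection-over-union of two time spans on the same video."""
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0


def recall_at_1(top1_preds: List[Span], gts: List[Span],
                iou_threshold: float = 0.5) -> float:
    """Percentage of queries whose top-1 prediction reaches the IoU threshold.

    Assumes exactly one ground-truth span per query (hypothetical simplification).
    """
    if len(top1_preds) != len(gts):
        raise ValueError("need one top-1 prediction per ground-truth span")
    if not gts:
        return 0.0
    hits = sum(temporal_iou(p, g) >= iou_threshold
               for p, g in zip(top1_preds, gts))
    return 100.0 * hits / len(gts)


# Toy data: three queries, each with one predicted and one ground-truth span.
preds = [(10.0, 20.0), (0.0, 5.0), (30.0, 40.0)]
truth = [(12.0, 22.0), (10.0, 15.0), (30.0, 45.0)]
score = recall_at_1(preds, truth)  # two of the three predictions reach IoU 0.5
```

In this toy example the first and third predictions have an IoU of about 0.67 with their ground truth and the second does not overlap at all, so two of three queries count as hits.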