SOTAVerified

Moment Retrieval

Moment retrieval can de defined as the task of "localizing moments in a video given a user query".

Description from: QVHIGHLIGHTS: Detecting Moments and Highlights in Videos via Natural Language Queries

Image credit: QVHIGHLIGHTS: Detecting Moments and Highlights in Videos via Natural Language Queries

Papers

Showing 125 of 132 papers

TitleStatusHype
InternVideo2: Scaling Foundation Models for Multimodal Video UnderstandingCode7
Language-based Audio Moment RetrievalCode3
Lighthouse: A User-Friendly Library for Reproducible Video Moment Retrieval and Highlight DetectionCode3
Video Mamba Suite: State Space Model as a Versatile Alternative for Video UnderstandingCode3
Number it: Temporal Grounding Videos like Flipping MangaCode2
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded TuningCode2
Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment RetrievalCode2
The Surprising Effectiveness of Multimodal Large Language Models for Video Moment RetrievalCode2
VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal GroundingCode2
UniMD: Towards Unifying Moment Retrieval and Temporal Action DetectionCode2
TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight DetectionCode2
Correlation-Guided Query-Dependency Calibration for Video Temporal GroundingCode2
UniVTG: Towards Unified Video-Language Temporal GroundingCode2
TMR: Text-to-Motion Retrieval Using Contrastive 3D Human Motion SynthesisCode2
Query-Dependent Video Representation for Moment Retrieval and Highlight DetectionCode2
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight DetectionCode2
TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long VideosCode1
LD-DETR: Loop Decoder DEtection TRansformer for Video Moment Retrieval and Highlight DetectionCode1
A Flexible and Scalable Framework for Video Moment SearchCode1
Watch Video, Catch Keyword: Context-aware Keyword Attention for Moment Retrieval and Highlight DetectionCode1
Length-Aware DETR for Robust Moment RetrievalCode1
FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal GroundingCode1
VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment RetrievalCode1
Vid-Morp: Video Moment Retrieval Pretraining from Unlabeled Videos in the WildCode1
VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video UnderstandingCode1
Show:102550
← PrevPage 1 of 6Next →

Benchmark Results

#ModelMetricClaimedVerifiedStatus
1UnLoc-LR@1 IoU=0.566.1Unverified
2UnLoc-BR@1 IoU=0.564.5Unverified
3DenoiseLocR@1 IoU=0.559.27Unverified
4SG-DETR (w/ PT)mAP58.8Unverified
5SG-DETRmAP54.1Unverified
6LLaVA-MRmAP52.73Unverified
7FlashVTGmAP52Unverified
8InternVideo2-6BmAP49.24Unverified
9CG-DETR (w/ PT)mAP47.97Unverified
10VideoLights-B-ptmAP47.94Unverified
#ModelMetricClaimedVerifiedStatus
1SG-DETR (w/ PT)R@1 IoU=0.571.1Unverified
2LLaVA-MRR@1 IoU=0.570.65Unverified
3FlashVTGR@1 IoU=0.570.32Unverified
4SG-DETRR@1 IoU=0.570.2Unverified
5InternVideo2-6BR@1 IoU=0.570.03Unverified
6InternVideo2-1BR@1 IoU=0.568.36Unverified
7VideoChat-T (FT)R@1 IoU=0.567.1Unverified
8UniMD+Sync.R@1 IoU=0.563.98Unverified
9LD-DETRR@1 IoU=0.562.58Unverified
10VideoLights-B-ptR@1 IoU=0.561.96Unverified