SOTAVerified

Moment Retrieval

Moment retrieval can be defined as the task of "localizing moments in a video given a user query".

Description from: QVHIGHLIGHTS: Detecting Moments and Highlights in Videos via Natural Language Queries


Papers

Showing 1-50 of 132 papers

Title | Status | Hype
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | Code | 7
Language-based Audio Moment Retrieval | Code | 3
Lighthouse: A User-Friendly Library for Reproducible Video Moment Retrieval and Highlight Detection | Code | 3
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding | Code | 3
Number it: Temporal Grounding Videos like Flipping Manga | Code | 2
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning | Code | 2
Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval | Code | 2
The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval | Code | 2
VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding | Code | 2
UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection | Code | 2
TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection | Code | 2
Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding | Code | 2
UniVTG: Towards Unified Video-Language Temporal Grounding | Code | 2
TMR: Text-to-Motion Retrieval Using Contrastive 3D Human Motion Synthesis | Code | 2
Query-Dependent Video Representation for Moment Retrieval and Highlight Detection | Code | 2
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection | Code | 2
TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos | Code | 1
LD-DETR: Loop Decoder DEtection TRansformer for Video Moment Retrieval and Highlight Detection | Code | 1
A Flexible and Scalable Framework for Video Moment Search | Code | 1
Watch Video, Catch Keyword: Context-aware Keyword Attention for Moment Retrieval and Highlight Detection | Code | 1
Length-Aware DETR for Robust Moment Retrieval | Code | 1
FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding | Code | 1
VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval | Code | 1
Vid-Morp: Video Moment Retrieval Pretraining from Unlabeled Videos in the Wild | Code | 1
VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding | Code | 1
Saliency-Guided DETR for Moment Retrieval and Highlight Detection | Code | 1
MLP: Motion Label Prior for Temporal Sentence Localization in Untrimmed 3D Human Motions | Code | 1
Task-Driven Exploration: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection | Code | 1
Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval | Code | 1
BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos | Code | 1
Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection | Code | 1
MomentDiff: Generative Video Moment Retrieval from Random to Real | Code | 1
Background-aware Moment Detection for Video Moment Retrieval | Code | 1
Joint Moment Retrieval and Highlight Detection Via Natural Language Queries | Code | 1
Hierarchical Video-Moment Retrieval and Step-Captioning | Code | 1
Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-Based Active Learning | Code | 1
Selective Query-guided Debiasing for Video Corpus Moment Retrieval | Code | 1
Partially Relevant Video Retrieval | Code | 1
You Need to Read Again: Multi-granularity Perception Network for Moment Retrieval in Videos | Code | 1
Video Moment Retrieval from Text Queries via Single Frame Annotation | Code | 1
MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions | Code | 1
Detecting Moments and Highlights in Videos via Natural Language Queries | Code | 1
CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval | Code | 1
MTVR: Multilingual Moment Retrieval in Videos | Code | 1
QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries | Code | 1
Deconfounded Video Moment Retrieval with Causal Intervention | Code | 1
Video Corpus Moment Retrieval with Contrastive Learning | Code | 1
VLG-Net: Video-Language Graph Matching Network for Video Grounding | Code | 1
Frame-wise Cross-modal Matching for Video Moment Retrieval | Code | 1
Uncovering Hidden Challenges in Query-Based Video Moment Retrieval | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | UnLoc-L | R@1 IoU=0.5 | 66.1 | - | Unverified
2 | UnLoc-B | R@1 IoU=0.5 | 64.5 | - | Unverified
3 | DenoiseLoc | R@1 IoU=0.5 | 59.27 | - | Unverified
4 | SG-DETR (w/ PT) | mAP | 58.8 | - | Unverified
5 | SG-DETR | mAP | 54.1 | - | Unverified
6 | LLaVA-MR | mAP | 52.73 | - | Unverified
7 | FlashVTG | mAP | 52 | - | Unverified
8 | InternVideo2-6B | mAP | 49.24 | - | Unverified
9 | CG-DETR (w/ PT) | mAP | 47.97 | - | Unverified
10 | VideoLights-B-pt | mAP | 47.94 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SG-DETR (w/ PT) | R@1 IoU=0.5 | 71.1 | - | Unverified
2 | LLaVA-MR | R@1 IoU=0.5 | 70.65 | - | Unverified
3 | FlashVTG | R@1 IoU=0.5 | 70.32 | - | Unverified
4 | SG-DETR | R@1 IoU=0.5 | 70.2 | - | Unverified
5 | InternVideo2-6B | R@1 IoU=0.5 | 70.03 | - | Unverified
6 | InternVideo2-1B | R@1 IoU=0.5 | 68.36 | - | Unverified
7 | VideoChat-T (FT) | R@1 IoU=0.5 | 67.1 | - | Unverified
8 | UniMD+Sync. | R@1 IoU=0.5 | 63.98 | - | Unverified
9 | LD-DETR | R@1 IoU=0.5 | 62.58 | - | Unverified
10 | VideoLights-B-pt | R@1 IoU=0.5 | 61.96 | - | Unverified
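
The R@1 IoU=0.5 metric reported above is commonly understood as the percentage of queries whose top-ranked predicted moment overlaps the ground-truth moment with a temporal intersection-over-union of at least 0.5. The following is a minimal sketch of that computation, not the evaluation code of any listed benchmark. It assumes one ground-truth span per query, spans given as (start, end) in seconds, and a hit counted when IoU is greater than or equal to the threshold. The names `temporal_iou` and `recall_at_1` are hypothetical.

```python
from typing import List, Tuple

# A moment is a (start_sec, end_sec) pair; this representation is an assumption.
Span = Tuple[float, float]


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection-over-union of two 1-D time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_preds: List[Span], gts: List[Span],
                iou_threshold: float = 0.5) -> float:
    """Percentage of queries whose top-1 prediction reaches the IoU threshold.

    Assumes exactly one ground-truth span per query (hypothetical setup).
    """
    if len(top1_preds) != len(gts):
        raise ValueError("need one prediction per ground-truth span")
    if not gts:
        return 0.0
    hits = sum(temporal_iou(p, g) >= iou_threshold
               for p, g in zip(top1_preds, gts))
    return 100.0 * hits / len(gts)


if __name__ == "__main__":
    preds = [(0.0, 10.0), (0.0, 10.0)]
    gts = [(0.0, 10.0), (5.0, 15.0)]  # second pair has IoU = 5 / 15
    print(recall_at_1(preds, gts))
```

In this usage example the first prediction matches exactly and the second has an IoU of one third, so one of two queries counts as a hit. Benchmarks with several ground-truth moments per query, or that report mAP, need a different aggregation than this sketch provides.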