Moment Retrieval

Moment retrieval can be defined as the task of "localizing moments in a video given a user query".

Description from: QVHIGHLIGHTS: Detecting Moments and Highlights in Videos via Natural Language Queries

Image credit: QVHIGHLIGHTS: Detecting Moments and Highlights in Videos via Natural Language Queries

Papers

Showing 26–50 of 132 papers

Title	Date	Tasks	Status	Hype
Video Moment Retrieval from Text Queries via Single Frame Annotation	Apr 20, 2022	Contrastive Learning, Moment Retrieval	Code Available	1
TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos	Mar 9, 2025	Action Localization, Boundary Detection	Code Available	1
Finding Moments in Video Collections Using Natural Language	Jul 30, 2019	Moment Retrieval, Re-Ranking	Code Available	1
Detecting Moments and Highlights in Videos via Natural Language Queries	Dec 1, 2021	Decoder, Moment Retrieval	Code Available	1
Video Corpus Moment Retrieval with Contrastive Learning	May 13, 2021	Contrastive Learning, Moment Retrieval	Code Available	1
Selective Query-guided Debiasing for Video Corpus Moment Retrieval	Oct 17, 2022	Moment Retrieval, Retrieval	Code Available	1
Deconfounded Video Moment Retrieval with Causal Intervention	Jun 3, 2021	Moment Retrieval, Retrieval	Code Available	1
Saliency-Guided DETR for Moment Retrieval and Highlight Detection	Oct 2, 2024	Highlight Detection, Moment Retrieval	Code Available	1
Task-Driven Exploration: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection	Apr 14, 2024	Highlight Detection, Moment Retrieval	Code Available	1
Partially Relevant Video Retrieval	Aug 26, 2022	Moment Retrieval, Multiple Instance Learning	Code Available	1
Background-aware Moment Detection for Video Moment Retrieval	Jun 5, 2023	Moment Retrieval, Natural Language Moment Retrieval	Code Available	1
LD-DETR: Loop Decoder DEtection TRansformer for Video Moment Retrieval and Highlight Detection	Jan 18, 2025	Contrastive Learning, Decoder	Code Available	1
BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos	Nov 30, 2023	Moment Retrieval, Natural Language Moment Retrieval	Code Available	1
Frame-wise Cross-modal Matching for Video Moment Retrieval	Sep 22, 2020	Boundary Detection, Moment Retrieval	Code Available	1
Regularized Two-Branch Proposal Networks for Weakly-Supervised Moment Retrieval in Videos	Aug 19, 2020	Moment Retrieval, Retrieval	Code Available	1
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training	May 1, 2020	Language Modeling, Language Modelling	Code Available	1
Hierarchical Video-Moment Retrieval and Step-Captioning	Mar 29, 2023	Information Retrieval, Moment Retrieval	Code Available	1
FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding	Dec 18, 2024	Highlight Detection, Moment Retrieval	Code Available	1
MomentDiff: Generative Video Moment Retrieval from Random to Real	Jul 6, 2023	Moment Retrieval, Retrieval	Code Available	1
Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection	Nov 28, 2023	Contrastive Learning, Highlight Detection	Code Available	1
Length-Aware DETR for Robust Moment Retrieval	Dec 30, 2024	Information Retrieval, Moment Retrieval	Code Available	1
MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions	Dec 1, 2021	Moment Retrieval, Natural Language Moment Retrieval	Code Available	1
MTVR: Multilingual Moment Retrieval in Videos	Jul 30, 2021	Moment Retrieval, Retrieval	Code Available	1
QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries	Jul 20, 2021	Highlight Detection, Moment Retrieval	Code Available	1
Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval	Dec 19, 2023	Cross-modal Alignment, Moment Retrieval	Code Available	1

Datasets: QVHighlights, Charades-STA

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	UnLoc-L	R@1 IoU=0.5	66.1	—	Unverified
2	UnLoc-B	R@1 IoU=0.5	64.5	—	Unverified
3	DenoiseLoc	R@1 IoU=0.5	59.27	—	Unverified
4	SG-DETR (w/ PT)	mAP	58.8	—	Unverified
5	SG-DETR	mAP	54.1	—	Unverified
6	LLaVA-MR	mAP	52.73	—	Unverified
7	FlashVTG	mAP	52	—	Unverified
8	InternVideo2-6B	mAP	49.24	—	Unverified
9	CG-DETR (w/ PT)	mAP	47.97	—	Unverified
10	VideoLights-B-pt	mAP	47.94	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SG-DETR (w/ PT)	R@1 IoU=0.5	71.1	—	Unverified
2	LLaVA-MR	R@1 IoU=0.5	70.65	—	Unverified
3	FlashVTG	R@1 IoU=0.5	70.32	—	Unverified
4	SG-DETR	R@1 IoU=0.5	70.2	—	Unverified
5	InternVideo2-6B	R@1 IoU=0.5	70.03	—	Unverified
6	InternVideo2-1B	R@1 IoU=0.5	68.36	—	Unverified
7	VideoChat-T (FT)	R@1 IoU=0.5	67.1	—	Unverified
8	UniMD+Sync.	R@1 IoU=0.5	63.98	—	Unverified
9	LD-DETR	R@1 IoU=0.5	62.58	—	Unverified
10	VideoLights-B-pt	R@1 IoU=0.5	61.96	—	Unverified
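
The "R@1 IoU=0.5" metric in the tables above is commonly read as the percentage of queries whose top-ranked predicted moment overlaps the ground-truth moment with a temporal intersection-over-union of at least 0.5. The following is a minimal sketch of that reading, not the official evaluation code of any benchmark. It assumes one ground-truth span per query and spans given as (start, end) in seconds; the function names `temporal_iou` and `recall_at_1` are hypothetical.

```python
from typing import List, Tuple

# A moment is a (start_sec, end_sec) pair. This representation is an assumption.
Span = Tuple[float, float]


def temporal_iou(pred: Span, gt: Span) -> float:
    """Temporal intersection-over-union of two time spans."""
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0


def recall_at_1(top1_preds: List[Span], gts: List[Span],
                iou_threshold: float = 0.5) -> float:
    """Percentage of queries whose top-1 prediction reaches the IoU threshold.

    Assumes exactly one ground-truth span per query (a simplification).
    """
    if len(top1_preds) != len(gts):
        raise ValueError("need one top-1 prediction per ground-truth span")
    if not gts:
        return 0.0
    hits = sum(temporal_iou(p, g) >= iou_threshold
               for p, g in zip(top1_preds, gts))
    return 100.0 * hits / len(gts)


# Toy data, made up for illustration only.
preds = [(10.0, 20.0), (0.0, 5.0), (30.0, 40.0)]
gts = [(12.0, 22.0), (10.0, 15.0), (30.0, 45.0)]
score = recall_at_1(preds, gts)  # two of the three predictions count as hits
```

Benchmarks that allow several ground-truth moments per query, or that average over multiple IoU thresholds as mAP does, need a different aggregation than this sketch.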