Moment Retrieval

Moment retrieval can be defined as the task of "localizing moments in a video given a user query".
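
To make the task's input and output concrete, here is a minimal sketch. It assumes some video-text model, not specified here, has already produced one query-relevance score per fixed-length clip. The function `localize_moment`, the 2-second `clip_len`, and the threshold `tau` are hypothetical choices for illustration. The search is a plain maximum-subarray (Kadane) scan over the centered scores, not the method of any paper listed below.

```python
from typing import Sequence, Tuple


def localize_moment(scores: Sequence[float], clip_len: float = 2.0,
                    tau: float = 0.5) -> Tuple[float, float]:
    """Return (start, end) in seconds of the contiguous clip span whose
    total of (score - tau) is largest (maximum-subarray search).

    `scores` are hypothetical per-clip query-relevance scores; clips scoring
    above `tau` extend a moment, clips below it penalize it.
    """
    if not scores:
        raise ValueError("need at least one clip score")
    best_sum, best_span = float("-inf"), (0, 1)
    run_sum, run_start = 0.0, 0
    for i, s in enumerate(scores):
        # A non-positive running total cannot help: restart the span here.
        if run_sum <= 0:
            run_sum, run_start = 0.0, i
        run_sum += s - tau
        if run_sum > best_sum:
            best_sum, best_span = run_sum, (run_start, i + 1)
    start_clip, end_clip = best_span
    return start_clip * clip_len, end_clip * clip_len
```

For example, with scores `[0.1, 0.2, 0.9, 0.8, 0.7, 0.1, 0.3]` and the defaults, the span covering clips 2 to 4 wins, so the call returns `(4.0, 10.0)`. Real systems in the list below learn the boundaries directly instead of thresholding scores, so treat this only as an illustration of "query in, time window out".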

Description from: QVHIGHLIGHTS: Detecting Moments and Highlights in Videos via Natural Language Queries

Image credit: QVHIGHLIGHTS: Detecting Moments and Highlights in Videos via Natural Language Queries

Papers

Showing 1–25 of 132 papers

Title	Date	Tasks	Status	Hype	Score
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding	Mar 22, 2024	Action Classification, Action Recognition	Code Available	7	5
Language-based Audio Moment Retrieval	Sep 24, 2024	audio moment retrieval, Moment Retrieval	Code Available	3	5
Lighthouse: A User-Friendly Library for Reproducible Video Moment Retrieval and Highlight Detection	Aug 6, 2024	audio moment retrieval, Highlight Detection	Code Available	3	5
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding	Mar 14, 2024	Mamba, Moment Retrieval	Code Available	3	5
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection	Mar 23, 2022	Decoder, Highlight Detection	Code Available	2	5
The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval	Jun 26, 2024	Action Localization, Moment Retrieval	Code Available	2	5
Query-Dependent Video Representation for Moment Retrieval and Highlight Detection	Mar 24, 2023	Highlight Detection, Moment Retrieval	Code Available	2	5
TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection	Jan 4, 2024	Highlight Detection, Moment Retrieval	Code Available	2	5
UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection	Apr 7, 2024	Action Detection, Moment Queries	Code Available	2	5
VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding	May 22, 2024	Dense Video Captioning, Highlight Detection	Code Available	2	5
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning	Oct 25, 2024	EgoSchema, Hallucination	Code Available	2	5
UniVTG: Towards Unified Video-Language Temporal Grounding	Jul 31, 2023	Highlight Detection, Moment Retrieval	Code Available	2	5
Number it: Temporal Grounding Videos like Flipping Manga	Nov 15, 2024	Highlight Detection, Moment Retrieval	Code Available	2	5
Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding	Nov 15, 2023	Highlight Detection, Moment Retrieval	Code Available	2	5
Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval	Jul 21, 2024	General Knowledge, Highlight Detection	Code Available	2	5
TMR: Text-to-Motion Retrieval Using Contrastive 3D Human Motion Synthesis	May 2, 2023	Moment Retrieval, Motion Generation	Code Available	2	5
A Flexible and Scalable Framework for Video Moment Search	Jan 9, 2025	Moment Retrieval, Re-Ranking	Code Available	1	5
Joint Moment Retrieval and Highlight Detection Via Natural Language Queries	May 8, 2023	Decoder, Highlight Detection	Code Available	1	5
CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval	Sep 21, 2021	Corpus Video Moment Retrieval, Moment Retrieval	Code Available	1	5
Hierarchical Video-Moment Retrieval and Step-Captioning	Mar 29, 2023	Information Retrieval, Moment Retrieval	Code Available	1	5
Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-Based Active Learning	Jan 1, 2023	Active Learning, Moment Retrieval	Code Available	1	5
Background-aware Moment Detection for Video Moment Retrieval	Jun 5, 2023	Moment Retrieval, Natural Language Moment Retrieval	Code Available	1	5
Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection	Nov 28, 2023	Contrastive Learning, Highlight Detection	Code Available	1	5
Detecting Moments and Highlights in Videos via Natural Language Queries	Dec 1, 2021	Decoder, Moment Retrieval	Code Available	1	5
Frame-wise Cross-modal Matching for Video Moment Retrieval	Sep 22, 2020	Boundary Detection, Moment Retrieval	Code Available	1	5

Datasets: QVHighlights, Charades-STA

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	UnLoc-L	R@1 IoU=0.5	66.1	—	Unverified
2	UnLoc-B	R@1 IoU=0.5	64.5	—	Unverified
3	DenoiseLoc	R@1 IoU=0.5	59.27	—	Unverified
4	SG-DETR (w/ PT)	mAP	58.8	—	Unverified
5	SG-DETR	mAP	54.1	—	Unverified
6	LLaVA-MR	mAP	52.73	—	Unverified
7	FlashVTG	mAP	52	—	Unverified
8	InternVideo2-6B	mAP	49.24	—	Unverified
9	CG-DETR (w/ PT)	mAP	47.97	—	Unverified
10	VideoLights-B-pt	mAP	47.94	—	Unverified
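
The mAP entries follow a detection-style evaluation: average precision (AP) is computed at several temporal IoU thresholds and then averaged. The convention commonly reported for QVHighlights averages over IoU thresholds 0.5 to 0.95 in steps of 0.05. The sketch below is a simplified version under stated assumptions: greedy one-to-one matching of score-ranked predictions to ground-truth windows, all-point interpolated AP, and a plain mean over queries. The leaderboard does not state which mAP variant each entry reports, and official evaluation scripts may differ in details. `average_precision` and `mean_ap` are hypothetical names, not an official API.

```python
from typing import Dict, List, Sequence, Tuple

Window = Tuple[float, float]          # (start, end) in seconds
Scored = Tuple[float, float, float]   # (start, end, confidence)

# Assumed threshold range: 0.50, 0.55, ..., 0.95
DEFAULT_THRESHOLDS = tuple(round(0.5 + 0.05 * i, 2) for i in range(10))


def temporal_iou(a: Window, b: Window) -> float:
    """Intersection over union of two 1-D time segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: Sequence[Scored], gts: Sequence[Window],
                      iou_thr: float) -> float:
    """All-point interpolated AP for one query at one IoU threshold."""
    if not gts:
        return 0.0
    ranked = sorted(preds, key=lambda p: p[2], reverse=True)
    matched = [False] * len(gts)
    precisions: List[float] = []
    recalls: List[float] = []
    tp = 0
    for rank, (start, end, _) in enumerate(ranked, start=1):
        # Match to the best-overlapping ground truth that is still unclaimed.
        best_j, best_iou = -1, iou_thr
        for j, gt in enumerate(gts):
            iou = temporal_iou((start, end), gt)
            if not matched[j] and iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j >= 0:
            matched[best_j] = True
            tp += 1
        precisions.append(tp / rank)
        recalls.append(tp / len(gts))
    # Interpolate: precision at each rank becomes the max over later ranks.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum precision times recall increment (area under the PR curve).
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_ap(preds: Dict[str, List[Scored]], gts: Dict[str, List[Window]],
            thresholds: Sequence[float] = DEFAULT_THRESHOLDS) -> float:
    """Average AP over queries at each threshold, then over thresholds."""
    per_thr = []
    for thr in thresholds:
        aps = [average_precision(preds.get(q, []), gts[q], thr) for q in gts]
        per_thr.append(sum(aps) / len(aps))
    return sum(per_thr) / len(per_thr)
```

As a usage example, a single prediction `(0, 7.2)` against a ground-truth window `(0, 10)` has IoU 0.72, so it counts as correct at the five thresholds from 0.50 to 0.70 and fails at the other five, giving `mean_ap` a value of 0.5.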

#	Model	Metric	Claimed	Verified	Status
1	SG-DETR (w/ PT)	R@1 IoU=0.5	71.1	—	Unverified
2	LLaVA-MR	R@1 IoU=0.5	70.65	—	Unverified
3	FlashVTG	R@1 IoU=0.5	70.32	—	Unverified
4	SG-DETR	R@1 IoU=0.5	70.2	—	Unverified
5	InternVideo2-6B	R@1 IoU=0.5	70.03	—	Unverified
6	InternVideo2-1B	R@1 IoU=0.5	68.36	—	Unverified
7	VideoChat-T (FT)	R@1 IoU=0.5	67.1	—	Unverified
8	UniMD+Sync.	R@1 IoU=0.5	63.98	—	Unverified
9	LD-DETR	R@1 IoU=0.5	62.58	—	Unverified
10	VideoLights-B-pt	R@1 IoU=0.5	61.96	—	Unverified
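
R@1 IoU=0.5 is the fraction of queries whose top-ranked predicted window overlaps a ground-truth window with a temporal IoU of at least 0.5. A minimal sketch, assuming windows are `(start, end)` pairs in seconds and that a hit against any ground-truth window counts. Charades-STA annotates one window per query, so there the check reduces to a single comparison. `temporal_iou` and `recall_at_1` are hypothetical names, not an official evaluation API.

```python
from typing import Dict, List, Optional, Tuple

Window = Tuple[float, float]  # (start, end) in seconds


def temporal_iou(a: Window, b: Window) -> float:
    """Intersection over union of two 1-D time segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1: Dict[str, Window], gts: Dict[str, List[Window]],
                iou_thr: float = 0.5) -> float:
    """Share of queries whose top-1 window reaches iou_thr with a ground truth."""
    hits = 0
    for qid, gt_windows in gts.items():
        pred: Optional[Window] = top1.get(qid)  # a missing prediction is a miss
        if pred is not None and any(temporal_iou(pred, g) >= iou_thr
                                    for g in gt_windows):
            hits += 1
    return hits / len(gts)
```

For example, `recall_at_1({"a": (0, 8), "b": (20, 30)}, {"a": [(0, 10)], "b": [(5, 15)]})` returns 0.5: query "a" reaches IoU 0.8 and counts as a hit, while query "b" does not overlap its ground truth at all. A leaderboard value such as 70.03 is this fraction expressed as a percentage.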