Moment Retrieval

Moment retrieval can be defined as the task of "localizing moments in a video given a user query".

Description from: QVHIGHLIGHTS: Detecting Moments and Highlights in Videos via Natural Language Queries

Papers

Showing 26–50 of 132 papers

Title	Date	Tasks	Status	Hype
VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding	Oct 11, 2024	Hallucination, Moment Retrieval	Code Available	1
Saliency-Guided DETR for Moment Retrieval and Highlight Detection	Oct 2, 2024	Highlight Detection, Moment Retrieval	Code Available	1
Show and Guide: Instructional-Plan Grounded Vision and Language Model	Sep 27, 2024	Language Modeling, Language Modelling	Code Available	0
EAGLE: Egocentric AGgregated Language-video Engine	Sep 26, 2024	Action Recognition, Activity Recognition	Unverified	0
Language-based Audio Moment Retrieval	Sep 24, 2024	audio moment retrieval, Moment Retrieval	Code Available	3
D&M: Enriching E-commerce Videos with Sound Effects by Key Moment Detection and SFX Matching	Aug 23, 2024	Highlight Detection, Moment Retrieval	Unverified	0
QD-VMR: Query Debiasing with Contextual Understanding Enhancement for Video Moment Retrieval	Aug 23, 2024	Contrastive Learning, Moment Retrieval	Unverified	0
Disentangle and denoise: Tackling context misalignment for video moment retrieval	Aug 14, 2024	Denoising, Disentanglement	Unverified	0
Lighthouse: A User-Friendly Library for Reproducible Video Moment Retrieval and Highlight Detection	Aug 6, 2024	audio moment retrieval, Highlight Detection	Code Available	3
SLVideo: A Sign Language Video Moment Retrieval Framework	Jul 22, 2024	Moment Retrieval, Retrieval	Unverified	0
Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval	Jul 21, 2024	General Knowledge, Highlight Detection	Code Available	2
Multi-sentence Video Grounding for Long Video Generation	Jul 18, 2024	Moment Retrieval, Retrieval	Unverified	0
EA-VTR: Event-Aware Video-Text Retrieval	Jul 10, 2024	Action Recognition, Contrastive Learning	Unverified	0
TVR-Ranking: A Dataset for Ranked Video Moment Retrieval with Imprecise Queries	Jul 9, 2024	Moment Retrieval, Retrieval	Code Available	0
SHINE: Saliency-aware HIerarchical NEgative Ranking for Compositional Temporal Grounding	Jul 6, 2024	Language Modeling, Language Modelling	Code Available	0
The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval	Jun 26, 2024	Action Localization, Moment Retrieval	Code Available	2
MLLM as Video Narrator: Mitigating Modality Imbalance in Video Moment Retrieval	Jun 25, 2024	cross-modal alignment, Moment Retrieval	Unverified	0
2DP-2MRC: 2-Dimensional Pointer-based Machine Reading Comprehension Method for Multimodal Moment Retrieval	Jun 10, 2024	Boundary Detection, Machine Reading Comprehension	Unverified	0
Hybrid-Learning Video Moment Retrieval across Multi-Domain Labels	Jun 3, 2024	Moment Retrieval, Retrieval	Unverified	0
VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding	May 22, 2024	Dense Video Captioning, Highlight Detection	Code Available	2
Context-Enhanced Video Moment Retrieval with Large Language Models	May 21, 2024	cross-modal alignment, Language Modeling	Unverified	0
MLP: Motion Label Prior for Temporal Sentence Localization in Untrimmed 3D Human Motions	Apr 21, 2024	Moment Retrieval, Sentence	Code Available	1
Task-Driven Exploration: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection	Apr 14, 2024	Highlight Detection, Moment Retrieval	Code Available	1
UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection	Apr 7, 2024	Action Detection, Moment Queries	Code Available	2
R^2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding	Apr 2, 2024	Highlight Detection, Moment Retrieval	Code Available	0

Datasets: QVHighlights, Charades-STA

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	UnLoc-L	R@1 IoU=0.5	66.1	—	Unverified
2	UnLoc-B	R@1 IoU=0.5	64.5	—	Unverified
3	DenoiseLoc	R@1 IoU=0.5	59.27	—	Unverified
4	SG-DETR (w/ PT)	mAP	58.8	—	Unverified
5	SG-DETR	mAP	54.1	—	Unverified
6	LLaVA-MR	mAP	52.73	—	Unverified
7	FlashVTG	mAP	52	—	Unverified
8	InternVideo2-6B	mAP	49.24	—	Unverified
9	CG-DETR (w/ PT)	mAP	47.97	—	Unverified
10	VideoLights-B-pt	mAP	47.94	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SG-DETR (w/ PT)	R@1 IoU=0.5	71.1	—	Unverified
2	LLaVA-MR	R@1 IoU=0.5	70.65	—	Unverified
3	FlashVTG	R@1 IoU=0.5	70.32	—	Unverified
4	SG-DETR	R@1 IoU=0.5	70.2	—	Unverified
5	InternVideo2-6B	R@1 IoU=0.5	70.03	—	Unverified
6	InternVideo2-1B	R@1 IoU=0.5	68.36	—	Unverified
7	VideoChat-T (FT)	R@1 IoU=0.5	67.1	—	Unverified
8	UniMD+Sync.	R@1 IoU=0.5	63.98	—	Unverified
9	LD-DETR	R@1 IoU=0.5	62.58	—	Unverified
10	VideoLights-B-pt	R@1 IoU=0.5	61.96	—	Unverified
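
The "R@1 IoU=0.5" metric in the tables above is commonly read as: the percentage of queries whose top-ranked predicted moment overlaps the ground-truth moment with a temporal intersection-over-union (IoU) of at least 0.5. The sketch below illustrates that reading under stated assumptions: each query has exactly one ground-truth span, spans are (start, end) pairs in seconds, and a prediction exactly at the threshold counts as a hit. The names `temporal_iou` and `recall_at_1` are hypothetical helpers for illustration, not taken from any benchmark's official evaluation code, which may differ in details such as handling multiple ground-truth moments.

```python
from typing import List, Tuple

# A moment is a (start_sec, end_sec) span; this representation is an assumption.
Span = Tuple[float, float]


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection-over-union of two time spans on the video timeline."""
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0


def recall_at_1(top1_preds: List[Span], gts: List[Span],
                iou_threshold: float = 0.5) -> float:
    """Percentage of queries whose top-1 prediction reaches the IoU threshold.

    Assumes one ground-truth span per query, aligned by index with top1_preds.
    """
    if len(top1_preds) != len(gts):
        raise ValueError("need exactly one top-1 prediction per ground truth")
    if not gts:
        return 0.0
    hits = sum(temporal_iou(p, g) >= iou_threshold
               for p, g in zip(top1_preds, gts))
    return 100.0 * hits / len(gts)


if __name__ == "__main__":
    # Three toy queries: IoUs are 8/12, 0, and 6/10, so two of three are hits.
    preds = [(10.0, 20.0), (0.0, 5.0), (30.0, 40.0)]
    gts = [(12.0, 22.0), (20.0, 30.0), (30.0, 36.0)]
    print(f"R@1 IoU=0.5: {recall_at_1(preds, gts):.2f}")
```

In the toy example, the first and third predictions clear the 0.5 threshold and the second does not overlap at all, so two of three queries count as hits. Raising `iou_threshold` to 0.7 would leave no hits, which is why stricter IoU settings generally yield lower reported scores.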