Moment Retrieval

Moment retrieval can be defined as the task of "localizing moments in a video given a user query".

Description from: QVHIGHLIGHTS: Detecting Moments and Highlights in Videos via Natural Language Queries

Image credit: QVHIGHLIGHTS: Detecting Moments and Highlights in Videos via Natural Language Queries

Papers

Showing 1–50 of 132 papers

Title	Date	Tasks	Status	Hype
DeSPITE: Exploring Contrastive Deep Skeleton-Pointcloud-IMU-Text Embeddings for Advanced Point Cloud Human Activity Understanding	Jun 16, 2025	Activity Recognition, Human Activity Recognition	Unverified	0
Retrieval Augmented Generation Evaluation for Health Documents	May 7, 2025	Moment Retrieval, RAG	Unverified	0
Grounding-MD: Grounded Video-language Pre-training for Open-World Moment Detection	Apr 20, 2025	Action Detection, Decoder	Unverified	0
Towards Efficient and Robust Moment Retrieval System: A Unified Framework for Multi-Granularity Models and Temporal Reranking	Apr 11, 2025	Moment Retrieval, Question Answering	Unverified	0
TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos	Mar 9, 2025	Action Localization, Boundary Detection	Code Available	1
MomentSeeker: A Task-Oriented Benchmark For Long-Video Moment Retrieval	Feb 18, 2025	Action Recognition, Moment Retrieval	Unverified	0
Moment of Untruth: Dealing with Negative Queries in Video Moment Retrieval	Feb 12, 2025	Avg, Moment Retrieval	Code Available	0
LD-DETR: Loop Decoder DEtection TRansformer for Video Moment Retrieval and Highlight Detection	Jan 18, 2025	Contrastive Learning, Decoder	Code Available	1
Multi-modal Fusion and Query Refinement Network for Video Moment Retrieval and Highlight Detection	Jan 18, 2025	Avg, Highlight Detection	Unverified	0
Zero-shot Video Moment Retrieval via Off-the-shelf Multimodal Large Language Models	Jan 14, 2025	Moment Retrieval, Retrieval	Unverified	0
The Devil is in the Spurious Correlation: Boosting Moment Retrieval via Temporal Dynamic Learning	Jan 13, 2025	Moment Retrieval, Retrieval	Unverified	0
A Flexible and Scalable Framework for Video Moment Search	Jan 9, 2025	Moment Retrieval, Re-Ranking	Code Available	1
Watch Video, Catch Keyword: Context-aware Keyword Attention for Moment Retrieval and Highlight Detection	Jan 5, 2025	Contrastive Learning, Highlight Detection	Code Available	1
DTOS: Dynamic Time Object Sensing with Large Multimodal Model	Jan 1, 2025	Moment Retrieval, Referring Video Object Segmentation	Code Available	0
Anchor-Aware Similarity Cohesion in Target Frames Enables Predicting Temporal Moment Boundaries in 2D	Jan 1, 2025	Moment Retrieval, Semantic Similarity	Code Available	0
Length-Aware DETR for Robust Moment Retrieval	Dec 30, 2024	Information Retrieval, Moment Retrieval	Code Available	1
DAVE: Diverse Atomic Visual Elements Dataset with High Representation of Vulnerable Road Users in Complex and Unpredictable Environments	Dec 28, 2024	Action Localization, Action Recognition	Unverified	0
FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding	Dec 18, 2024	Highlight Detection, Moment Retrieval	Code Available	1
Query-centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning	Dec 18, 2024	Moment Retrieval, Multi-Task Learning	Unverified	0
Agent-based Video Trimming	Dec 12, 2024	Highlight Detection, Moment Retrieval	Unverified	0
VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval	Dec 2, 2024	Highlight Detection, Moment Retrieval	Code Available	1
Vid-Morp: Video Moment Retrieval Pretraining from Unlabeled Videos in the Wild	Dec 1, 2024	Moment Retrieval, Retrieval	Code Available	1
LLaVA-MR: Large Language-and-Vision Assistant for Video Moment Retrieval	Nov 21, 2024	Moment Retrieval, Natural Language Moment Retrieval	Code Available	0
Number it: Temporal Grounding Videos like Flipping Manga	Nov 15, 2024	Highlight Detection, Moment Retrieval	Code Available	2
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning	Oct 25, 2024	EgoSchema, Hallucination	Code Available	2
VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding	Oct 11, 2024	Hallucination, Moment Retrieval	Code Available	1
Saliency-Guided DETR for Moment Retrieval and Highlight Detection	Oct 2, 2024	Highlight Detection, Moment Retrieval	Code Available	1
Show and Guide: Instructional-Plan Grounded Vision and Language Model	Sep 27, 2024	Language Modeling, Language Modelling	Code Available	0
EAGLE: Egocentric AGgregated Language-video Engine	Sep 26, 2024	Action Recognition, Activity Recognition	Unverified	0
Language-based Audio Moment Retrieval	Sep 24, 2024	audio moment retrieval, Moment Retrieval	Code Available	3
D&M: Enriching E-commerce Videos with Sound Effects by Key Moment Detection and SFX Matching	Aug 23, 2024	Highlight Detection, Moment Retrieval	Unverified	0
QD-VMR: Query Debiasing with Contextual Understanding Enhancement for Video Moment Retrieval	Aug 23, 2024	Contrastive Learning, Moment Retrieval	Unverified	0
Disentangle and denoise: Tackling context misalignment for video moment retrieval	Aug 14, 2024	Denoising, Disentanglement	Unverified	0
Lighthouse: A User-Friendly Library for Reproducible Video Moment Retrieval and Highlight Detection	Aug 6, 2024	audio moment retrieval, Highlight Detection	Code Available	3
SLVideo: A Sign Language Video Moment Retrieval Framework	Jul 22, 2024	Moment Retrieval, Retrieval	Unverified	0
Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval	Jul 21, 2024	General Knowledge, Highlight Detection	Code Available	2
Multi-sentence Video Grounding for Long Video Generation	Jul 18, 2024	Moment Retrieval, Retrieval	Unverified	0
EA-VTR: Event-Aware Video-Text Retrieval	Jul 10, 2024	Action Recognition, Contrastive Learning	Unverified	0
TVR-Ranking: A Dataset for Ranked Video Moment Retrieval with Imprecise Queries	Jul 9, 2024	Moment Retrieval, Retrieval	Code Available	0
SHINE: Saliency-aware HIerarchical NEgative Ranking for Compositional Temporal Grounding	Jul 6, 2024	Language Modeling, Language Modelling	Code Available	0
The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval	Jun 26, 2024	Action Localization, Moment Retrieval	Code Available	2
MLLM as Video Narrator: Mitigating Modality Imbalance in Video Moment Retrieval	Jun 25, 2024	cross-modal alignment, Moment Retrieval	Unverified	0
2DP-2MRC: 2-Dimensional Pointer-based Machine Reading Comprehension Method for Multimodal Moment Retrieval	Jun 10, 2024	Boundary Detection, Machine Reading Comprehension	Unverified	0
Hybrid-Learning Video Moment Retrieval across Multi-Domain Labels	Jun 3, 2024	Moment Retrieval, Retrieval	Unverified	0
VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding	May 22, 2024	Dense Video Captioning, Highlight Detection	Code Available	2
Context-Enhanced Video Moment Retrieval with Large Language Models	May 21, 2024	cross-modal alignment, Language Modeling	Unverified	0
MLP: Motion Label Prior for Temporal Sentence Localization in Untrimmed 3D Human Motions	Apr 21, 2024	Moment Retrieval, Sentence	Code Available	1
Task-Driven Exploration: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection	Apr 14, 2024	Highlight Detection, Moment Retrieval	Code Available	1
UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection	Apr 7, 2024	Action Detection, Moment Queries	Code Available	2
R^2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding	Apr 2, 2024	Highlight Detection, Moment Retrieval	Unverified	0

Datasets: QVHighlights, Charades-STA

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	UnLoc-L	R@1 IoU=0.5	66.1	—	Unverified
2	UnLoc-B	R@1 IoU=0.5	64.5	—	Unverified
3	DenoiseLoc	R@1 IoU=0.5	59.27	—	Unverified
4	SG-DETR (w/ PT)	mAP	58.8	—	Unverified
5	SG-DETR	mAP	54.1	—	Unverified
6	LLaVA-MR	mAP	52.73	—	Unverified
7	FlashVTG	mAP	52	—	Unverified
8	InternVideo2-6B	mAP	49.24	—	Unverified
9	CG-DETR (w/ PT)	mAP	47.97	—	Unverified
10	VideoLights-B-pt	mAP	47.94	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SG-DETR (w/ PT)	R@1 IoU=0.5	71.1	—	Unverified
2	LLaVA-MR	R@1 IoU=0.5	70.65	—	Unverified
3	FlashVTG	R@1 IoU=0.5	70.32	—	Unverified
4	SG-DETR	R@1 IoU=0.5	70.2	—	Unverified
5	InternVideo2-6B	R@1 IoU=0.5	70.03	—	Unverified
6	InternVideo2-1B	R@1 IoU=0.5	68.36	—	Unverified
7	VideoChat-T (FT)	R@1 IoU=0.5	67.1	—	Unverified
8	UniMD+Sync.	R@1 IoU=0.5	63.98	—	Unverified
9	LD-DETR	R@1 IoU=0.5	62.58	—	Unverified
10	VideoLights-B-pt	R@1 IoU=0.5	61.96	—	Unverified
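
The R@1 IoU=0.5 metric in the tables above is usually read as the percentage of queries whose top-ranked predicted moment overlaps the ground-truth moment with a temporal intersection-over-union of at least 0.5. The sketch below illustrates that reading under simplifying assumptions: each query has exactly one ground-truth moment, moments are (start, end) pairs in seconds, and the function names `temporal_iou` and `recall_at_1` are hypothetical. It is not the official evaluation code of any benchmark listed here.

```python
from typing import List, Tuple

# A moment is a (start_sec, end_sec) pair. This representation is an assumption.
Span = Tuple[float, float]


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection-over-union of two 1D time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_preds: List[Span], gts: List[Span],
                iou_threshold: float = 0.5) -> float:
    """Percentage of queries whose top-1 prediction reaches the IoU threshold.

    Assumes one ground-truth moment per query (hypothetical simplification).
    """
    if not gts:
        return 0.0
    hits = sum(temporal_iou(p, g) >= iou_threshold
               for p, g in zip(top1_preds, gts))
    return 100.0 * hits / len(gts)


if __name__ == "__main__":
    preds = [(0.0, 10.0), (2.0, 8.0)]   # top-1 prediction per query
    gts = [(0.0, 8.0), (5.0, 15.0)]     # ground-truth moment per query
    print(recall_at_1(preds, gts))      # first query hits (IoU 0.8), second misses
```

Datasets that annotate several ground-truth moments per query would need a matching rule on top of this, which the sketch does not attempt.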