Moment Retrieval

Moment retrieval can be defined as the task of "localizing moments in a video given a user query".

Description from: QVHIGHLIGHTS: Detecting Moments and Highlights in Videos via Natural Language Queries

Image credit: QVHIGHLIGHTS: Detecting Moments and Highlights in Videos via Natural Language Queries

Papers

Showing 51–100 of 132 papers

Title	Date	Tasks	Status	Hype
Deconfounded Video Moment Retrieval with Causal Intervention	Jun 3, 2021	Moment Retrieval, Retrieval	Code Available	1
Watch Video, Catch Keyword: Context-aware Keyword Attention for Moment Retrieval and Highlight Detection	Jan 5, 2025	Contrastive Learning, Highlight Detection	Code Available	1
Background-aware Moment Detection for Video Moment Retrieval	Jun 5, 2023	Moment Retrieval, Natural Language Moment Retrieval	Code Available	1
Partially Relevant Video Retrieval	Aug 26, 2022	Moment Retrieval, Multiple Instance Learning	Code Available	1
Video Moment Retrieval from Text Queries via Single Frame Annotation	Apr 20, 2022	Contrastive Learning, Moment Retrieval	Code Available	1
Retrieval Augmented Generation Evaluation for Health Documents	May 7, 2025	Moment Retrieval, RAG	Unverified	0
2DP-2MRC: 2-Dimensional Pointer-based Machine Reading Comprehension Method for Multimodal Moment Retrieval	Jun 10, 2024	Boundary Detection, Machine Reading Comprehension	Unverified	0
Agent-based Video Trimming	Dec 12, 2024	Highlight Detection, Moment Retrieval	Unverified	0
A Survey on Video Moment Localization	Jun 13, 2023	Action Localization, Moment Retrieval	Unverified	0
AxIoU: An Axiomatically Justified Measure for Video Moment Retrieval	Mar 30, 2022	Moment Retrieval, Retrieval	Unverified	0
Coarse to Fine: Video Retrieval before Moment Localization	Oct 14, 2021	Moment Retrieval, Retrieval	Unverified	0
Context-Enhanced Video Moment Retrieval with Large Language Models	May 21, 2024	cross-modal alignment, Language Modeling	Unverified	0
Cross-Lingual Cross-Modal Consolidation for Effective Multilingual Video Corpus Moment Retrieval	Jul 1, 2022	Moment Retrieval, Retrieval	Unverified	0
DAVE: Diverse Atomic Visual Elements Dataset with High Representation of Vulnerable Road Users in Complex and Unpredictable Environments	Dec 28, 2024	Action Localization, Action Recognition	Unverified	0
DeSPITE: Exploring Contrastive Deep Skeleton-Pointcloud-IMU-Text Embeddings for Advanced Point Cloud Human Activity Understanding	Jun 16, 2025	Activity Recognition, Human Activity Recognition	Unverified	0
DiffusionVMR: Diffusion Model for Joint Video Moment Retrieval and Highlight Detection	Aug 29, 2023	Denoising, Highlight Detection	Unverified	0
Disentangle and denoise: Tackling context misalignment for video moment retrieval	Aug 14, 2024	Denoising, Disentanglement	Unverified	0
D&M: Enriching E-commerce Videos with Sound Effects by Key Moment Detection and SFX Matching	Aug 23, 2024	Highlight Detection, Moment Retrieval	Unverified	0
EAGLE: Egocentric AGgregated Language-video Engine	Sep 26, 2024	Action Recognition, Activity Recognition	Unverified	0
EA-VTR: Event-Aware Video-Text Retrieval	Jul 10, 2024	Action Recognition, Contrastive Learning	Unverified	0
Event-aware Video Corpus Moment Retrieval	Feb 21, 2024	Contrastive Learning, Moment Retrieval	Unverified	0
Faster Video Moment Retrieval with Point-Level Supervision	May 23, 2023	Moment Retrieval, Natural Language Queries	Unverified	0
Fast Video Moment Retrieval	Jan 1, 2021	Moment Retrieval, Retrieval	Unverified	0
FedVMR: A New Federated Learning method for Video Moment Retrieval	Oct 28, 2022	Federated Learning, Moment Retrieval	Unverified	0
Generating Adjacency Matrix for Video Relocalization	Aug 19, 2020	Moment Retrieval	Unverified	0
Generative Video Diffusion for Unseen Cross-Domain Video Moment Retrieval	Jan 24, 2024	Moment Retrieval, Retrieval	Unverified	0
GPTSee: Enhancing Moment Retrieval and Highlight Detection via Description-Based Similarity Features	Mar 3, 2024	Decoder, Highlight Detection	Unverified	0
Graph Neural Network for Video Relocalization	Jul 20, 2020	Graph Neural Network, Moment Retrieval	Unverified	0
Grounding-MD: Grounded Video-language Pre-training for Open-World Moment Detection	Apr 20, 2025	Action Detection, Decoder	Unverified	0
Hybrid-Learning Video Moment Retrieval across Multi-Domain Labels	Jun 3, 2024	Moment Retrieval, Retrieval	Unverified	0
Interactive Video Corpus Moment Retrieval using Reinforcement Learning	Feb 19, 2023	Moment Retrieval, reinforcement-learning	Unverified	0
Language Guided Networks for Cross-modal Moment Retrieval	Jun 18, 2020	Moment Retrieval, Retrieval	Unverified	0
Leveraging Generative Language Models for Weakly Supervised Sentence Component Analysis in Video-Language Joint Learning	Dec 10, 2023	Language Modeling, Language Modelling	Unverified	0
MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment	Nov 30, 2018	Moment Retrieval, Natural Language Moment Retrieval	Unverified	0
MLLM as Video Narrator: Mitigating Modality Imbalance in Video Moment Retrieval	Jun 25, 2024	cross-modal alignment, Moment Retrieval	Unverified	0
MomentSeeker: A Task-Oriented Benchmark For Long-Video Moment Retrieval	Feb 18, 2025	Action Recognition, Moment Retrieval	Unverified	0
Multi-Modal Cross-Domain Alignment Network for Video Moment Retrieval	Sep 23, 2022	cross-modal alignment, Information Retrieval	Unverified	0
Multi-modal Fusion and Query Refinement Network for Video Moment Retrieval and Highlight Detection	Jan 18, 2025	Avg, Highlight Detection	Unverified	0
Multi-Modal Relational Graph for Cross-Modal Video Moment Retrieval	Jun 19, 2021	Cross-Modal Retrieval, Graph Matching	Unverified	0
Multi-scale 2D Representation Learning for weakly-supervised moment retrieval	Nov 4, 2021	Moment Retrieval, Representation Learning	Unverified	0
Multi-sentence Video Grounding for Long Video Generation	Jul 18, 2024	Moment Retrieval, Retrieval	Unverified	0
Multi-video Moment Ranking with Multimodal Clue	Jan 29, 2023	Moment Retrieval, Retrieval	Unverified	0
QD-VMR: Query Debiasing with Contextual Understanding Enhancement for Video Moment Retrieval	Aug 23, 2024	Contrastive Learning, Moment Retrieval	Unverified	0
Query-centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning	Dec 18, 2024	Moment Retrieval, Multi-Task Learning	Unverified	0
R^2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding	Apr 2, 2024	Highlight Detection, Moment Retrieval	Unverified	0
R^2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding	Mar 31, 2024	Highlight Detection, Moment Retrieval	Unverified	0
SCANet: Scene Complexity Aware Network for Weakly-Supervised Video Moment Retrieval	Oct 8, 2023	Moment Retrieval, Retrieval	Unverified	0
SLVideo: A Sign Language Video Moment Retrieval Framework	Jul 22, 2024	Moment Retrieval, Retrieval	Unverified	0
Temporal Perceiving Video-Language Pre-training	Jan 18, 2023	Action Localization, Contrastive Learning	Unverified	0
Text-based Localization of Moments in a Video Corpus	Aug 20, 2020	Moment Retrieval, Retrieval	Unverified	0

Datasets: QVHighlights, Charades-STA

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	UnLoc-L	R@1 IoU=0.5	66.1	—	Unverified
2	UnLoc-B	R@1 IoU=0.5	64.5	—	Unverified
3	DenoiseLoc	R@1 IoU=0.5	59.27	—	Unverified
4	SG-DETR (w/ PT)	mAP	58.8	—	Unverified
5	SG-DETR	mAP	54.1	—	Unverified
6	LLaVA-MR	mAP	52.73	—	Unverified
7	FlashVTG	mAP	52	—	Unverified
8	InternVideo2-6B	mAP	49.24	—	Unverified
9	CG-DETR (w/ PT)	mAP	47.97	—	Unverified
10	VideoLights-B-pt	mAP	47.94	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SG-DETR (w/ PT)	R@1 IoU=0.5	71.1	—	Unverified
2	LLaVA-MR	R@1 IoU=0.5	70.65	—	Unverified
3	FlashVTG	R@1 IoU=0.5	70.32	—	Unverified
4	SG-DETR	R@1 IoU=0.5	70.2	—	Unverified
5	InternVideo2-6B	R@1 IoU=0.5	70.03	—	Unverified
6	InternVideo2-1B	R@1 IoU=0.5	68.36	—	Unverified
7	VideoChat-T (FT)	R@1 IoU=0.5	67.1	—	Unverified
8	UniMD+Sync.	R@1 IoU=0.5	63.98	—	Unverified
9	LD-DETR	R@1 IoU=0.5	62.58	—	Unverified
10	VideoLights-B-pt	R@1 IoU=0.5	61.96	—	Unverified
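
The "R@1 IoU=0.5" metric in the tables above is commonly read as: the percentage of queries whose top-ranked predicted moment overlaps the ground-truth moment with a temporal intersection-over-union of at least 0.5. The sketch below illustrates that reading only. The function names, the (start, end) span representation in seconds, the single ground-truth window per query, and the use of ">=" at the threshold are all assumptions made for illustration; individual benchmarks may define these details differently.

```python
from typing import List, Tuple

# Hypothetical representation: a moment is a (start_sec, end_sec) pair.
Span = Tuple[float, float]


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection-over-union of two 1-D time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_preds: List[Span], gts: List[Span],
                iou_threshold: float = 0.5) -> float:
    """Percentage of queries whose top-1 prediction reaches the IoU threshold.

    Assumes exactly one ground-truth span per query (an assumption of this
    sketch, not a property of any particular dataset).
    """
    if len(top1_preds) != len(gts):
        raise ValueError("need one top-1 prediction per ground-truth span")
    if not gts:
        return 0.0
    hits = sum(temporal_iou(p, g) >= iou_threshold
               for p, g in zip(top1_preds, gts))
    return 100.0 * hits / len(gts)


if __name__ == "__main__":
    # Toy data, not taken from any benchmark.
    preds = [(10.0, 20.0), (0.0, 5.0), (30.0, 40.0)]
    gts = [(12.0, 22.0), (10.0, 15.0), (30.0, 50.0)]
    print(f"R@1 IoU=0.5: {recall_at_1(preds, gts):.2f}")
```

On the toy data, the first and third predictions count as hits (IoU of about 0.67 and exactly 0.5) and the second does not (no overlap), so two of three queries are recalled.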