Moment Retrieval

Moment retrieval can be defined as the task of "localizing moments in a video given a user query".

Description from: QVHIGHLIGHTS: Detecting Moments and Highlights in Videos via Natural Language Queries

Image credit: QVHIGHLIGHTS: Detecting Moments and Highlights in Videos via Natural Language Queries

Papers


Showing 101–132 of 132 papers

Title	Date	Tasks	Status
The Devil is in the Spurious Correlation: Boosting Moment Retrieval via Temporal Dynamic Learning	Jan 13, 2025	Moment Retrieval, Retrieval	Unverified
Temporal Sentence Grounding in Videos: A Survey and Future Directions	Jan 20, 2022	Moment Retrieval, Retrieval	Unverified
Towards Efficient and Robust Moment Retrieval System: A Unified Framework for Multi-Granularity Models and Temporal Reranking	Apr 11, 2025	Moment Retrieval, Question Answering	Unverified
Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training	Feb 28, 2023	Moment Retrieval, Retrieval	Unverified
UnLoc: A Unified Framework for Video Localization Tasks	Aug 21, 2023	Action Segmentation, Moment Retrieval	Unverified
Video Moment Retrieval via Natural Language Queries	Sep 4, 2020	Moment Retrieval, Natural Language Queries	Unverified
Video Moment Retrieval with Text Query Considering Many-to-Many Correspondence Using Potentially Relevant Pair	Jun 25, 2021	Moment Retrieval, Retrieval	Unverified
ViSeRet: A simple yet effective approach to moment retrieval via fine-grained video segmentation	Oct 11, 2021	Moment Retrieval, Retrieval	Unverified
Weakly-Supervised Video Moment Retrieval via Semantic Completion Network	Nov 19, 2019	Moment Retrieval, Retrieval	Unverified
wMAN: WEAKLY-SUPERVISED MOMENT ALIGNMENT NETWORK FOR TEXT-BASED VIDEO SEGMENT RETRIEVAL	Sep 25, 2019	Moment Retrieval, Retrieval	Unverified
LoGAN: Latent Graph Co-Attention Network for Weakly-Supervised Video Moment Retrieval	Sep 27, 2019	Moment Retrieval, Retrieval	Unverified
Zero-shot Video Moment Retrieval via Off-the-shelf Multimodal Large Language Models	Jan 14, 2025	Moment Retrieval, Retrieval	Unverified
Zero-shot Video Moment Retrieval With Off-the-Shelf Models	Nov 3, 2022	Moment Retrieval, Retrieval	Unverified
LLaVA-MR: Large Language-and-Vision Assistant for Video Moment Retrieval	Nov 21, 2024	Moment Retrieval, Natural Language Moment Retrieval	Code Available
Language-Conditioned Change-point Detection to Identify Sub-Tasks in Robotics Domains	Sep 1, 2023	Change Point Detection, Instruction Following	Code Available
Improving Video Corpus Moment Retrieval with Partial Relevance Enhancement	Feb 21, 2024	Moment Retrieval, Retrieval	Code Available
Going for GOAL: A Resource for Grounded Football Commentaries	Nov 8, 2022	Moment Retrieval, Retrieval	Code Available
Boundary-Denoising for Video Activity Localization	Apr 6, 2023	Action Detection, Decoder	Code Available
Weakly Supervised Video Moment Retrieval From Text Queries	Apr 5, 2019	Moment Retrieval, Natural Language Queries	Code Available
Exploring Temporal Concurrency for Video-Language Representation Learning	Jan 1, 2023	Dynamic Time Warping, Metric Learning	Code Available
Anchor-Aware Similarity Cohesion in Target Frames Enables Predicting Temporal Moment Boundaries in 2D	Jan 1, 2025	Moment Retrieval, Semantic Similarity	Code Available
DTOS: Dynamic Time Object Sensing with Large Multimodal Model	Jan 1, 2025	Moment Retrieval, Referring Video Object Segmentation	Code Available
Cross-Modal Interaction Networks for Query-Based Moment Retrieval in Videos	Jun 6, 2019	Moment Retrieval, Natural Language Queries	Code Available
Towards Diverse Temporal Grounding under Single Positive Labels	Mar 12, 2023	Moment Retrieval, Retrieval	Code Available
SimVTP: Simple Video Text Pre-training with Masked Autoencoders	Dec 7, 2022	Contrastive Learning, cross-modal alignment	Code Available
TVR-Ranking: A Dataset for Ranked Video Moment Retrieval with Imprecise Queries	Jul 9, 2024	Moment Retrieval, Retrieval	Code Available
Show and Guide: Instructional-Plan Grounded Vision and Language Model	Sep 27, 2024	Language Modeling, Language Modelling	Code Available
SHINE: Saliency-aware HIerarchical NEgative Ranking for Compositional Temporal Grounding	Jul 6, 2024	Language Modeling, Language Modelling	Code Available
MVMR: A New Framework for Evaluating Faithfulness of Video Moment Retrieval against Multiple Distractors	Aug 15, 2023	Contrastive Learning, Misinformation	Code Available
Moment of Untruth: Dealing with Negative Queries in Video Moment Retrieval	Feb 12, 2025	Avg, Moment Retrieval	Code Available
Cross-modal Contrastive Learning with Asymmetric Co-attention Network for Video Moment Retrieval	Dec 12, 2023	Contrastive Learning, Moment Retrieval	Code Available
Modal-specific Pseudo Query Generation for Video Corpus Moment Retrieval	Oct 23, 2022	Moment Retrieval, Multimodal Reasoning	Code Available


Datasets: QVHighlights, Charades-STA

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	UnLoc-L	R@1 IoU=0.5	66.1	—	Unverified
2	UnLoc-B	R@1 IoU=0.5	64.5	—	Unverified
3	DenoiseLoc	R@1 IoU=0.5	59.27	—	Unverified
4	SG-DETR (w/ PT)	mAP	58.8	—	Unverified
5	SG-DETR	mAP	54.1	—	Unverified
6	LLaVA-MR	mAP	52.73	—	Unverified
7	FlashVTG	mAP	52	—	Unverified
8	InternVideo2-6B	mAP	49.24	—	Unverified
9	CG-DETR (w/ PT)	mAP	47.97	—	Unverified
10	VideoLights-B-pt	mAP	47.94	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SG-DETR (w/ PT)	R@1 IoU=0.5	71.1	—	Unverified
2	LLaVA-MR	R@1 IoU=0.5	70.65	—	Unverified
3	FlashVTG	R@1 IoU=0.5	70.32	—	Unverified
4	SG-DETR	R@1 IoU=0.5	70.2	—	Unverified
5	InternVideo2-6B	R@1 IoU=0.5	70.03	—	Unverified
6	InternVideo2-1B	R@1 IoU=0.5	68.36	—	Unverified
7	VideoChat-T (FT)	R@1 IoU=0.5	67.1	—	Unverified
8	UniMD+Sync.	R@1 IoU=0.5	63.98	—	Unverified
9	LD-DETR	R@1 IoU=0.5	62.58	—	Unverified
10	VideoLights-B-pt	R@1 IoU=0.5	61.96	—	Unverified
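
The "R@1 IoU=0.5" metric in the tables above is commonly read as the percentage of queries whose top-ranked predicted moment overlaps the ground-truth moment with a temporal intersection-over-union of at least 0.5. The sketch below illustrates that reading only. The function names, the `(start, end)` span format in seconds, and the assumption of exactly one ground-truth moment per query are illustrative choices, not taken from any listed paper or from an official evaluation script. Some benchmarks annotate several moments per query and handle them differently.

```python
from typing import List, Tuple

# Hypothetical span format: (start_sec, end_sec)
Span = Tuple[float, float]


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection over union of two time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_preds: List[Span], gts: List[Span],
                iou_threshold: float = 0.5) -> float:
    """Percentage of queries whose top-ranked span reaches the IoU threshold.

    Assumes one ground-truth span per query (an illustrative simplification).
    """
    if not gts:
        return 0.0
    hits = sum(temporal_iou(p, g) >= iou_threshold
               for p, g in zip(top1_preds, gts))
    return 100.0 * hits / len(gts)


# Three queries: the first and third predictions overlap enough, the second misses.
preds = [(10.0, 20.0), (0.0, 5.0), (30.0, 40.0)]
gts = [(12.0, 22.0), (10.0, 15.0), (30.0, 38.0)]
score = recall_at_1(preds, gts)  # two hits out of three queries
```

Reporting the score as a percentage matches the scale of the "Claimed" column, but the exact evaluation protocol behind each leaderboard entry may differ from this sketch.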