Video Grounding

Video grounding is the task of linking natural language descriptions to specific video segments. The model is given a video and a natural language description, such as a sentence or a caption, and must identify the segment of the video that corresponds to that description. This can involve localizing the objects or actions mentioned in the description within the video, or associating a specific time interval with the description.

Papers


Showing 91–100 of 114 papers

Title	Date	Tasks	Status
SAMA: Towards Multi-Turn Referential Grounded Video Chat with Large Language Models	May 24, 2025	Benchmarking, Video Grounding	Unverified
Unsupervised Temporal Video Grounding with Deep Semantic Clustering	Jan 14, 2022	Clustering, Sentence	Unverified
VideoGEM: Training-free Action Grounding in Videos	Mar 26, 2025	Video Grounding	Unverified
Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding	Dec 31, 2023	Spatio-Temporal Video Grounding, Video Grounding	Unverified
VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding	Jan 1, 2024	Spatio-Temporal Video Grounding, Video Grounding	Unverified
VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding	Jul 17, 2025	Video Grounding, Video Understanding	Unverified
Video LLMs for Temporal Reasoning in Long Videos	Dec 4, 2024	Action Segmentation, Dense Video Captioning	Unverified
Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition	May 7, 2024	Large Language Model, Multimodal Large Language Model	Unverified
ViGT: Proposal-free Video Grounding with Learnable Token in Transformer	Aug 11, 2023	Feature Correlation, Regression	Unverified
SynopGround: A Large-Scale Dataset for Multi-Paragraph Video Grounding from TV Dramas and Synopses	Aug 3, 2024	Natural Language Queries, Video Grounding	Unverified



Datasets: QVHighlights, MAD

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	InternVideo2-6B	R@1,IoU=0.7	56.45	—	Unverified
2	InternVideo2-1B	R@1,IoU=0.7	54.45	—	Unverified
3	LLMEPET	R@1,IoU=0.7	49.94	—	Unverified
4	QD-DETR	R@1,IoU=0.7	44.98	—	Unverified
5	DiffusionVMR	R@1,IoU=0.7	44.49	—	Unverified
6	UMT	R@1,IoU=0.7	41.18	—	Unverified
7	Moment-DETR	R@1,IoU=0.7	33.02	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	DeCafNet	R@1,IoU=0.1	13.25	—	Unverified
2	DenoiseLoc	R@1,IoU=0.1	11.59	—	Unverified
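
The tables above report R@1 at a fixed IoU threshold (0.7 in the first table, 0.1 in the second). As a rough illustration of what such a number means, the sketch below computes the temporal IoU between a predicted and a ground-truth time interval, then the percentage of queries whose top-ranked prediction reaches the threshold. The function names `temporal_iou` and `recall_at_1`, the `(start, end)` tuple format in seconds, the use of `>=` at the threshold, and the assumption of one ground-truth segment per query are illustrative choices made here. Individual benchmarks may define these details differently in their official evaluation scripts.

```python
from typing import List, Tuple

# A segment is a (start_sec, end_sec) pair; this format is an assumption.
Segment = Tuple[float, float]


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection over union of two time intervals."""
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0


def recall_at_1(top1_preds: List[Segment], gts: List[Segment],
                iou_threshold: float) -> float:
    """Percentage of queries whose top-1 prediction has IoU >= threshold.

    Assumes exactly one ground-truth segment per query.
    """
    if len(top1_preds) != len(gts):
        raise ValueError("need one top-1 prediction per ground-truth segment")
    if not gts:
        return 0.0
    hits = sum(
        temporal_iou(pred, gt) >= iou_threshold
        for pred, gt in zip(top1_preds, gts)
    )
    return 100.0 * hits / len(gts)


# Made-up example data: three queries, one top-1 prediction each.
example_preds = [(10.0, 20.0), (5.0, 9.0), (0.0, 4.0)]
example_gts = [(11.0, 20.0), (6.0, 12.0), (10.0, 14.0)]

r1_strict = recall_at_1(example_preds, example_gts, iou_threshold=0.7)
r1_loose = recall_at_1(example_preds, example_gts, iou_threshold=0.1)
```

In this toy data only the first prediction overlaps its target closely enough to count at IoU=0.7, while the first two count at IoU=0.1, so a looser threshold yields a higher recall on the same predictions. This is why scores reported at different thresholds, or on different datasets, are not directly comparable.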