Video Grounding
Video grounding is the task of linking natural language descriptions to specific video segments. Given a video and a description, such as a sentence or a caption, the model must identify the segment of the video that corresponds to the description. This can involve localizing the objects or actions mentioned in the description, or associating a specific time interval with the description.
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | InternVideo2-6B | R@1,IoU=0.7 | 56.45 | — | Unverified |
| 2 | InternVideo2-1B | R@1,IoU=0.7 | 54.45 | — | Unverified |
| 3 | LLMEPET | R@1,IoU=0.7 | 49.94 | — | Unverified |
| 4 | QD-DETR | R@1,IoU=0.7 | 44.98 | — | Unverified |
| 5 | DiffusionVMR | R@1,IoU=0.7 | 44.49 | — | Unverified |
| 6 | UMT | R@1,IoU=0.7 | 41.18 | — | Unverified |
| 7 | Moment-DETR | R@1,IoU=0.7 | 33.02 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | DeCafNet | R@1,IoU=0.1 | 13.25 | — | Unverified |
| 2 | DenoiseLoc | R@1,IoU=0.1 | 11.59 | — | Unverified |
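The R@1,IoU=t metric reported above is typically computed as the percentage of queries whose top-1 predicted segment overlaps the ground-truth segment with temporal intersection-over-union of at least t. A minimal sketch (function names are illustrative, not from any specific codebase):

```python
def temporal_iou(pred, gt):
    """IoU of two (start, end) time intervals in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def recall_at_1(predictions, ground_truths, iou_threshold=0.7):
    """R@1,IoU=t: percentage of queries whose top-1 predicted
    segment reaches IoU >= t with the ground-truth segment."""
    hits = sum(temporal_iou(p, g) >= iou_threshold
               for p, g in zip(predictions, ground_truths))
    return 100.0 * hits / len(predictions)

# Example: the first prediction overlaps well (IoU = 0.8), the second
# misses entirely, so R@1 at IoU=0.7 is 50.0.
preds = [(10.0, 20.0), (0.0, 5.0)]
gts = [(12.0, 20.0), (30.0, 40.0)]
print(recall_at_1(preds, gts, iou_threshold=0.7))  # 50.0
```

Lower thresholds such as IoU=0.1 (used in the second table) only require loose overlap, which is why those numbers are not directly comparable to the IoU=0.7 results above.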