Video Grounding

Video grounding is the task of linking natural language descriptions to specific video segments. The model is given a video and a natural language description, such as a sentence or a caption, and must identify the segment of the video that corresponds to that description. Depending on the variant, this means localizing the objects or actions mentioned in the description within the video (spatio-temporal grounding), or associating a specific time interval with the description (temporal grounding).
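As a minimal sketch of the temporal case, a predicted segment and an annotated segment can each be represented as a (start, end) interval in seconds and compared by their temporal intersection over union (IoU). The function name `temporal_iou`, the `Interval` alias, and the example timestamps are illustrative assumptions, not taken from any paper listed here.

```python
from typing import Tuple

# Assumed representation: a video segment as (start_sec, end_sec).
Interval = Tuple[float, float]


def temporal_iou(pred: Interval, gt: Interval) -> float:
    """Intersection over union of two time intervals, in [0, 1]."""
    # Overlap length is zero when the intervals do not intersect.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical example: the model predicts 12-20 s for a query
# whose annotated segment is 10-18 s.
print(temporal_iou((12.0, 20.0), (10.0, 18.0)))  # 6 s overlap / 10 s union = 0.6
```

An IoU of 1.0 means the predicted interval matches the annotation exactly, and 0.0 means the two do not overlap at all.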

Papers


Showing 51–60 of 114 papers

Title	Date	Tasks	Status	Score
Unified Static and Dynamic Network: Efficient Temporal Filtering for Video Grounding	Mar 21, 2024	Video Grounding	Code Available	5
Cross-Modal learning for Audio-Visual Video Parsing	Apr 3, 2021	Event Detection, Multiple Instance Learning	Code Available	5
Cross-modal Contrastive Learning with Asymmetric Co-attention Network for Video Moment Retrieval	Dec 12, 2023	Contrastive Learning, Moment Retrieval	Code Available	5
Consistency of Compositional Generalization across Multiple Levels	Dec 18, 2024	Meta-Learning, Question Answering	Code Available	5
VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding	Jan 1, 2024	Spatio-Temporal Video Grounding, Video Grounding	Unverified	0
VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding	Jul 17, 2025	Video Grounding, Video Understanding	Unverified	0
Video LLMs for Temporal Reasoning in Long Videos	Dec 4, 2024	Action Segmentation, Dense Video Captioning	Unverified	0
Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition	May 7, 2024	Large Language Model, Multimodal Large Language Model	Unverified	0
ViGT: Proposal-free Video Grounding with Learnable Token in Transformer	Aug 11, 2023	Feature Correlation, Regression	Unverified	0
SynopGround: A Large-Scale Dataset for Multi-Paragraph Video Grounding from TV Dramas and Synopses	Aug 3, 2024	Natural Language Queries, Video Grounding	Unverified	0



Datasets: QVHighlights, MAD

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	InternVideo2-6B	R@1,IoU=0.7	56.45	—	Unverified
2	InternVideo2-1B	R@1,IoU=0.7	54.45	—	Unverified
3	LLMEPET	R@1,IoU=0.7	49.94	—	Unverified
4	QD-DETR	R@1,IoU=0.7	44.98	—	Unverified
5	DiffusionVMR	R@1,IoU=0.7	44.49	—	Unverified
6	UMT	R@1,IoU=0.7	41.18	—	Unverified
7	Moment-DETR	R@1,IoU=0.7	33.02	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	DeCafNet	R@1,IoU=0.1	13.25	—	Unverified
2	DenoiseLoc	R@1,IoU=0.1	11.59	—	Unverified
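The metric column in the tables above has the form R@1,IoU=m. A common reading, sketched here under stated assumptions, is the percentage of queries whose top-ranked predicted segment overlaps the annotated segment with temporal IoU of at least m. The helper names, the use of `>=` at the threshold, and the sample intervals are illustrative assumptions; each benchmark's official evaluation script may differ in details.

```python
from typing import List, Tuple

# Assumed representation: a video segment as (start_sec, end_sec).
Interval = Tuple[float, float]


def _iou(a: Interval, b: Interval) -> float:
    """Temporal intersection over union of two intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_preds: List[Interval],
                ground_truths: List[Interval],
                iou_threshold: float) -> float:
    """Percentage of queries whose top-1 prediction reaches the IoU threshold."""
    if len(top1_preds) != len(ground_truths):
        raise ValueError("expected one top-1 prediction per query")
    if not ground_truths:
        return 0.0
    hits = sum(_iou(p, g) >= iou_threshold
               for p, g in zip(top1_preds, ground_truths))
    return 100.0 * hits / len(ground_truths)


# Hypothetical predictions and annotations for three queries.
preds = [(10.0, 20.0), (0.0, 5.0), (30.0, 40.0)]
gts = [(11.0, 20.0), (3.0, 9.0), (30.0, 45.0)]

# IoUs are 0.9, about 0.22, and about 0.67.
print(round(recall_at_1(preds, gts, 0.7), 2))  # 33.33 (one of three queries)
print(round(recall_at_1(preds, gts, 0.5), 2))  # 66.67 (two of three queries)
```

A stricter threshold such as 0.7 rewards tightly localized segments, while a loose threshold such as 0.1 only requires the prediction to overlap the annotated moment at all.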