Spatio-Temporal Video Grounding

Spatio-temporal video grounding (STVG) is a task at the intersection of computer vision and natural language processing (NLP). Given a video and a textual query, the goal is to localize the content the query describes both in time (the segment of the video where it occurs) and in space (the region of each frame it occupies). In other words, it determines which part of a video corresponds to a given description. The task supports applications such as video summarization, content-based video retrieval, and video captioning.

Papers

Showing 1–10 of 22 papers

Title	Date	Tasks	Status	Hype
SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability	Mar 18, 2025	Language Modeling, Language Modelling	Unverified	0
Large-scale Pre-training for Grounded Video Caption Generation	Mar 13, 2025	Caption Generation	Code Available	1
Knowing Your Target: Target-Aware Transformer Makes Better Spatio-Temporal Video Grounding	Feb 16, 2025	Attribute, Object	Code Available	1
Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding	Jan 28, 2025	object-detection, Object Detection	Unverified	0
STPro: Spatial and Temporal Progressive Learning for Weakly Supervised Spatio-Temporal Grounding	Jan 1, 2025	Action Understanding, Spatio-Temporal Video Grounding	Unverified	0
Context-Guided Spatio-Temporal Video Grounding	Jan 3, 2024	Object, Spatio-Temporal Video Grounding	Code Available	2
VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding	Jan 1, 2024	Spatio-Temporal Video Grounding, Video Grounding	Unverified	0
Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding	Dec 31, 2023	Spatio-Temporal Video Grounding, Video Grounding	Unverified	0
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models	Nov 22, 2023	Benchmarking, Phrase Grounding	Code Available	2
Guided Attention for Interpretable Motion Captioning	Oct 11, 2023	Action Localization, Motion Captioning	Code Available	0

Datasets: HC-STVG2, HC-STVG1, VidSTG

Benchmark Results

Dataset: HC-STVG2
#	Model	Metric	Claimed	Verified	Status
1	TA-STVG	Val m_vIoU	40.2	—	Unverified
2	CG-STVG	Val m_vIoU	39.5	—	Unverified
3	STVGFormer	Val m_vIoU	38.7	—	Unverified
4	TubeDETR	Val m_vIoU	36.4	—	Unverified

Dataset: HC-STVG1
#	Model	Metric	Claimed	Verified	Status
1	TA-STVG	m_vIoU	39.1	—	Unverified
2	CG-STVG	m_vIoU	38.4	—	Unverified
3	TubeDETR	m_vIoU	32.4	—	Unverified

Dataset: VidSTG
#	Model	Metric	Claimed	Verified	Status
1	TA-STVG	Declarative m_vIoU	34.4	—	Unverified
2	CG-STVG	Declarative m_vIoU	34.0	—	Unverified
3	TubeDETR	Declarative m_vIoU	30.4	—	Unverified
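
The m_vIoU metric reported in the tables above is commonly defined as the mean, over all test samples, of a per-sample video IoU (vIoU): the sum of per-frame box IoUs over the frames shared by the predicted and ground-truth tubes, divided by the number of frames in the union of the two tubes. The sketch below illustrates that definition. It assumes a tube is represented as a mapping from frame index to an (x1, y1, x2, y2) box; the names `Box`, `Tube`, `box_iou`, `v_iou`, and `m_v_iou` are hypothetical and do not come from any benchmark's official evaluation code, which may differ in details such as frame sampling.

```python
from typing import Dict, List, Tuple

# Assumed representation (hypothetical): a box is (x1, y1, x2, y2) and a
# tube maps each frame index in the temporal segment to one box.
Box = Tuple[float, float, float, float]
Tube = Dict[int, Box]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def v_iou(pred: Tube, gt: Tube) -> float:
    """Per-sample vIoU: box IoU summed over frames present in both tubes,
    divided by the number of frames present in either tube."""
    union_frames = set(pred) | set(gt)
    if not union_frames:
        return 0.0
    shared_frames = set(pred) & set(gt)
    total = sum(box_iou(pred[t], gt[t]) for t in shared_frames)
    return total / len(union_frames)


def m_v_iou(preds: List[Tube], gts: List[Tube]) -> float:
    """Mean vIoU over paired predictions and ground truths."""
    scores = [v_iou(p, g) for p, g in zip(preds, gts)]
    return sum(scores) / len(scores) if scores else 0.0


# Example: the boxes agree perfectly, but the predicted segment (frames 2-5)
# only partly overlaps the ground-truth segment (frames 0-3).
gt_tube = {t: (0.0, 0.0, 10.0, 10.0) for t in range(0, 4)}
pred_tube = {t: (0.0, 0.0, 10.0, 10.0) for t in range(2, 6)}
score = v_iou(pred_tube, gt_tube)  # 2 shared frames / 6 frames in the union
```

Because the denominator counts frames in the union of the two segments, a prediction is penalized for temporal misalignment even when its boxes are accurate. The function returns a fraction between 0 and 1, whereas the tables list scores as percentages.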