SOTAVerified

Video Grounding

Video grounding is the task of linking natural language descriptions to specific video segments. Given a video and a natural language description, such as a sentence or a caption, the model must identify the segment of the video that corresponds to the description. This can involve localizing the objects or actions mentioned in the description within the video, or associating a specific time interval with the description.
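For temporal grounding, predicted and ground-truth segments are typically compared via temporal IoU (intersection over union of time intervals). A minimal sketch, assuming segments are `(start, end)` pairs in seconds (the function name and format are illustrative, not from any specific codebase):

```python
def temporal_iou(pred, gt):
    """IoU of two time intervals given as (start, end) in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

# A prediction of [10, 20] against ground truth [12, 22]:
# intersection = 8, union = 12, so IoU = 8/12 ≈ 0.667
```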

Papers

Showing 101–110 of 114 papers

Title | Status | Hype
Semi-Supervised Video Paragraph Grounding With Contrastive Encoder | | 0
Seq2Time: Sequential Knowledge Transfer for Video LLM Temporal Grounding | | 0
SimBase: A Simple Baseline for Temporal Video Grounding | | 0
Simplify Implant Depth Prediction as Video Grounding: A Texture Perceive Implant Depth Prediction Network | | 0
SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability | | 0
SpikeMba: Multi-Modal Spiking Saliency Mamba for Temporal Video Grounding | | 0
STPro: Spatial and Temporal Progressive Learning for Weakly Supervised Spatio-Temporal Grounding | | 0
STVGBert: A Visual-Linguistic Transformer Based Framework for Spatio-Temporal Video Grounding | | 0
STVGFormer: Spatio-Temporal Video Grounding with Static-Dynamic Cross-Modal Understanding | | 0
Support-Set Based Cross-Supervision for Video Grounding | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | InternVideo2-6B | R@1,IoU=0.7 | 56.45 | | Unverified
2 | InternVideo2-1B | R@1,IoU=0.7 | 54.45 | | Unverified
3 | LLMEPET | R@1,IoU=0.7 | 49.94 | | Unverified
4 | QD-DETR | R@1,IoU=0.7 | 44.98 | | Unverified
5 | DiffusionVMR | R@1,IoU=0.7 | 44.49 | | Unverified
6 | UMT | R@1,IoU=0.7 | 41.18 | | Unverified
7 | Moment-DETR | R@1,IoU=0.7 | 33.02 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | DeCafNet | R@1,IoU=0.1 | 13.25 | | Unverified
2 | DenoiseLoc | R@1,IoU=0.1 | 11.59 | | Unverified
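The R@1,IoU=τ metric in the tables above counts a query as correct when the model's top-ranked segment overlaps the ground-truth segment with temporal IoU of at least the threshold τ, reported as a percentage over all queries. A minimal sketch of that computation, assuming each query has one top-1 prediction and one ground-truth interval as `(start, end)` pairs (names and data layout are illustrative):

```python
def recall_at_1(top1_preds, gts, tau=0.7):
    """Percentage of queries whose top-1 predicted segment has
    temporal IoU >= tau with the ground-truth segment."""
    def iou(a, b):
        inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
        union = (a[1] - a[0]) + (b[1] - b[0]) - inter
        return inter / union if union > 0 else 0.0
    hits = sum(iou(p, g) >= tau for p, g in zip(top1_preds, gts))
    return 100.0 * hits / len(top1_preds)

# Two queries: the first top-1 prediction matches exactly (IoU = 1.0),
# the second misses entirely (IoU = 0.0), giving R@1 = 50.0
```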