Visual Grounding

Visual Grounding (VG) aims to locate the most relevant object or region in an image, based on a natural language query. The query can be a phrase, a sentence, or even a multi-round dialogue. There are three main challenges in VG:

What is the main focus in a query?
How to understand an image?
How to locate an object?

Papers

Showing 451–460 of 571 papers

Title	Date	Tasks	Status
Visual Grounding via Accumulated Attention	Jun 1, 2018	Sentence, Visual Grounding	Unverified
Visual Grounding with Attention-Driven Constraint Balancing	Jul 3, 2024	Object, object-detection	Unverified
Visual Intention Grounding for Egocentric Assistants	Apr 18, 2025	Object, Visual Grounding	Unverified
Visually grounded cross-lingual keyword spotting in speech	Jun 13, 2018	Keyword Spotting, Visual Grounding	Unverified
Visually Grounded Neural Syntax Acquisition	Jun 7, 2019	Visual Grounding	Unverified
Visual Prompting in Multimodal Large Language Models: A Survey	Sep 5, 2024	In-Context Learning, Prompt Learning	Unverified
Visual Reference Resolution using Attention Memory for Visual Dialog	Sep 23, 2017	Parameter Prediction, Question Answering	Unverified
VisualTrap: A Stealthy Backdoor Attack on GUI Agents via Visual Grounding Manipulation	Jul 9, 2025	Backdoor Attack, Visual Grounding	Unverified
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks	Oct 7, 2024	Information Retrieval, Language Modeling	Unverified
VLMAE: Vision-Language Masked Autoencoder	Aug 19, 2022	Image-text Retrieval, Language Modeling	Unverified

Datasets: RefCOCO testA, RefCOCO+ testB, RefCOCO val

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Florence-2-large-ft	Accuracy (%)	95.3	—	Unverified
2	mPLUG-2	Accuracy (%)	92.8	—	Unverified
3	X2-VLM (large)	Accuracy (%)	92.1	—	Unverified
4	XFM (base)	Accuracy (%)	90.4	—	Unverified
5	X2-VLM (base)	Accuracy (%)	90.3	—	Unverified
6	X-VLM (base)	Accuracy (%)	89	—	Unverified
7	HYDRA	IoU	61.7	—	Unverified
8	HYDRA	IoU	61.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Florence-2-large-ft	Accuracy (%)	92	—	Unverified
2	mPLUG-2	Accuracy (%)	86.05	—	Unverified
3	X2-VLM (large)	Accuracy (%)	81.8	—	Unverified
4	XFM (base)	Accuracy (%)	79.8	—	Unverified
5	X2-VLM (base)	Accuracy (%)	78.4	—	Unverified
6	X-VLM (base)	Accuracy (%)	76.91	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Florence-2-large-ft	Accuracy (%)	93.4	—	Unverified
2	mPLUG-2	Accuracy (%)	90.33	—	Unverified
3	X2-VLM (large)	Accuracy (%)	87.6	—	Unverified
4	XFM (base)	Accuracy (%)	86.1	—	Unverified
5	X2-VLM (base)	Accuracy (%)	85.2	—	Unverified
6	X-VLM (base)	Accuracy (%)	84.51	—	Unverified
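
The tables above report Accuracy (%) and IoU without saying how either is computed. As a rough illustration only, the sketch below assumes boxes in corner format `(x1, y1, x2, y2)` and assumes that a prediction counts as correct when its IoU with the ground-truth box is at least 0.5, a common convention for referring-expression benchmarks that this page does not confirm. The names `iou` and `grounding_accuracy` are hypothetical helpers, not part of any listed model's code.

```python
from typing import Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def grounding_accuracy(
    preds: Sequence[Box],
    targets: Sequence[Box],
    threshold: float = 0.5,  # assumed convention, not stated in the source
) -> float:
    """Percentage of predictions whose IoU with the target meets the threshold."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not preds:
        return 0.0
    hits = sum(iou(p, t) >= threshold for p, t in zip(preds, targets))
    return 100.0 * hits / len(preds)
```

For example, `grounding_accuracy([(0, 0, 2, 2)], [(0, 0, 2, 2)])` counts the single exact match as a hit. Whether the IoU rows in the first table are per-sample means or some other aggregate is not stated here, so the sketch makes no claim about them.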