Visual Grounding

Visual Grounding (VG) aims to locate the most relevant object or region in an image, based on a natural language query. The query can be a phrase, a sentence, or even a multi-round dialogue. There are three main challenges in VG:

What is the main focus in a query?
How to understand an image?
How to locate an object?

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 411–420 of 571 papers

Title	Date	Tasks	Status	Hype
TransVG++: End-to-End Visual Grounding with Language Conditioned Vision Transformer	Jun 14, 2022	Visual Grounding	CodeCode Available	1
Referring Image Matting	Jun 10, 2022	Domain GeneralizationImage Matting	CodeCode Available	2
Guiding Visual Question Answering with Attention Priors	May 25, 2022	Question AnsweringVisual Grounding	—Unverified	0
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections	May 24, 2022	Computational Efficiencycross-modal alignment	CodeCode Available	1
Sim-To-Real Transfer of Visual Grounding for Human-Aided Ambiguity Resolution	May 24, 2022	Domain AdaptationVisual Grounding	—Unverified	0
Weakly-supervised segmentation of referring expressions	May 10, 2022	Image SegmentationReferring Expression	—Unverified	0
RoViST:Learning Robust Metrics for Visual Storytelling	May 8, 2022	SentenceText Generation	CodeCode Available	0
Flexible Visual Grounding	May 1, 2022	ArticlesVisual Grounding	CodeCode Available	0
To Find Waldo You Need Contextual Cues: Debiasing Who’s Waldo	May 1, 2022	BenchmarkingPerson-centric Visual Grounding	CodeCode Available	0
Attention as Grounding: Exploring Textual and Cross-Modal Attention on Entities and Relations in Language-and-Vision Transformer	May 1, 2022	Text GenerationVisual Grounding	CodeCode Available	0

Show:10 25 50

← PrevPage 42 of 58Next →

All datasets RefCOCO testA RefCOCO+ test B RefCoCo val

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Florence-2-large-ft	Accuracy (%)	95.3	—	Unverified
2	mPLUG-2	Accuracy (%)	92.8	—	Unverified
3	X2-VLM (large)	Accuracy (%)	92.1	—	Unverified
4	XFM (base)	Accuracy (%)	90.4	—	Unverified
5	X2-VLM (base)	Accuracy (%)	90.3	—	Unverified
6	X-VLM (base)	Accuracy (%)	89	—	Unverified
7	HYDRA	IoU	61.7	—	Unverified
8	HYDRA	IoU	61.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Florence-2-large-ft	Accuracy (%)	92	—	Unverified
2	mPLUG-2	Accuracy (%)	86.05	—	Unverified
3	X2-VLM (large)	Accuracy (%)	81.8	—	Unverified
4	XFM (base)	Accuracy (%)	79.8	—	Unverified
5	X2-VLM (base)	Accuracy (%)	78.4	—	Unverified
6	X-VLM (base)	Accuracy (%)	76.91	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Florence-2-large-ft	Accuracy (%)	93.4	—	Unverified
2	mPLUG-2	Accuracy (%)	90.33	—	Unverified
3	X2-VLM (large)	Accuracy (%)	87.6	—	Unverified
4	XFM (base)	Accuracy (%)	86.1	—	Unverified
5	X2-VLM (base)	Accuracy (%)	85.2	—	Unverified
6	X-VLM (base)	Accuracy (%)	84.51	—	Unverified