Visual Grounding

Visual Grounding (VG) aims to locate the object or region in an image that is most relevant to a natural language query. The query can be a phrase, a sentence, or even a multi-round dialogue. VG poses three main challenges:

What is the main focus of the query?
How should the image be understood?
How should the object be located?

Papers

Showing 541–550 of 571 papers

Title	Date	Tasks	Status
Learning semantic sentence representations from visually grounded language without lexical knowledge	Mar 27, 2019	Grounded language learning, Learning Semantic Representations	Code Available
Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment	Mar 27, 2019	Image Retrieval, Phrase Grounding	Unverified
Dual Attention Networks for Visual Reference Resolution in Visual Dialog	Feb 25, 2019	AI Agent, Question Answering	Code Available
You Only Look & Listen Once: Towards Fast and Accurate Visual Grounding	Feb 12, 2019	Object Detection	Code Available
Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded	Feb 11, 2019	Image Captioning, Question Answering	Unverified
Learning to Assemble Neural Module Tree Networks for Visual Grounding	Dec 8, 2018	Dependency Parsing, Natural Language Visual Grounding	Unverified
Multi-task Learning of Hierarchical Vision-Language Representation	Dec 3, 2018	Multi-Task Learning, Question Answering	Unverified
Being data-driven is not enough: Revisiting interactive instruction giving as a challenge for NLG	Nov 1, 2018	Text Generation, Visual Grounding	Unverified
Overcoming Language Priors in Visual Question Answering with Adversarial Regularization	Oct 8, 2018	Question Answering, Visual Grounding	Unverified
Beyond task success: A closer look at jointly learning to see, ask, and GuessWhat	Sep 10, 2018	Multi-Task Learning, Reinforcement Learning	Code Available

Datasets: RefCOCO testA, RefCOCO+ testB, RefCOCO val

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Florence-2-large-ft	Accuracy (%)	95.3	—	Unverified
2	mPLUG-2	Accuracy (%)	92.8	—	Unverified
3	X2-VLM (large)	Accuracy (%)	92.1	—	Unverified
4	XFM (base)	Accuracy (%)	90.4	—	Unverified
5	X2-VLM (base)	Accuracy (%)	90.3	—	Unverified
6	X-VLM (base)	Accuracy (%)	89	—	Unverified
7	HYDRA	IoU	61.7	—	Unverified
8	HYDRA	IoU	61.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Florence-2-large-ft	Accuracy (%)	92	—	Unverified
2	mPLUG-2	Accuracy (%)	86.05	—	Unverified
3	X2-VLM (large)	Accuracy (%)	81.8	—	Unverified
4	XFM (base)	Accuracy (%)	79.8	—	Unverified
5	X2-VLM (base)	Accuracy (%)	78.4	—	Unverified
6	X-VLM (base)	Accuracy (%)	76.91	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Florence-2-large-ft	Accuracy (%)	93.4	—	Unverified
2	mPLUG-2	Accuracy (%)	90.33	—	Unverified
3	X2-VLM (large)	Accuracy (%)	87.6	—	Unverified
4	XFM (base)	Accuracy (%)	86.1	—	Unverified
5	X2-VLM (base)	Accuracy (%)	85.2	—	Unverified
6	X-VLM (base)	Accuracy (%)	84.51	—	Unverified
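
The tables above report Accuracy (%) and IoU without defining them. As a rough illustration, the sketch below computes intersection over union (IoU) between two boxes and a grounding accuracy based on it. It assumes boxes are given as (x1, y1, x2, y2) corner coordinates and that a prediction counts as correct when its IoU with the ground-truth box is at least 0.5, a convention common in referring-expression benchmarks but not stated on this page. The function names `iou` and `grounding_accuracy` are hypothetical and do not come from any listed model's code.

```python
from typing import Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (if any).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def grounding_accuracy(
    preds: Sequence[Box],
    targets: Sequence[Box],
    threshold: float = 0.5,  # assumed threshold, not given in the source
) -> float:
    """Percentage of predictions whose IoU with the target meets the threshold."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not preds:
        return 0.0
    hits = sum(iou(p, t) >= threshold for p, t in zip(preds, targets))
    return 100.0 * hits / len(preds)


if __name__ == "__main__":
    predictions = [(0, 0, 2, 2), (0, 0, 2, 2)]
    ground_truth = [(0, 0, 2, 2), (1, 1, 3, 3)]
    print(grounding_accuracy(predictions, ground_truth))
```

In this toy run the first prediction matches its target exactly and the second overlaps its target with an IoU of 1/7, so one of two predictions passes the assumed 0.5 threshold.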