Visual Grounding

Visual Grounding (VG) aims to locate the most relevant object or region in an image, based on a natural language query. The query can be a phrase, a sentence, or even a multi-round dialogue. There are three main challenges in VG:

What is the main focus of the query?
How should the image be understood?
How should the object be located?

Papers

Showing 501–510 of 571 papers

Title	Date	Tasks	Status
Answer Questions with Right Image Regions: A Visual Attention Regularization Approach	Feb 3, 2021	Question Answering, Visual Grounding	Code Available
Transformers in Vision: A Survey	Jan 4, 2021	Action Recognition, Activity Recognition	Unverified
3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds	Jan 1, 2021	Object, Object Proposal Generation	Unverified
Explainable Video Entailment With Grounded Visual Evidence	Jan 1, 2021	Visual Grounding	Unverified
CASTing Your Model: Learning to Localize Improves Self-Supervised Representations	Dec 8, 2020	Self-Supervised Learning, Visual Grounding	Unverified
Class-agnostic Object Detection	Nov 28, 2020	Benchmarking, Class-agnostic Object Detection	Unverified
Learning to ground medical text in a 3D human atlas	Nov 1, 2020	Phrase Grounding, Visual Grounding	Code Available
SOrT-ing VQA Models: Contrastive Gradient Learning for Improved Consistency	Oct 20, 2020	Question Answering, Visual Grounding	Code Available
Neural Twins Talk	Sep 26, 2020	Image Captioning, Sentence	Code Available
Commands 4 Autonomous Vehicles (C4AV) Workshop Summary	Sep 18, 2020	Autonomous Vehicles, Referring Expression Comprehension	Unverified

Datasets: RefCOCO testA, RefCOCO+ testB, RefCOCO val

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Florence-2-large-ft	Accuracy (%)	95.3	—	Unverified
2	mPLUG-2	Accuracy (%)	92.8	—	Unverified
3	X2-VLM (large)	Accuracy (%)	92.1	—	Unverified
4	XFM (base)	Accuracy (%)	90.4	—	Unverified
5	X2-VLM (base)	Accuracy (%)	90.3	—	Unverified
6	X-VLM (base)	Accuracy (%)	89	—	Unverified
7	HYDRA	IoU	61.7	—	Unverified
8	HYDRA	IoU	61.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Florence-2-large-ft	Accuracy (%)	92	—	Unverified
2	mPLUG-2	Accuracy (%)	86.05	—	Unverified
3	X2-VLM (large)	Accuracy (%)	81.8	—	Unverified
4	XFM (base)	Accuracy (%)	79.8	—	Unverified
5	X2-VLM (base)	Accuracy (%)	78.4	—	Unverified
6	X-VLM (base)	Accuracy (%)	76.91	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Florence-2-large-ft	Accuracy (%)	93.4	—	Unverified
2	mPLUG-2	Accuracy (%)	90.33	—	Unverified
3	X2-VLM (large)	Accuracy (%)	87.6	—	Unverified
4	XFM (base)	Accuracy (%)	86.1	—	Unverified
5	X2-VLM (base)	Accuracy (%)	85.2	—	Unverified
6	X-VLM (base)	Accuracy (%)	84.51	—	Unverified
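
The tables above report Accuracy (%) and IoU without saying how either is computed. The sketch below shows one common convention for referring expression benchmarks, assumed here rather than taken from this page: a prediction counts as correct when the intersection over union (IoU) between the predicted box and the ground-truth box is at least 0.5. The `(x1, y1, x2, y2)` box format, the function names, and the default threshold are illustrative assumptions.

```python
from typing import Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) corner coordinates, with x2 >= x1 and y2 >= y1.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def grounding_accuracy(
    preds: Sequence[Box], targets: Sequence[Box], threshold: float = 0.5
) -> float:
    """Percentage of predictions whose IoU with the target meets the threshold.

    The 0.5 default is an assumed convention, not something stated by the leaderboard.
    """
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not preds:
        return 0.0
    hits = sum(box_iou(p, t) >= threshold for p, t in zip(preds, targets))
    return 100.0 * hits / len(preds)


if __name__ == "__main__":
    predictions = [(0, 0, 10, 10), (0, 0, 10, 10)]
    ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
    # One exact match and one miss out of two queries.
    print(grounding_accuracy(predictions, ground_truth))
```

Under this convention, accuracy is a thresholded summary of per-query IoU, so an accuracy figure and a raw IoU figure are not directly comparable.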