Visual Grounding

Visual Grounding (VG) aims to locate the object or region in an image that is most relevant to a natural language query. The query can be a phrase, a sentence, or even a multi-round dialogue. VG poses three main challenges:

What is the main focus of the query?
How should the image be understood?
How should the target object be located?
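As a rough illustration of the localization step, the sketch below assumes a two-stage setup. Some detector has already proposed candidate boxes, and some encoder has mapped each region and the query into a shared embedding space. The grounded region is then the candidate most similar to the query. The function names `cosine` and `ground`, the toy vectors, and the choice of cosine similarity are all hypothetical. They are not taken from any paper listed on this page.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ground(query_emb: Sequence[float],
           region_embs: List[Sequence[float]],
           boxes: List[Box]) -> Tuple[Box, float]:
    """Return the candidate box whose region embedding best matches the query.

    query_emb and region_embs stand in for the outputs of hypothetical text and
    region encoders that share one embedding space. boxes are the candidate
    proposals, aligned index-by-index with region_embs.
    """
    if not boxes or len(region_embs) != len(boxes):
        raise ValueError("expected a non-empty list of boxes with one embedding per box")
    scores = [cosine(query_emb, r) for r in region_embs]
    best = max(range(len(scores)), key=scores.__getitem__)  # index of the highest score
    return boxes[best], scores[best]


# Toy example with made-up 3-dimensional embeddings.
query = [1.0, 0.0, 0.5]
regions = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.4], [0.2, 0.2, 0.9]]
boxes = [(0, 0, 50, 50), (60, 10, 120, 90), (5, 100, 40, 160)]
box, score = ground(query, regions, boxes)
print(box)  # (60, 10, 120, 90)
```

Ranking pre-computed proposals keeps the example small. Many grounding models instead regress the box directly from fused image and text features, and this sketch does not cover that case.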

Papers


Showing 311–320 of 571 papers

Title	Date	Tasks	Status
Joint Top-Down and Bottom-Up Frameworks for 3D Visual Grounding	Oct 21, 2024	3D visual grounding, Object	Unverified
Knowledge Supports Visual Language Grounding: A Case Study on Colour Terms	Jul 1, 2020	Diagnostic, Object	Unverified
Language-Guided 3D Object Detection in Point Cloud for Autonomous Driving	May 25, 2023	3D Object Detection, Autonomous Driving	Unverified
Language learning using Speech to Image retrieval	Sep 9, 2019	Grounded language learning, Image Retrieval	Unverified
LanguageRefer: Spatial-Language Model for 3D Visual Grounding	Jul 7, 2021	3D visual grounding, Language Modeling	Unverified
LCV2: An Efficient Pretraining-Free Framework for Grounded Visual Question Answering	Jan 29, 2024	Language Modeling, Language Modelling	Unverified
Learning from Synthetic Data for Visual Grounding	Mar 20, 2024	Language Modelling, Large Language Model	Unverified
Visually Consistent Hierarchical Image Classification	Jun 17, 2024	Classification, image-classification	Unverified
Learning Language Structures through Grounding	Jun 14, 2024	Automatic Speech Recognition, Dependency Parsing	Unverified
Learning to Compose and Reason with Language Tree Structures for Visual Grounding	Jun 5, 2019	Visual Grounding, Visual Reasoning	Unverified


Benchmark Results

Datasets: RefCOCO testA, RefCOCO+ testB, RefCOCO val

RefCOCO testA

#	Model	Metric	Claimed	Verified	Status
1	Florence-2-large-ft	Accuracy (%)	95.3	—	Unverified
2	mPLUG-2	Accuracy (%)	92.8	—	Unverified
3	X2-VLM (large)	Accuracy (%)	92.1	—	Unverified
4	XFM (base)	Accuracy (%)	90.4	—	Unverified
5	X2-VLM (base)	Accuracy (%)	90.3	—	Unverified
6	X-VLM (base)	Accuracy (%)	89	—	Unverified
7	HYDRA	IoU	61.7	—	Unverified
8	HYDRA	IoU	61.1	—	Unverified

RefCOCO+ testB

#	Model	Metric	Claimed	Verified	Status
1	Florence-2-large-ft	Accuracy (%)	92	—	Unverified
2	mPLUG-2	Accuracy (%)	86.05	—	Unverified
3	X2-VLM (large)	Accuracy (%)	81.8	—	Unverified
4	XFM (base)	Accuracy (%)	79.8	—	Unverified
5	X2-VLM (base)	Accuracy (%)	78.4	—	Unverified
6	X-VLM (base)	Accuracy (%)	76.91	—	Unverified

RefCOCO val

#	Model	Metric	Claimed	Verified	Status
1	Florence-2-large-ft	Accuracy (%)	93.4	—	Unverified
2	mPLUG-2	Accuracy (%)	90.33	—	Unverified
3	X2-VLM (large)	Accuracy (%)	87.6	—	Unverified
4	XFM (base)	Accuracy (%)	86.1	—	Unverified
5	X2-VLM (base)	Accuracy (%)	85.2	—	Unverified
6	X-VLM (base)	Accuracy (%)	84.51	—	Unverified
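The benchmark tables report either "Accuracy (%)" or "IoU". A common convention for RefCOCO-style grounding counts a prediction as correct when its box overlaps the ground-truth box with an intersection over union (IoU) of at least 0.5. This page does not state the threshold, so treat that as an assumption; some evaluation code uses a strict > 0.5 instead. The minimal sketch below computes IoU for (x1, y1, x2, y2) boxes, the thresholded accuracy, and a mean IoU as one plausible aggregate. The function names are hypothetical, and this page does not say how the HYDRA IoU figures were aggregated.

```python
from typing import List, Tuple

# Axis-aligned box as (x1, y1, x2, y2) with x2 > x1 and y2 > y1 (assumed format).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # 0 if the boxes do not overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def _check(preds: List[Box], gts: List[Box]) -> None:
    """Require one ground-truth box per prediction."""
    if not preds or len(preds) != len(gts):
        raise ValueError("expected one ground-truth box per prediction")


def grounding_accuracy(preds: List[Box], gts: List[Box], threshold: float = 0.5) -> float:
    """Percentage of predictions whose IoU with the ground truth is >= threshold (assumed 0.5)."""
    _check(preds, gts)
    hits = sum(iou(p, g) >= threshold for p, g in zip(preds, gts))
    return 100.0 * hits / len(preds)


def mean_iou(preds: List[Box], gts: List[Box]) -> float:
    """Average IoU over all queries (one plausible way to aggregate an IoU metric)."""
    _check(preds, gts)
    return sum(iou(p, g) for p, g in zip(preds, gts)) / len(preds)


# Toy example: one exact match, one partial overlap (IoU 1/3), one miss.
preds = [(0, 0, 10, 10), (0, 0, 10, 10), (20, 20, 30, 30)]
gts = [(0, 0, 10, 10), (5, 0, 15, 10), (0, 0, 10, 10)]
print(round(grounding_accuracy(preds, gts), 2))  # 33.33
print(round(mean_iou(preds, gts), 3))  # 0.444
```

The two aggregates can diverge sharply: a prediction with IoU 0.49 contributes nothing to thresholded accuracy but almost half a point to mean IoU. This is one reason the "Accuracy (%)" and "IoU" rows above are not directly comparable.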