Visual Grounding

Visual Grounding (VG) aims to locate the most relevant object or region in an image, based on a natural language query. The query can be a phrase, a sentence, or even a multi-round dialogue. There are three main challenges in VG:

What is the main focus in a query?
How to understand an image?
How to locate an object?

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 211–220 of 571 papers

Title	Date	Tasks	Status	Hype
HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model	Jun 1, 2024	Action RecognitionActivity Recognition	—Unverified	0
Instruction-Guided Visual Masking	May 30, 2024	Instruction FollowingVisual Grounding	CodeCode Available	1
Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention	May 28, 2024	3D Object Detection3D visual grounding	—Unverified	0
LLM-Optic: Unveiling the Capabilities of Large Language Models for Universal Visual Grounding	May 27, 2024	Visual Grounding	—Unverified	0
Talk to Parallel LiDARs: A Human-LiDAR Interaction Method Based on 3D Visual Grounding	May 24, 2024	3D visual groundingAutonomous Driving	—Unverified	0
Talk2Radar: Bridging Natural Language with 4D mmWave Radar for 3D Referring Expression Comprehension	May 21, 2024	3D visual groundingReferring Expression	CodeCode Available	1
Adversarial Robustness for Visual Grounding of Multimodal Large Language Models	May 16, 2024	Adversarial AttackAdversarial Robustness	CodeCode Available	0
DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding	May 10, 2024	RelationSpatial Reasoning	CodeCode Available	1
Visual grounding for desktop graphical user interfaces	May 5, 2024	Language ModelingLanguage Modelling	—Unverified	0
Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners	Apr 30, 2024	3D visual groundingVisual Grounding	—Unverified	0

Show:10 25 50

← PrevPage 22 of 58Next →

All datasets RefCOCO testA RefCOCO+ test B RefCoCo val

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Florence-2-large-ft	Accuracy (%)	95.3	—	Unverified
2	mPLUG-2	Accuracy (%)	92.8	—	Unverified
3	X2-VLM (large)	Accuracy (%)	92.1	—	Unverified
4	XFM (base)	Accuracy (%)	90.4	—	Unverified
5	X2-VLM (base)	Accuracy (%)	90.3	—	Unverified
6	X-VLM (base)	Accuracy (%)	89	—	Unverified
7	HYDRA	IoU	61.7	—	Unverified
8	HYDRA	IoU	61.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Florence-2-large-ft	Accuracy (%)	92	—	Unverified
2	mPLUG-2	Accuracy (%)	86.05	—	Unverified
3	X2-VLM (large)	Accuracy (%)	81.8	—	Unverified
4	XFM (base)	Accuracy (%)	79.8	—	Unverified
5	X2-VLM (base)	Accuracy (%)	78.4	—	Unverified
6	X-VLM (base)	Accuracy (%)	76.91	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Florence-2-large-ft	Accuracy (%)	93.4	—	Unverified
2	mPLUG-2	Accuracy (%)	90.33	—	Unverified
3	X2-VLM (large)	Accuracy (%)	87.6	—	Unverified
4	XFM (base)	Accuracy (%)	86.1	—	Unverified
5	X2-VLM (base)	Accuracy (%)	85.2	—	Unverified
6	X-VLM (base)	Accuracy (%)	84.51	—	Unverified