Visual Grounding

Visual Grounding (VG) aims to locate the most relevant object or region in an image, based on a natural language query. The query can be a phrase, a sentence, or even a multi-round dialogue. There are three main challenges in VG:

What is the main focus in a query?
How to understand an image?
How to locate an object?

Papers


Showing 301–310 of 571 papers

Title	Date	Tasks	Status	Hype
Context Does Matter: End-to-end Panoptic Narrative Grounding with Deformable Attention Refined Matching Network	Oct 25, 2023	Visual Grounding	Code Available	0
OV-VG: A Benchmark for Open-Vocabulary Visual Grounding	Oct 22, 2023	Novel Concepts, object-detection	Code Available	1
Visual Grounding Helps Learn Word Meanings in Low-Data Regimes	Oct 20, 2023	Image Captioning, Language Acquisition	Code Available	1
InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions	Oct 18, 2023	Benchmarking, Visual Grounding	Code Available	0
NICE: Improving Panoptic Narrative Detection and Segmentation with Cascading Collaborative Learning	Oct 17, 2023	Segmentation, Visual Grounding	Code Available	0
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V	Oct 17, 2023	Interactive Segmentation, Referring Expression	Code Available	4
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning	Oct 14, 2023	Image Classification, Image Description	Code Available	7
From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models	Oct 13, 2023	Hallucination, Image Captioning	Code Available	2
CoT3DRef: Chain-of-Thoughts Data-Efficient 3D Visual Grounding	Oct 10, 2023	3D visual grounding, Visual Grounding	Code Available	1
Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models	Oct 9, 2023	Language Modelling, Question Answering	Code Available	1


Datasets: RefCOCO testA, RefCOCO+ testB, RefCOCO val

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Florence-2-large-ft	Accuracy (%)	95.3	—	Unverified
2	mPLUG-2	Accuracy (%)	92.8	—	Unverified
3	X2-VLM (large)	Accuracy (%)	92.1	—	Unverified
4	XFM (base)	Accuracy (%)	90.4	—	Unverified
5	X2-VLM (base)	Accuracy (%)	90.3	—	Unverified
6	X-VLM (base)	Accuracy (%)	89	—	Unverified
7	HYDRA	IoU	61.7	—	Unverified
8	HYDRA	IoU	61.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Florence-2-large-ft	Accuracy (%)	92	—	Unverified
2	mPLUG-2	Accuracy (%)	86.05	—	Unverified
3	X2-VLM (large)	Accuracy (%)	81.8	—	Unverified
4	XFM (base)	Accuracy (%)	79.8	—	Unverified
5	X2-VLM (base)	Accuracy (%)	78.4	—	Unverified
6	X-VLM (base)	Accuracy (%)	76.91	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Florence-2-large-ft	Accuracy (%)	93.4	—	Unverified
2	mPLUG-2	Accuracy (%)	90.33	—	Unverified
3	X2-VLM (large)	Accuracy (%)	87.6	—	Unverified
4	XFM (base)	Accuracy (%)	86.1	—	Unverified
5	X2-VLM (base)	Accuracy (%)	85.2	—	Unverified
6	X-VLM (base)	Accuracy (%)	84.51	—	Unverified
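
The tables above report Accuracy (%) and IoU without saying how either is computed. As a minimal sketch, assuming the convention common on RefCOCO-style benchmarks that a predicted box counts as correct when its intersection-over-union (IoU) with the ground-truth box is at least 0.5, the two metrics could be computed as below. The function names, the `(x1, y1, x2, y2)` box format, and the 0.5 threshold are assumptions made for illustration, not details taken from this page or from any listed model.

```python
from typing import Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def grounding_accuracy(
    preds: Sequence[Box],
    targets: Sequence[Box],
    threshold: float = 0.5,  # assumed threshold; the page does not state one
) -> float:
    """Percentage of predictions whose IoU with the target meets the threshold."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not preds:
        return 0.0
    hits = sum(box_iou(p, t) >= threshold for p, t in zip(preds, targets))
    return 100.0 * hits / len(preds)


if __name__ == "__main__":
    predictions = [(0, 0, 2, 2), (0, 0, 2, 2)]
    ground_truth = [(0, 0, 2, 2), (1, 1, 3, 3)]
    print(grounding_accuracy(predictions, ground_truth))
```

In this toy example the first prediction matches its target exactly and the second overlaps its target with an IoU of 1/7, so one of the two predictions passes the assumed threshold.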