Visual Grounding

Visual Grounding (VG) aims to locate the most relevant object or region in an image, based on a natural language query. The query can be a phrase, a sentence, or even a multi-round dialogue. There are three main challenges in VG:

What is the main focus of the query?
How should the image be understood?
How should the target object be located?

Papers



Title	Date	Tasks	Status
Uni3DL: Unified Model for 3D and Language Understanding	Dec 5, 2023	Cross-Modal Retrieval, Instance Segmentation	Unverified
Expand BERT Representation with Visual Information via Grounded Language Learning with Multimodal Partial Alignment	Dec 4, 2023	Grounded language learning, Language Modeling	Unverified
G2D: From Global to Dense Radiography Representation Learning via Vision-Language Pre-training	Dec 3, 2023	object-detection, Object Detection	Code Available
Behind the Magic, MERLIM: Multi-modal Evaluation Benchmark for Large Image-Language Models	Dec 3, 2023	Hallucination, Visual Grounding	Code Available
Context-Aware Indoor Point Cloud Object Generation through User Instructions	Nov 26, 2023	Position, Visual Grounding	Unverified
Enhancing Visual Grounding and Generalization: A Multi-Task Cycle Training Approach for Vision-Language Models	Nov 21, 2023	Image Segmentation, Language Modelling	Code Available
A Systematic Evaluation of GPT-4V's Multimodal Capability for Medical Image Analysis	Oct 31, 2023	Descriptive, Medical Image Analysis	Unverified
GROOViST: A Metric for Grounding Objects in Visual Storytelling	Oct 26, 2023	Visual Grounding, Visual Storytelling	Code Available
Context Does Matter: End-to-end Panoptic Narrative Grounding with Deformable Attention Refined Matching Network	Oct 25, 2023	Visual Grounding	Code Available
InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions	Oct 18, 2023	Benchmarking, Visual Grounding	Code Available



Datasets: RefCOCO testA, RefCOCO+ testB, RefCOCO val

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Florence-2-large-ft	Accuracy (%)	95.3	—	Unverified
2	mPLUG-2	Accuracy (%)	92.8	—	Unverified
3	X2-VLM (large)	Accuracy (%)	92.1	—	Unverified
4	XFM (base)	Accuracy (%)	90.4	—	Unverified
5	X2-VLM (base)	Accuracy (%)	90.3	—	Unverified
6	X-VLM (base)	Accuracy (%)	89	—	Unverified
7	HYDRA	IoU	61.7	—	Unverified
8	HYDRA	IoU	61.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Florence-2-large-ft	Accuracy (%)	92	—	Unverified
2	mPLUG-2	Accuracy (%)	86.05	—	Unverified
3	X2-VLM (large)	Accuracy (%)	81.8	—	Unverified
4	XFM (base)	Accuracy (%)	79.8	—	Unverified
5	X2-VLM (base)	Accuracy (%)	78.4	—	Unverified
6	X-VLM (base)	Accuracy (%)	76.91	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Florence-2-large-ft	Accuracy (%)	93.4	—	Unverified
2	mPLUG-2	Accuracy (%)	90.33	—	Unverified
3	X2-VLM (large)	Accuracy (%)	87.6	—	Unverified
4	XFM (base)	Accuracy (%)	86.1	—	Unverified
5	X2-VLM (base)	Accuracy (%)	85.2	—	Unverified
6	X-VLM (base)	Accuracy (%)	84.51	—	Unverified
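
The tables above report Accuracy (%) and IoU without defining either metric. The following is a minimal sketch of how such scores are commonly computed for box-level grounding, assuming boxes are given as (x1, y1, x2, y2) corner coordinates and that a prediction counts as correct when its IoU with the ground-truth box is at least 0.5. That threshold is a widespread convention on referring-expression benchmarks, not something this page states, and the function names `box_iou` and `grounding_accuracy` are hypothetical.

```python
from typing import Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    # Two degenerate (zero-area) boxes have no meaningful overlap.
    return inter / union if union > 0 else 0.0


def grounding_accuracy(
    preds: Sequence[Box],
    targets: Sequence[Box],
    iou_threshold: float = 0.5,  # assumed convention, not stated in the source
) -> float:
    """Percentage of predictions whose IoU with the target meets the threshold."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not preds:
        return 0.0
    hits = sum(box_iou(p, t) >= iou_threshold for p, t in zip(preds, targets))
    return 100.0 * hits / len(preds)


if __name__ == "__main__":
    predicted = [(0, 0, 2, 2), (0, 0, 2, 2)]
    ground_truth = [(0, 0, 2, 2), (1, 1, 3, 3)]
    print(grounding_accuracy(predicted, ground_truth))
```

In this toy case the first prediction matches its target exactly and the second overlaps its target only slightly, so one of the two queries is scored as correct. Leaderboard entries that report IoU directly, rather than thresholded accuracy, would correspond to aggregating `box_iou` values instead of counting hits.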