SOTAVerified

Visual Grounding

Visual Grounding (VG) aims to locate the most relevant object or region in an image, based on a natural language query. The query can be a phrase, a sentence, or even a multi-round dialogue. There are three main challenges in VG:

  • What is the main focus in a query?
  • How to understand an image?
  • How to locate an object?

Papers

Showing 221-230 of 571 papers

| Title | Status | Hype |
|---|---|---|
| Interpretable Visual Question Answering by Visual Grounding from Attention Supervision Mining | | 0 |
| Interactive Visual Grounding of Referring Expressions for Human-Robot Interaction | | 0 |
| Differentiable Parsing and Visual Grounding of Natural Language Instructions for Object Placement | | 0 |
| Interactive Reinforcement Learning for Object Grounding via Self-Talking | | 0 |
| Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention | | 0 |
| Differentiable Disentanglement Filter: an Application Agnostic Core Concept Discovery Probe | | 0 |
| Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment | | 0 |
| LQMFormer: Language-aware Query Mask Transformer for Referring Image Segmentation | | 0 |
| Differentiable Disentanglement Filter: an Application Agnostic Core Concept Discovery Probe | | 0 |
| Benchmarking Diverse-Modal Entity Linking with Generative Models | | 0 |

Benchmark Results

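| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Florence-2-large-ft | Accuracy (%) | 95.3 | | Unverified |
| 2 | mPLUG-2 | Accuracy (%) | 92.8 | | Unverified |
| 3 | X2-VLM (large) | Accuracy (%) | 92.1 | | Unverified |
| 4 | XFM (base) | Accuracy (%) | 90.4 | | Unverified |
| 5 | X2-VLM (base) | Accuracy (%) | 90.3 | | Unverified |
| 6 | X-VLM (base) | Accuracy (%) | 89 | | Unverified |
| 7 | HYDRA | IoU | 61.7 | | Unverified |
| 8 | HYDRA | IoU | 61.1 | | Unverified |
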
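| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Florence-2-large-ft | Accuracy (%) | 92 | | Unverified |
| 2 | mPLUG-2 | Accuracy (%) | 86.05 | | Unverified |
| 3 | X2-VLM (large) | Accuracy (%) | 81.8 | | Unverified |
| 4 | XFM (base) | Accuracy (%) | 79.8 | | Unverified |
| 5 | X2-VLM (base) | Accuracy (%) | 78.4 | | Unverified |
| 6 | X-VLM (base) | Accuracy (%) | 76.91 | | Unverified |
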
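| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Florence-2-large-ft | Accuracy (%) | 93.4 | | Unverified |
| 2 | mPLUG-2 | Accuracy (%) | 90.33 | | Unverified |
| 3 | X2-VLM (large) | Accuracy (%) | 87.6 | | Unverified |
| 4 | XFM (base) | Accuracy (%) | 86.1 | | Unverified |
| 5 | X2-VLM (base) | Accuracy (%) | 85.2 | | Unverified |
| 6 | X-VLM (base) | Accuracy (%) | 84.51 | | Unverified |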
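
The benchmark results report two metrics, Accuracy (%) and IoU, without saying how they are computed. The following is a minimal sketch of one common convention for box-level grounding, not necessarily the protocol behind these numbers: a prediction counts as correct when its intersection-over-union with the ground-truth box reaches a threshold. The 0.5 threshold, the `(x_min, y_min, x_max, y_max)` box layout, and the names `box_iou` and `grounding_accuracy` are assumptions made for illustration.

```python
from typing import Sequence, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Width and height of the overlap, clamped to zero when boxes are disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter

    # Degenerate (zero-area) inputs yield an IoU of 0 rather than a division error.
    return inter / union if union > 0 else 0.0


def grounding_accuracy(
    preds: Sequence[Box],
    targets: Sequence[Box],
    threshold: float = 0.5,  # assumed threshold; benchmarks may differ
) -> float:
    """Percentage of predictions whose IoU with the target is >= threshold."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not preds:
        return 0.0
    hits = sum(box_iou(p, t) >= threshold for p, t in zip(preds, targets))
    return 100.0 * hits / len(preds)
```

For example, `grounding_accuracy([(0, 0, 2, 2), (0, 0, 2, 2)], [(0, 0, 2, 2), (5, 5, 6, 6)])` scores one exact match and one miss. Segmentation-style grounding would compute the same ratio over masks instead of boxes.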