SOTAVerified

Visual Grounding

Visual Grounding (VG) aims to locate the most relevant object or region in an image, based on a natural language query. The query can be a phrase, a sentence, or even a multi-round dialogue. There are three main challenges in VG:

  • What is the main focus in a query?
  • How to understand an image?
  • How to locate an object?

Papers

Showing 281290 of 571 papers

TitleStatusHype
Improved Visual Grounding through Self-Consistent Explanations0
GPT-4 Enhanced Multimodal Grounding for Autonomous Driving: Leveraging Cross-Modal Attention with Large Language ModelsCode1
Mismatch Quest: Visual and Textual Feedback for Image-Text MisalignmentCode0
Uni3DL: Unified Model for 3D and Language Understanding0
Expand BERT Representation with Visual Information via Grounded Language Learning with Multimodal Partial Alignment0
Aligning and Prompting Everything All at Once for Universal Visual PerceptionCode2
Behind the Magic, MERLIM: Multi-modal Evaluation Benchmark for Large Image-Language ModelsCode0
G2D: From Global to Dense Radiography Representation Learning via Vision-Language Pre-trainingCode0
Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and CaptionsCode1
Context-Aware Indoor Point Cloud Object Generation through User Instructions0
Show:102550
← PrevPage 29 of 58Next →

Benchmark Results

#ModelMetricClaimedVerifiedStatus
1Florence-2-large-ftAccuracy (%)95.3Unverified
2mPLUG-2Accuracy (%)92.8Unverified
3X2-VLM (large)Accuracy (%)92.1Unverified
4XFM (base)Accuracy (%)90.4Unverified
5X2-VLM (base)Accuracy (%)90.3Unverified
6X-VLM (base)Accuracy (%)89Unverified
7HYDRAIoU61.7Unverified
8HYDRAIoU61.1Unverified
#ModelMetricClaimedVerifiedStatus
1Florence-2-large-ftAccuracy (%)92Unverified
2mPLUG-2Accuracy (%)86.05Unverified
3X2-VLM (large)Accuracy (%)81.8Unverified
4XFM (base)Accuracy (%)79.8Unverified
5X2-VLM (base)Accuracy (%)78.4Unverified
6X-VLM (base)Accuracy (%)76.91Unverified
#ModelMetricClaimedVerifiedStatus
1Florence-2-large-ftAccuracy (%)93.4Unverified
2mPLUG-2Accuracy (%)90.33Unverified
3X2-VLM (large)Accuracy (%)87.6Unverified
4XFM (base)Accuracy (%)86.1Unverified
5X2-VLM (base)Accuracy (%)85.2Unverified
6X-VLM (base)Accuracy (%)84.51Unverified