SOTAVerified

Phrase Grounding

Given an image and a corresponding caption, the Phrase Grounding task aims to ground each entity mentioned by a noun phrase in the caption to a region in the image.

Source: Phrase Grounding by Soft-Label Chain Conditional Random Field

Papers

Showing 110 of 88 papers

TitleStatusHype
Anatomy-Grounded Weakly Supervised Prompt Tuning for Chest X-ray Latent Diffusion Models0
Disambiguating Reference in Visually Grounded Dialogues through Joint Modeling of Textual and Multimodal Semantic StructuresCode0
A Comparison of Object Detection and Phrase Grounding Models in Chest X-ray Abnormality Localization using Eye-tracking Data0
Progressive Local Alignment for Medical Multimodal Pre-training0
Anatomical grounding pre-training for medical phrase groundingCode0
VICCA: Visual Interpretation and Comprehension of Chest X-ray Anomalies in Generated Report Without Human FeedbackCode0
Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding0
Hierarchical Alignment-enhanced Adaptive Grounding Network for Generalized Referring Expression Comprehension0
Towards Visual Grounding: A SurveyCode3
ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation0
Show:102550
← PrevPage 1 of 9Next →

Benchmark Results

#ModelMetricClaimedVerifiedStatus
1Fiber-BR@187.1Unverified
2PEVLR@184.1Unverified
3VisualBERTR@170.4Unverified