Phrase Grounding

Given an image and a corresponding caption, the Phrase Grounding task aims to ground each entity mentioned by a noun phrase in the caption to a region in the image.

Source: Phrase Grounding by Soft-Label Chain Conditional Random Field
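
To make the task's input and output concrete, here is a minimal Python sketch of one grounded caption. The `Box` and `GroundedPhrase` types, the `attach_boxes` helper, and the example caption and coordinates are all hypothetical illustrations, not part of any dataset or model listed on this page. The boxes stand in for what a grounding model would predict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned image region in pixel coordinates (hypothetical type)."""
    x1: float
    y1: float
    x2: float
    y2: float


@dataclass(frozen=True)
class GroundedPhrase:
    """A noun phrase from the caption paired with its image region."""
    phrase: str
    start: int  # character offset of the phrase in the caption
    end: int    # exclusive end offset
    box: Box


def attach_boxes(caption, phrase_boxes):
    """Pair each noun phrase with its box and locate it in the caption.

    `phrase_boxes` maps phrase text to a Box. Only the first occurrence of
    each phrase is located, which is a simplification for this sketch.
    """
    grounded = []
    for phrase, box in phrase_boxes.items():
        start = caption.find(phrase)
        if start == -1:
            raise ValueError(f"phrase not found in caption: {phrase!r}")
        grounded.append(GroundedPhrase(phrase, start, start + len(phrase), box))
    return grounded


# Made-up example: the boxes play the role of model predictions.
caption = "A man in a red hat throws a frisbee to a dog"
phrase_boxes = {
    "A man": Box(10, 20, 110, 220),
    "a red hat": Box(40, 20, 80, 50),
    "a frisbee": Box(150, 60, 190, 90),
    "a dog": Box(220, 140, 320, 230),
}
grounded = attach_boxes(caption, phrase_boxes)
```

Each entry in `grounded` ties a span of the caption to one region, which is the output the task definition above asks for.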

Papers

Showing 1–10 of 88 papers

Title	Date	Tasks	Status	Hype
Anatomy-Grounded Weakly Supervised Prompt Tuning for Chest X-ray Latent Diffusion Models	Jun 12, 2025	Anatomy, Image Generation	Unverified	0
Disambiguating Reference in Visually Grounded Dialogues through Joint Modeling of Textual and Multimodal Semantic Structures	May 16, 2025	Coreference Resolution	Code Available	0
A Comparison of Object Detection and Phrase Grounding Models in Chest X-ray Abnormality Localization using Eye-tracking Data	Mar 2, 2025	Object Detection	Unverified	0
Progressive Local Alignment for Medical Multimodal Pre-training	Feb 25, 2025	Contrastive Learning, Image-text Retrieval	Unverified	0
Anatomical grounding pre-training for medical phrase grounding	Feb 23, 2025	Phrase Grounding, Zero-Shot Learning	Code Available	0
VICCA: Visual Interpretation and Comprehension of Chest X-ray Anomalies in Generated Report Without Human Feedback	Jan 29, 2025	Phrase Grounding	Code Available	0
Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding	Jan 28, 2025	Object Detection	Unverified	0
Hierarchical Alignment-enhanced Adaptive Grounding Network for Generalized Referring Expression Comprehension	Jan 2, 2025	Generalized Referring Expression Comprehension, Generalized Referring Expression Segmentation	Unverified	0
Towards Visual Grounding: A Survey	Dec 28, 2024	Phrase Grounding, Referring Expression	Code Available	3
ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation	Dec 12, 2024	Phrase Grounding, Question Answering	Unverified	0

Datasets: Flickr30k Entities Test, Flickr30k, Flickr30k Entities Dev, ReferIt, Visual Genome

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GBS Ensemble + 12-in-1	Pointing Game Accuracy	85.9	—	Unverified
2	GbS Ensemble MS-COCO	Pointing Game Accuracy	75.6	—	Unverified
3	COCO_ELMo_PNASNet	Pointing Game Accuracy	69.19	—	Unverified
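
The metric in the table above, Pointing Game Accuracy, is commonly computed as follows: for each phrase the model produces a heatmap over the image, and the prediction counts as a hit if the heatmap's maximum falls inside the ground-truth box. The sketch below assumes one ground-truth box per phrase, inclusive pixel bounds, and no tolerance radius. The exact protocol behind the scores listed here is not stated on this page, so the function names and example values are illustrative only.

```python
def argmax_2d(heatmap):
    """Return (x, y) of the largest value in a 2D list of rows."""
    best_value, best_x, best_y = None, 0, 0
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if best_value is None or value > best_value:
                best_value, best_x, best_y = value, x, y
    return best_x, best_y


def pointing_game_hit(heatmap, box):
    """True if the heatmap peak lies inside box = (x1, y1, x2, y2), inclusive."""
    x, y = argmax_2d(heatmap)
    x1, y1, x2, y2 = box
    return x1 <= x <= x2 and y1 <= y <= y2


def pointing_game_accuracy(heatmaps, boxes):
    """Percentage of phrases whose heatmap peak hits the ground-truth box."""
    if len(heatmaps) != len(boxes) or not heatmaps:
        raise ValueError("need one box per heatmap and at least one pair")
    hits = sum(pointing_game_hit(h, b) for h, b in zip(heatmaps, boxes))
    return 100.0 * hits / len(heatmaps)


# Made-up 3x3 heatmaps: the first peaks at (x=2, y=0), the second at (x=0, y=2).
example_heatmaps = [
    [[0.1, 0.2, 0.9], [0.0, 0.3, 0.4], [0.1, 0.1, 0.2]],
    [[0.1, 0.2, 0.1], [0.0, 0.3, 0.4], [0.8, 0.1, 0.2]],
]
example_boxes = [(1, 0, 2, 1), (1, 1, 2, 2)]
example_accuracy = pointing_game_accuracy(example_heatmaps, example_boxes)
```

In this toy example the first peak lands inside its box and the second does not, so `example_accuracy` is 50.0.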