SOTAVerified

Phrase Grounding

Given an image and a corresponding caption, the Phrase Grounding task aims to ground each entity mentioned by a noun phrase in the caption to a region in the image.

Source: Phrase Grounding by Soft-Label Chain Conditional Random Field
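
As a rough illustration of the task described above, the sketch below maps each noun phrase in a caption to one box and scores the predictions against gold boxes. It rests on assumptions that are not stated on this page: boxes are `(x1, y1, x2, y2)` tuples, a prediction counts as correct when its IoU with the gold box is at least 0.5 (a common convention for R@1-style metrics), and the names `iou` and `recall_at_1` are hypothetical helpers chosen for this example.

```python
from typing import Dict, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(predicted: Dict[str, Box], gold: Dict[str, Box],
                threshold: float = 0.5) -> float:
    """Fraction of gold phrases whose top predicted box has IoU >= threshold.

    The 0.5 default is an assumed convention, not taken from this page.
    """
    if not gold:
        return 0.0
    hits = sum(
        1
        for phrase, gold_box in gold.items()
        if phrase in predicted and iou(predicted[phrase], gold_box) >= threshold
    )
    return hits / len(gold)


# Toy caption: "a man throws a red frisbee" (made-up boxes for illustration).
gold = {"a man": (10, 20, 110, 220), "a red frisbee": (150, 40, 190, 80)}
predicted = {"a man": (12, 25, 108, 210), "a red frisbee": (300, 300, 340, 340)}

score = recall_at_1(predicted, gold)
print(f"R@1 on the toy example: {score:.2f}")
```

In this toy example the box for "a man" overlaps the gold box closely while the box for "a red frisbee" misses entirely, so half of the phrases are grounded correctly.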

Papers

Showing 51-75 of 88 papers

| Title | Status | Hype |
|---|---|---|
| CAVL: Learning Contrastive and Adaptive Representations of Vision and Language | | 0 |
| Trade-offs in Fine-tuned Diffusion Models Between Accuracy and Interpretability | Code | 0 |
| LIMITR: Leveraging Local Information for Medical Image-Text Representation | | 0 |
| Investigating the Role of Attribute Context in Vision-Language Models for Object Recognition and Detection | | 0 |
| Medical Phrase Grounding with Region-Phrase Context Contrastive Alignment | | 0 |
| Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing | Code | 0 |
| Similarity Maps for Self-Training Weakly-Supervised Phrase Grounding | Code | 0 |
| Extending Phrase Grounding with Pronouns in Visual Dialogues | Code | 0 |
| Detailed Annotations of Chest X-Rays via CT Projection for Report Understanding | | 0 |
| Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing | Code | 0 |
| Lite-MDETR: A Lightweight Multi-Modal Detector | | 0 |
| Improving Pre-trained Vision-and-Language Embeddings for Phrase Grounding | | 0 |
| Unsupervised Vision-Language Grammar Induction with Shared Structure Modeling | | 0 |
| Grounding Plural Phrases: Countering Evaluation Biases by Individuation | | 0 |
| Detector-Free Weakly Supervised Grounding by Separation | Code | 0 |
| Disentangled Motif-aware Graph Learning for Phrase Grounding | | 0 |
| Utilizing Every Image Object for Semi-supervised Phrase Grounding | | 0 |
| Learning to ground medical text in a 3D human atlas | Code | 0 |
| Propagating Over Phrase Relations for One-Stage Visual Grounding | | 0 |
| Neural Parameter Allocation Search | Code | 0 |
| Phrase Grounding by Soft-Label Chain Conditional Random Field | Code | 0 |
| Zero-Shot Grounding of Objects from Natural Language Queries | Code | 0 |
| Language Features Matter: Effective Language Representations for Vision-Language Tasks | | 0 |
| Modularized Textual Grounding for Counterfactual Resilience | Code | 0 |
| Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment | | 0 |

Benchmark Results

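| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GLIPv2 | R@1 | 87.7 | | Unverified |
| 2 | FIBER-B | R@1 | 87.4 | | Unverified |
| 3 | GLIP | R@1 | 87.1 | | Unverified |
| 4 | PEVL | R@1 | 84.4 | | Unverified |
| 5 | MDETR-ENB5 | R@1 | 84.3 | | Unverified |
| 6 | DIGN | R@1 | 78.73 | | Unverified |
| 7 | LCMCG | R@1 | 76.74 | | Unverified |
| 8 | Soft-Label Chain CRF (SL-CCRF) | R@1 | 74.69 | | Unverified |
| 9 | DDPN (ResNet-101) | R@1 | 73.3 | | Unverified |
| 10 | VisualBERT | R@1 | 71.33 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GBS Ensemble + 12-in-1 | Pointing Game Accuracy | 85.9 | | Unverified |
| 2 | GbS Ensemble MS-COCO | Pointing Game Accuracy | 75.6 | | Unverified |
| 3 | COCO_ELMo_PNASNet | Pointing Game Accuracy | 69.19 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Fiber-B | R@1 | 87.1 | | Unverified |
| 2 | PEVL | R@1 | 84.1 | | Unverified |
| 3 | VisualBERT | R@1 | 70.4 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | VG_BiLSTM_VGG | Pointing Game Accuracy | 62.76 | | Unverified |
| 2 | GbS Ensemble MS-COCO | Pointing Game Accuracy | 58.21 | | Unverified |
| 3 | MCB | Accuracy | 28.91 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GbS VG | Pointing Game Accuracy | 55.91 | | Unverified |
| 2 | VG_ELMo_PNASNet | Pointing Game Accuracy | 55.16 | | Unverified |
| 3 | GbS Ensemble MS-COCO | Pointing Game Accuracy | 54.55 | | Unverified |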