Phrase Grounding
Given an image and a corresponding caption, the Phrase Grounding task aims to ground each entity mentioned by a noun phrase in the caption to a region in the image.
Source: Phrase Grounding by Soft-Label Chain Conditional Random Field
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GLIPv2 | R@1 | 87.7 | — | Unverified |
| 2 | FIBER-B | R@1 | 87.4 | — | Unverified |
| 3 | GLIP | R@1 | 87.1 | — | Unverified |
| 4 | PEVL | R@1 | 84.4 | — | Unverified |
| 5 | MDETR-ENB5 | R@1 | 84.3 | — | Unverified |
| 6 | DIGN | R@1 | 78.73 | — | Unverified |
| 7 | LCMCG | R@1 | 76.74 | — | Unverified |
| 8 | Soft-Label Chain CRF (SL-CCRF) | R@1 | 74.69 | — | Unverified |
| 9 | DDPN (ResNet-101) | R@1 | 73.3 | — | Unverified |
| 10 | VisualBERT | R@1 | 71.33 | — | Unverified |
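
The R@1 column can be read as the percentage of noun phrases whose top-ranked predicted region matches the annotated region. The sketch below shows one common way to compute it. It assumes a prediction counts as a match when its intersection over union (IoU) with the reference box is at least 0.5; the page itself does not state the matching criterion. The function names, the box format, and the toy caption are hypothetical and are not taken from any of the listed models.

```python
from typing import Dict, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(
    predicted: Dict[str, Box],
    ground_truth: Dict[str, Box],
    iou_threshold: float = 0.5,  # assumed convention, not stated in the source
) -> float:
    """Percentage of phrases whose top-ranked box matches the reference box."""
    if not ground_truth:
        return 0.0
    hits = sum(
        1
        for phrase, gt_box in ground_truth.items()
        # A phrase with no prediction counts as a miss.
        if phrase in predicted and iou(predicted[phrase], gt_box) >= iou_threshold
    )
    return 100.0 * hits / len(ground_truth)


# Toy example for the caption "A man throws a frisbee to a dog".
ground_truth = {
    "a man": (10, 20, 110, 220),
    "a frisbee": (150, 60, 190, 90),
    "a dog": (300, 150, 400, 230),
}
predicted = {
    "a man": (15, 25, 115, 225),     # slightly shifted, still a match
    "a frisbee": (150, 60, 190, 90),  # exact match
    "a dog": (10, 10, 60, 60),        # wrong region, a miss
}

print(f"R@1 = {recall_at_1(predicted, ground_truth):.2f}")
```

In this toy case two of the three phrases are grounded correctly, so the score is two thirds expressed as a percentage. Real evaluations aggregate over every phrase in a test set, and some protocols merge multiple reference boxes per phrase before computing IoU, which this sketch does not attempt.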