Referring Expression Segmentation
The task is to label the pixels of an image or video that belong to the object instance referred to by a linguistic expression. The referring expression (RE) must unambiguously identify a single object in the discourse or scene (the referent).
Papers
Showing 1–10 of 145 papers
Datasets: RefCOCO val, RefCOCO testA, Refer-YouTube-VOS (2021 public validation), RefCOCO+ test B, A2D Sentences, RefCOCOg-val, J-HMDB, DAVIS 2017 (val), RefCOCOg-test, RefCOCO testB, PhraseCut, RefCOCO
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | DeRIS-L | Mean IoU | 81.32 | — | Unverified |
| 2 | UniLSeg-100 | Overall IoU | 80.54 | — | Unverified |
| 3 | MLCD-Seg-7B | Overall IoU | 80.5 | — | Unverified |
| 4 | UniLSeg-20 | Overall IoU | 79.47 | — | Unverified |
| 5 | HyperSeg | Overall IoU | 78.9 | — | Unverified |
| 6 | EVF-SAM | Overall IoU | 78.3 | — | Unverified |
| 7 | C3VG | Overall IoU | 76.39 | — | Unverified |
| 8 | DETRIS | Overall IoU | 75.3 | — | Unverified |
| 9 | GROUNDHOG | Overall IoU | 74.6 | — | Unverified |
| 10 | MaskRIS (Swin-B, combined DB) | Overall IoU | 71.09 | — | Unverified |
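The table reports two different metrics, Mean IoU and Overall IoU, which are not directly comparable. The page does not define them, so the sketch below assumes the definitions commonly used for this task: Mean IoU averages the per-expression IoU, while Overall IoU divides the total intersection by the total union across all samples. The function names and the toy masks are illustrative only and are not taken from any listed model's evaluation code.

```python
from typing import List, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # binary mask given as rows of 0/1 values


def intersection_and_union(pred: Mask, target: Mask) -> Tuple[int, int]:
    """Count pixels in the intersection and union of two same-sized binary masks."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return inter, union


def mean_iou(pairs: List[Tuple[Mask, Mask]]) -> float:
    """Average of per-sample IoU: every expression carries the same weight."""
    ious = []
    for pred, target in pairs:
        inter, union = intersection_and_union(pred, target)
        # Assumption: an empty prediction for an empty target counts as a perfect match.
        ious.append(inter / union if union else 1.0)
    return sum(ious) / len(ious)


def overall_iou(pairs: List[Tuple[Mask, Mask]]) -> float:
    """Total intersection over total union: large objects carry more weight."""
    total_inter = 0
    total_union = 0
    for pred, target in pairs:
        inter, union = intersection_and_union(pred, target)
        total_inter += inter
        total_union += union
    return total_inter / total_union if total_union else 1.0


# Toy data: a small object that is missed entirely and a large object segmented perfectly.
small_pred = [[1, 0], [0, 0]]
small_target = [[0, 1], [0, 0]]
large_pred = [[1, 1, 1, 1], [1, 1, 1, 1]]
large_target = [[1, 1, 1, 1], [1, 1, 1, 1]]
pairs = [(small_pred, small_target), (large_pred, large_target)]

print(f"Mean IoU:    {mean_iou(pairs):.2f}")     # (0/2 + 8/8) / 2
print(f"Overall IoU: {overall_iou(pairs):.2f}")  # (0 + 8) / (2 + 8)
```

On this toy data the two metrics diverge (0.50 versus 0.80) because Overall IoU lets the large, correctly segmented object dominate. Under these assumed definitions, a Mean IoU entry and an Overall IoU entry in the same ranking should be compared with caution.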