Referring Expression Segmentation
The task is to label the pixels of an image or video that belong to the object instance referred to by a linguistic expression. The referring expression (RE) must unambiguously identify a single object in the discourse or scene (the referent).
Papers
Showing 1–10 of 145 papers
Datasets: RefCOCO val, RefCOCO testA, Refer-YouTube-VOS (2021 public validation), RefCOCO+ testB, A2D Sentences, RefCOCOg-val, J-HMDB, DAVIS 2017 (val), RefCOCOg-test, RefCOCO testB, PhraseCut, RefCOCO
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MPG-SAM 2 | J&F | 73.9 | — | Unverified |
| 2 | VRS-HQ (Chat-UniVi-13B) | J&F | 71 | — | Unverified |
| 3 | GLEE-Pro | J&F | 70.6 | — | Unverified |
| 4 | UNINEXT-H | J&F | 70.1 | — | Unverified |
| 5 | ReferDINO (Swin-B) | J&F | 69.3 | — | Unverified |
| 6 | MUTR | J&F | 68.4 | — | Unverified |
| 7 | VLP (VLMo-L) | J&F | 67.6 | — | Unverified |
| 8 | UniRef-L (Swin-L) | J&F | 67.4 | — | Unverified |
| 9 | DsHmp (Video-Swin-Base) | J&F | 67.1 | — | Unverified |
| 10 | HTR (Pre-training) | J&F | 67.1 | — | Unverified |
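
The J&F metric in the table is, as defined for video object segmentation benchmarks such as DAVIS, the mean of region similarity J (the intersection-over-union of the predicted and ground-truth masks) and contour accuracy F (a boundary F-measure). The following is a minimal sketch of how J and the final average could be computed, not the official evaluation code. The names `region_similarity` and `j_and_f` are hypothetical, masks are assumed to be binary nested lists, and F is taken as a precomputed input because boundary matching is beyond a short example.

```python
from typing import List

Mask = List[List[int]]  # binary mask: 1 = referred object, 0 = background


def region_similarity(pred: Mask, gt: Mask) -> float:
    """Jaccard index J: |pred AND gt| / |pred OR gt|, counted over pixels."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Assumption: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def j_and_f(j: float, f: float) -> float:
    """J&F is the arithmetic mean of J and the contour accuracy F."""
    return (j + f) / 2.0


# Toy 2x3 masks: 2 pixels overlap, 4 pixels are covered by either mask.
pred = [[1, 1, 0],
        [0, 1, 0]]
gt = [[1, 0, 0],
      [0, 1, 1]]
j = region_similarity(pred, gt)
```

For the toy masks above, J is 2/4 = 0.5. In practice the per-frame scores are averaged over each video's annotated frames and objects before being reported, and leaderboard values like those in the table appear to be on a 0 to 100 scale.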