SOTAVerified

Generalized Referring Expression Comprehension

Generalized Referring Expression Comprehension (GREC) extends referring expression comprehension to expressions that refer to any number of target objects. Given an image and a referring expression, a GREC model must predict a bounding box for each target object.

Papers

Showing 1–7 of 7 papers

Title | Status | Hype
Hierarchical Alignment-enhanced Adaptive Grounding Network for Generalized Referring Expression Comprehension |  | 0
SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion | Code | 2
GREC: Generalized Referring Expression Comprehension | Code | 2
Universal Instance Perception as Object Discovery and Retrieval | Code | 3
Vision-Language Transformer and Query Generation for Referring Segmentation | Code | 1
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding | Code | 1
Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | SimVG-DB | Precision@(F1=1, IoU≥0.5) | 62.1 |  | Unverified
2 | UNINEXT | Precision@(F1=1, IoU≥0.5) | 58.2 |  | Unverified
3 | MDETR | Precision@(F1=1, IoU≥0.5) | 41.5 |  | Unverified
4 | VLT | Precision@(F1=1, IoU≥0.5) | 36.6 |  | Unverified
5 | MCN | Precision@(F1=1, IoU≥0.5) | 28 |  | Unverified
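
This page does not define the Precision@(F1=1, IoU≥0.5) metric. One plausible reading is the percentage of samples whose predicted boxes reach a per-sample F1 score of exactly 1, where a predicted box counts as a true positive if it matches a ground-truth box with IoU of at least 0.5. The Python sketch below follows that reading. The function names, the greedy one-to-one matching, and the handling of samples with no target (F1 = 1 only when nothing is predicted) are assumptions made for illustration; the official evaluation code may differ in these details.

```python
from typing import List, Sequence

Box = Sequence[float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def sample_f1(pred: List[Box], gt: List[Box], thr: float = 0.5) -> float:
    """Per-sample F1 with one-to-one matching at IoU >= thr."""
    if not gt:
        # Assumption: a no-target sample is correct only if nothing is predicted.
        return 1.0 if not pred else 0.0
    if not pred:
        return 0.0
    # Greedy matching (an assumption): take candidate pairs by descending IoU.
    pairs = sorted(
        ((iou(p, g), i, j) for i, p in enumerate(pred) for j, g in enumerate(gt)),
        reverse=True,
    )
    used_pred, used_gt = set(), set()
    tp = 0
    for score, i, j in pairs:
        if score < thr:
            break
        if i in used_pred or j in used_gt:
            continue
        used_pred.add(i)
        used_gt.add(j)
        tp += 1
    fp = len(pred) - tp  # predictions without a matched ground-truth box
    fn = len(gt) - tp    # ground-truth boxes without a matched prediction
    return 2 * tp / (2 * tp + fp + fn)


def precision_at_f1(preds: List[List[Box]], gts: List[List[Box]],
                    thr: float = 0.5) -> float:
    """Percentage of samples whose per-sample F1 equals 1."""
    hits = sum(1 for p, g in zip(preds, gts) if sample_f1(p, g, thr) == 1.0)
    return 100.0 * hits / len(gts)


# Toy data (made up): exact match, no-target, one extra box, poor overlap.
preds = [[(0, 0, 10, 10)], [], [(0, 0, 10, 10), (20, 20, 30, 30)], [(0, 0, 5, 5)]]
gts = [[(0, 0, 10, 10)], [], [(0, 0, 10, 10)], [(0, 0, 10, 10)]]
print(precision_at_f1(preds, gts))  # 50.0
```

In the toy data, the third sample fails because of one extra predicted box and the fourth because its only box overlaps the target with IoU 0.25, so only two of the four samples count. The metric is strict: a sample scores only when every target is found and nothing else is predicted.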