SOTAVerified

Referring Expression

The referring expression task places a bounding box around the instance in an image that corresponds to the provided description.
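As a rough illustration of the task's output, the sketch below scores predicted boxes against ground-truth boxes using intersection-over-union (IoU). This page does not state how predictions are scored. The IoU criterion and the 0.5 threshold are assumptions, taken from a convention commonly used for 2D referring expression comprehension. The names `Box`, `iou`, and `accuracy_at_iou` are hypothetical and do not come from any listed paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixels: (x1, y1) is top-left, (x2, y2) is bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        # Clamp to zero so degenerate or inverted boxes have no area.
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes; 0.0 when they do not overlap."""
    inter = Box(
        max(a.x1, b.x1), max(a.y1, b.y1),
        min(a.x2, b.x2), min(a.y2, b.y2),
    ).area()
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def accuracy_at_iou(
    predictions: Sequence[Box],
    ground_truths: Sequence[Box],
    threshold: float = 0.5,  # assumed threshold, not taken from this page
) -> float:
    """Fraction of expressions whose predicted box reaches the IoU threshold."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have the same length")
    if not predictions:
        return 0.0
    hits = sum(iou(p, g) >= threshold for p, g in zip(predictions, ground_truths))
    return hits / len(predictions)
```

Each description is paired with exactly one ground-truth box here. Generalized or multi-object variants of the task would need a different matching rule.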

Papers

Showing 26–50 of 364 papers

Title | Status | Hype
Cognitive Disentanglement for Referring Multi-Object Tracking | - | 0
GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding | Code | 2
IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis | Code | 1
New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration | Code | 1
PixFoundation: Are We Heading in the Right Direction with Pixel-level Vision Foundation Models? | Code | 1
Exploring Spatial Language Grounding Through Referring Expressions | - | 0
RefDrone: A Challenging Benchmark for Referring Expression Comprehension in Drone Scenes | Code | 1
NAVER: A Neuro-Symbolic Compositional Automaton for Visual Grounding with Explicit Logic Reasoning | Code | 1
Implicit Causality-biases in humans and LLMs as a tool for benchmarking LLM discourse capabilities | - | 0
FLORA: Formal Language Model Enables Robust Training-free Zero-shot Object Referring Analysis | - | 0
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks | - | 0
Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints | Code | 1
IPDN: Image-enhanced Prompt Decoding Network for 3D Referring Expression Segmentation | Code | 1
Hierarchical Alignment-enhanced Adaptive Grounding Network for Generalized Referring Expression Comprehension | - | 0
Exploring Contextual Attribute Density in Referring Expression Counting | Code | 1
DViN: Dynamic Visual Routing Network for Weakly Supervised Referring Expression Comprehension | - | 0
Task-aware Cross-modal Feature Refinement Transformer with Large Language Models for Visual Grounding | - | 0
Towards Visual Grounding: A Survey | Code | 3
RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation | Code | 1
Harlequin: Color-driven Generation of Synthetic Data for Referring Expression Comprehension | - | 0
Instance-Aware Generalized Referring Expression Segmentation | - | 0
Finding NeMo: Negative-mined Mosaic Augmentation for Referring Image Segmentation | - | 0
SegLLM: Multi-round Reasoning Segmentation | - | 0
Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models | Code | 0
Text4Seg: Reimagining Image Segmentation as Text Generation | Code | 2
Page 2 of 15

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Random | Acc@0.5m | 14.6 | - | Unverified