SOTAVerified

Referring Expression

Referring expression comprehension places a bounding box around the image instance that corresponds to a provided natural-language description.
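Predictions for this task are typically scored by intersection-over-union (IoU) between the predicted and ground-truth boxes, with a prediction counted as correct when IoU meets a threshold such as 0.5. A minimal sketch of that evaluation (function names and box format are illustrative, not taken from any listed paper):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero so non-overlapping boxes get zero intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def acc_at_iou(predictions, ground_truths, threshold=0.5):
    """Fraction of predictions whose IoU with the ground truth meets the threshold."""
    hits = sum(iou(p, g) >= threshold for p, g in zip(predictions, ground_truths))
    return hits / len(ground_truths)
```

For example, two unit-offset 2x2 boxes overlap in a 1x1 region, giving IoU = 1/7, which would not count as a hit at the 0.5 threshold.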

Papers

Showing 26–50 of 364 papers

| Title | Status | Hype |
|---|---|---|
| PixFoundation: Are We Heading in the Right Direction with Pixel-level Vision Foundation Models? | Code | 1 |
| RefDrone: A Challenging Benchmark for Referring Expression Comprehension in Drone Scenes | Code | 1 |
| NAVER: A Neuro-Symbolic Compositional Automaton for Visual Grounding with Explicit Logic Reasoning | Code | 1 |
| Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints | Code | 1 |
| IPDN: Image-enhanced Prompt Decoding Network for 3D Referring Expression Segmentation | Code | 1 |
| Exploring Contextual Attribute Density in Referring Expression Counting | Code | 1 |
| RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation | Code | 1 |
| Cross-Modal Bidirectional Interaction Model for Referring Remote Sensing Image Segmentation | Code | 1 |
| Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE | Code | 1 |
| FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension | Code | 1 |
| Exploring Fine-Grained Image-Text Alignment for Referring Remote Sensing Image Segmentation | Code | 1 |
| MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension | Code | 1 |
| LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension | Code | 1 |
| 3D-GRES: Generalized 3D Referring Expression Segmentation | Code | 1 |
| Multi-branch Collaborative Learning Network for 3D Visual Grounding | Code | 1 |
| Referring Atomic Video Action Recognition | Code | 1 |
| SAM as the Guide: Mastering Pseudo-Label Refinement in Semi-Supervised Referring Expression Segmentation | Code | 1 |
| CoHD: A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation | Code | 1 |
| Talk2Radar: Bridging Natural Language with 4D mmWave Radar for 3D Referring Expression Comprehension | Code | 1 |
| DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM | Code | 1 |
| Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception | Code | 1 |
| LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition | Code | 1 |
| An Open and Comprehensive Pipeline for Unified Object Grounding and Detection | Code | 1 |
| Referring Expression Counting | Code | 1 |
| Tune-An-Ellipse: CLIP Has Potential to Find What You Want | Code | 1 |
Page 2 of 15

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Random | Acc@0.5m | 14.6 | — | Unverified |