SOTAVerified

Referring Expression

Referring expression comprehension takes an image and a description, and places a bounding box around the instance that the description refers to.
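As a rough illustration of how a predicted box can be scored against an annotated one, the sketch below computes intersection over union (IoU) and counts a prediction as correct when IoU reaches a threshold. The `Box` type, the function names, and the 0.5 threshold are assumptions made for this example (0.5 is a common convention for 2D boxes); it is not necessarily the metric reported in the benchmark results on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box: (x1, y1) is the top-left corner, (x2, y2) the bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        # Clamp each side to zero so inverted (non-overlapping) boxes have no area.
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes, in [0, 1]."""
    inter = Box(max(a.x1, b.x1), max(a.y1, b.y1),
                min(a.x2, b.x2), min(a.y2, b.y2)).area()
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def accuracy_at_iou(predictions, ground_truths, threshold=0.5):
    """Fraction of predictions whose IoU with the paired ground truth meets the threshold."""
    if not ground_truths:
        return 0.0
    hits = sum(iou(p, g) >= threshold for p, g in zip(predictions, ground_truths))
    return hits / len(ground_truths)


# Hypothetical example: one exact match and one complete miss.
preds = [Box(0, 0, 10, 10), Box(0, 0, 10, 10)]
truth = [Box(0, 0, 10, 10), Box(20, 20, 30, 30)]
score = accuracy_at_iou(preds, truth)
```

Each prediction is paired with the ground-truth box for the same description, so the score is a per-expression accuracy, not a detection-style average precision.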

Papers

Showing 101-125 of 364 papers

| Title | Status | Hype |
| --- | --- | --- |
| Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation | Code | 2 |
| Viewpoint-Aware Visual Grounding in 3D Scenes | | 0 |
| Referring Expression Counting | Code | 1 |
| Tune-An-Ellipse: CLIP Has Potential to Find What You Want | Code | 1 |
| Compositional Zero-Shot Learning for Attribute-Based Object Reference in Human-Robot Interaction | | 0 |
| GSVA: Generalized Segmentation via Multimodal Large Language Models | Code | 1 |
| Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation | Code | 1 |
| Localized Symbolic Knowledge Distillation for Visual Commonsense Models | Code | 0 |
| Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection | | 0 |
| InstructSeq: Unifying Vision Tasks with Instruction-conditioned Multi-modal Sequence Generation | Code | 0 |
| Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions | Code | 1 |
| Continual Referring Expression Comprehension via Dual Modular Memorization | Code | 0 |
| Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models | | 0 |
| Enhancing Visual Grounding and Generalization: A Multi-Task Cycle Training Approach for Vision-Language Models | Code | 0 |
| NExT-Chat: An LMM for Chat, Detection and Segmentation | Code | 2 |
| GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs | Code | 1 |
| CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding | | 0 |
| GLaMM: Pixel Grounding Large Multimodal Model | Code | 2 |
| Towards Omni-supervised Referring Expression Segmentation | Code | 0 |
| Text Augmented Spatial-aware Zero-shot Referring Image Segmentation | | 0 |
| Video Referring Expression Comprehension via Transformer with Content-conditioned Query | | 0 |
| Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V | Code | 4 |
| Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs | Code | 1 |
| Multi-modal Domain Adaptation for REG via Relation Transfer | | 0 |
| CLIPUNetr: Assisting Human-robot Interface for Uncalibrated Visual Servoing Control with CLIP-driven Referring Expression Segmentation | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Random | Acc@0.5m | 14.6 | | Unverified |