SOTAVerified

Referring Expression

Given an image and a description, a referring expression model places a bounding box around the instance that the description refers to.

Papers

Showing 101-150 of 364 papers

Title | Status | Hype
Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation | Code | 2
Viewpoint-Aware Visual Grounding in 3D Scenes | - | 0
Referring Expression Counting | Code | 1
Tune-An-Ellipse: CLIP Has Potential to Find What You Want | Code | 1
Compositional Zero-Shot Learning for Attribute-Based Object Reference in Human-Robot Interaction | - | 0
GSVA: Generalized Segmentation via Multimodal Large Language Models | Code | 1
Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation | Code | 1
Localized Symbolic Knowledge Distillation for Visual Commonsense Models | Code | 0
Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection | - | 0
InstructSeq: Unifying Vision Tasks with Instruction-conditioned Multi-modal Sequence Generation | Code | 0
Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions | Code | 1
Continual Referring Expression Comprehension via Dual Modular Memorization | Code | 0
Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models | Code | 0
Enhancing Visual Grounding and Generalization: A Multi-Task Cycle Training Approach for Vision-Language Models | Code | 0
GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs | Code | 1
NExT-Chat: An LMM for Chat, Detection and Segmentation | Code | 2
CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding | - | 0
GLaMM: Pixel Grounding Large Multimodal Model | Code | 2
Towards Omni-supervised Referring Expression Segmentation | Code | 0
Text Augmented Spatial-aware Zero-shot Referring Image Segmentation | - | 0
Video Referring Expression Comprehension via Transformer with Content-conditioned Query | - | 0
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V | Code | 4
Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs | Code | 1
Multi-modal Domain Adaptation for REG via Relation Transfer | - | 0
CLIPUNetr: Assisting Human-robot Interface for Uncalibrated Visual Servoing Control with CLIP-driven Referring Expression Segmentation | - | 0
Collecting Visually-Grounded Dialogue with A Game Of Sorts | Code | 0
3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation | Code | 1
GREC: Generalized Referring Expression Comprehension | Code | 2
A Unified Framework for 3D Point Cloud Visual Grounding | Code | 1
RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D | Code | 1
March in Chat: Interactive Prompting for Remote Embodied Referring Expression | Code | 1
Whether you can locate or not? Interactive Referring Expression Generation | Code | 0
'What are you referring to?' Evaluating the Ability of Multi-Modal Dialogue Models to Process Clarificational Exchanges | Code | 0
Described Object Detection: Liberating Object Detection with Flexible Expressions | Code | 1
Switching Head-Tail Funnel UNITER for Dual Referring Expression Comprehension with Fetch-and-Carry Tasks | - | 0
RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation | Code | 1
Kosmos-2: Grounding Multimodal Large Language Models to the World | Code | 1
Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input | - | 0
Language Adaptive Weight Generation for Multi-task Visual Grounding | Code | 0
Referring Expression Comprehension Using Language Adaptive Inference | Code | 0
GRES: Generalized Referring Expression Segmentation | Code | 2
DisCLIP: Open-Vocabulary Referring Expression Generation | - | 0
Language-Guided 3D Object Detection in Point Cloud for Autonomous Driving | - | 0
Pento-DIARef: A Diagnostic Dataset for Learning the Incremental Algorithm for Referring Expression Generation from Examples | Code | 0
Yes, this Way! Learning to Ground Referring Expressions into Actions with Intra-episodic Feedback from Supportive Teachers | Code | 0
Advancing Referring Expression Segmentation Beyond Single Image | Code | 1
Meta Compositional Referring Expression Segmentation | - | 0
Zero-shot Referring Image Segmentation with Global-Local Context Features | Code | 1
NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations | Code | 1
Universal Instance Perception as Object Discovery and Retrieval | Code | 3

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Random | Acc@0.5m | 14.6 | - | Unverified