| Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation | Jan 1, 2024 | Descriptive, Object | Code Available | 2 |
| Viewpoint-Aware Visual Grounding in 3D Scenes | Jan 1, 2024 | 3D visual grounding, Referring Expression | Unverified | 0 |
| Referring Expression Counting | Jan 1, 2024 | 8k, object-detection | Code Available | 1 |
| Tune-An-Ellipse: CLIP Has Potential to Find What You Want | Jan 1, 2024 | Object, Referring Expression | Code Available | 1 |
| Compositional Zero-Shot Learning for Attribute-Based Object Reference in Human-Robot Interaction | Dec 21, 2023 | 16k, Attribute | Unverified | 0 |
| GSVA: Generalized Segmentation via Multimodal Large Language Models | Dec 15, 2023 | Decoder, Generalized Referring Expression Segmentation | Code Available | 1 |
| Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation | Dec 13, 2023 | Descriptive, Object | Code Available | 1 |
| Localized Symbolic Knowledge Distillation for Visual Commonsense Models | Dec 8, 2023 | Image Description, Instruction Following | Code Available | 0 |
| Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection | Dec 4, 2023 | Image to text, object-detection | Unverified | 0 |
| InstructSeq: Unifying Vision Tasks with Instruction-conditioned Multi-modal Sequence Generation | Nov 30, 2023 | Image Captioning, Referring Expression | Code Available | 0 |
| Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions | Nov 28, 2023 | Disentanglement, Referring Expression | Code Available | 1 |
| Continual Referring Expression Comprehension via Dual Modular Memorization | Nov 25, 2023 | Memorization, Referring Expression | Code Available | 0 |
| Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models | Nov 24, 2023 | All, Referring Expression | Code Available | 0 |
| Enhancing Visual Grounding and Generalization: A Multi-Task Cycle Training Approach for Vision-Language Models | Nov 21, 2023 | Image Segmentation, Language Modelling | Code Available | 0 |
| GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs | Nov 8, 2023 | Question Answering, Referring Expression | Code Available | 1 |
| NExT-Chat: An LMM for Chat, Detection and Segmentation | Nov 8, 2023 | Referring Expression, Referring Expression Segmentation | Code Available | 2 |
| CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding | Nov 6, 2023 | CoLA, Question Answering | Unverified | 0 |
| GLaMM: Pixel Grounding Large Multimodal Model | Nov 6, 2023 | Conversational Question Answering, Image Captioning | Code Available | 2 |
| Towards Omni-supervised Referring Expression Segmentation | Nov 1, 2023 | Referring Expression, Referring Expression Segmentation | Code Available | 0 |
| Text Augmented Spatial-aware Zero-shot Referring Image Segmentation | Oct 27, 2023 | Image Segmentation, Referring Expression | Unverified | 0 |
| Video Referring Expression Comprehension via Transformer with Content-conditioned Query | Oct 25, 2023 | cross-modal alignment, Referring Expression | Unverified | 0 |
| Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V | Oct 17, 2023 | Interactive Segmentation, Referring Expression | Code Available | 4 |
| Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs | Oct 1, 2023 | Referring Expression | Code Available | 1 |
| Multi-modal Domain Adaptation for REG via Relation Transfer | Sep 23, 2023 | Domain Adaptation, image-classification | Unverified | 0 |
| CLIPUNetr: Assisting Human-robot Interface for Uncalibrated Visual Servoing Control with CLIP-driven Referring Expression Segmentation | Sep 17, 2023 | Decoder, Referring Expression | Unverified | 0 |
| Collecting Visually-Grounded Dialogue with A Game Of Sorts | Sep 10, 2023 | Coreference Resolution, Image Retrieval | Code Available | 0 |
| 3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation | Aug 31, 2023 | Navigate, Referring Expression | Code Available | 1 |
| GREC: Generalized Referring Expression Comprehension | Aug 30, 2023 | Generalized Referring Expression Comprehension, Referring Expression | Code Available | 2 |
| A Unified Framework for 3D Point Cloud Visual Grounding | Aug 23, 2023 | CPU, GPU | Code Available | 1 |
| RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D | Aug 23, 2023 | Object, Object Tracking | Code Available | 1 |
| March in Chat: Interactive Prompting for Remote Embodied Referring Expression | Aug 20, 2023 | Referring Expression, Vision and Language Navigation | Code Available | 1 |
| Whether you can locate or not? Interactive Referring Expression Generation | Aug 19, 2023 | Referring Expression, Referring Expression Comprehension | Code Available | 0 |
| 'What are you referring to?' Evaluating the Ability of Multi-Modal Dialogue Models to Process Clarificational Exchanges | Jul 28, 2023 | Referring Expression | Code Available | 0 |
| Described Object Detection: Liberating Object Detection with Flexible Expressions | Jul 24, 2023 | Binary Classification, Described Object Detection | Code Available | 1 |
| Switching Head-Tail Funnel UNITER for Dual Referring Expression Comprehension with Fetch-and-Carry Tasks | Jul 14, 2023 | Object, Referring Expression | Unverified | 0 |
| RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation | Jul 3, 2023 | Image Segmentation, Referring Expression | Code Available | 1 |
| Kosmos-2: Grounding Multimodal Large Language Models to the World | Jun 26, 2023 | Image Captioning, In-Context Learning | Code Available | 1 |
| Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input | Jun 25, 2023 | Diversity, Image-text Retrieval | Unverified | 0 |
| Language Adaptive Weight Generation for Multi-task Visual Grounding | Jun 6, 2023 | Referring Expression, Referring Expression Comprehension | Code Available | 0 |
| Referring Expression Comprehension Using Language Adaptive Inference | Jun 6, 2023 | object-detection, Object Detection | Code Available | 0 |
| GRES: Generalized Referring Expression Segmentation | Jun 1, 2023 | Generalized Referring Expression Segmentation, Referring Expression | Code Available | 2 |
| DisCLIP: Open-Vocabulary Referring Expression Generation | May 30, 2023 | Referring Expression, Referring expression generation | Unverified | 0 |
| Language-Guided 3D Object Detection in Point Cloud for Autonomous Driving | May 25, 2023 | 3D Object Detection, Autonomous Driving | Unverified | 0 |
| Pento-DIARef: A Diagnostic Dataset for Learning the Incremental Algorithm for Referring Expression Generation from Examples | May 24, 2023 | Diagnostic, Referring Expression | Code Available | 0 |
| Yes, this Way! Learning to Ground Referring Expressions into Actions with Intra-episodic Feedback from Supportive Teachers | May 22, 2023 | Referring Expression | Code Available | 0 |
| Advancing Referring Expression Segmentation Beyond Single Image | May 21, 2023 | Co-Salient Object Detection, Object | Code Available | 1 |
| Meta Compositional Referring Expression Segmentation | Apr 10, 2023 | Meta-Learning, Referring Expression | Unverified | 0 |
| Zero-shot Referring Image Segmentation with Global-Local Context Features | Mar 31, 2023 | Image Segmentation, Referring Expression | Code Available | 1 |
| NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations | Mar 23, 2023 | Question Answering, Referring Expression | Code Available | 1 |
| Universal Instance Perception as Object Discovery and Retrieval | Mar 12, 2023 | Described Object Detection, Generalized Referring Expression Comprehension | Code Available | 3 |