| MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding | Apr 26, 2021 | Generalized Referring Expression Comprehension, Phrase Grounding | Code Available | 1 | 5 |
| Modeling Context in Referring Expressions | Jul 31, 2016 | Referring Expression, Referring expression generation | Code Available | 1 | 5 |
| Correspondence Matters for Video Referring Expression Comprehension | Jul 21, 2022 | Contrastive Learning, Referring Expression | Code Available | 1 | 5 |
| Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations | Jun 30, 2022 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation | Dec 3, 2024 | Referring Expression, Referring Expression Segmentation | Code Available | 1 | 5 |
| Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding | Sep 3, 2020 | Referring Expression, Vocal Bursts Valence Prediction | Code Available | 1 | 5 |
| Relationship-Embedded Representation Learning for Grounding Referring Expressions | Jun 11, 2019 | Referring Expression, Representation Learning | Code Available | 1 | 5 |
| Airbert: In-domain Pretraining for Vision-and-Language Navigation | Aug 20, 2021 | Navigate, Referring Expression | Code Available | 1 | 5 |
| Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints | Jan 12, 2025 | Image Segmentation, Referring Expression | Code Available | 1 | 5 |
| New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration | Feb 27, 2025 | Image Comprehension, Referring Expression | Code Available | 1 | 5 |
| RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation | Jul 3, 2023 | Image Segmentation, Referring Expression | Code Available | 1 | 5 |
| Room-and-Object Aware Knowledge Reasoning for Remote Embodied Referring Expression | Jun 19, 2021 | Instruction Following, Navigate | Code Available | 1 | 5 |
| Refer360°: A Referring Expression Recognition Dataset in 360° Images | Jul 1, 2020 | Referring Expression | Code Available | 1 | 5 |
| GSVA: Generalized Segmentation via Multimodal Large Language Models | Dec 15, 2023 | Decoder, Generalized Referring Expression Segmentation | Code Available | 1 | 5 |
| Referring Atomic Video Action Recognition | Jul 2, 2024 | Action Localization, Action Recognition | Code Available | 1 | 5 |
| RefDrone: A Challenging Benchmark for Referring Expression Comprehension in Drone Scenes | Feb 1, 2025 | Referring Expression, Referring Expression Comprehension | Code Available | 1 | 5 |
| 3D-GRES: Generalized 3D Referring Expression Segmentation | Jul 30, 2024 | Object, Referring Expression | Code Available | 1 | 5 |
| RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D | Aug 23, 2023 | Object, Object Tracking | Code Available | 1 | 5 |
| Referring Expression Counting | Jan 1, 2024 | 8k, object-detection | Code Available | 1 | 5 |
| Discriminative Triad Matching and Reconstruction for Weakly Referring Expression Grounding | Jun 8, 2021 | Referring Expression, Sentence | Code Available | 1 | 5 |
| Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs | Oct 1, 2023 | Referring Expression | Code Available | 1 | 5 |
| CoHD: A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation | May 24, 2024 | Generalized Referring Expression Segmentation, Object | Code Available | 1 | 5 |
| ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension | Apr 12, 2022 | image-classification, Image Classification | Code Available | 1 | 5 |
| OCID-Ref: A 3D Robotic Dataset with Embodied Language for Clutter Scene Grounding | Mar 13, 2021 | Referring Expression, Referring Expression Segmentation | Code Available | 1 | 5 |
| GRIT: General Robust Image Task Benchmark | Apr 28, 2022 | Instance Segmentation, Keypoint Detection | Code Available | 1 | 5 |
| Graph-Structured Referring Expression Reasoning in The Wild | Apr 19, 2020 | Referring Expression | Code Available | 1 | 5 |
| PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models | May 23, 2022 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| PixFoundation: Are We Heading in the Right Direction with Pixel-level Vision Foundation Models? | Feb 6, 2025 | Question Answering, Referring Expression | Code Available | 1 | 5 |
| DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM | Mar 19, 2024 | Object, object-detection | Code Available | 1 | 5 |
| Human-centric Spatio-Temporal Video Grounding With Visual Transformers | Nov 10, 2020 | Referring Expression, Sentence | Code Available | 1 | 5 |
| Described Object Detection: Liberating Object Detection with Flexible Expressions | Jul 24, 2023 | Binary Classification, Described Object Detection | Code Available | 1 | 5 |
| 3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation | Aug 31, 2023 | Navigate, Referring Expression | Code Available | 1 | 5 |
| GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs | Nov 8, 2023 | Question Answering, Referring Expression | Code Available | 1 | 5 |
| Exploring Contextual Attribute Density in Referring Expression Counting | Jan 1, 2025 | Attribute, Referring Expression | Code Available | 1 | 5 |
| Exploring Contextual Attribute Density in Referring Expression Counting | Mar 16, 2025 | Attribute, Referring Expression | Code Available | 1 | 5 |
| Exploring Fine-Grained Image-Text Alignment for Referring Remote Sensing Image Segmentation | Sep 20, 2024 | Image Segmentation, Referring Expression | Code Available | 1 | 5 |
| Advancing Referring Expression Segmentation Beyond Single Image | May 21, 2023 | Co-Salient Object Detection, Object | Code Available | 1 | 5 |
| IPDN: Image-enhanced Prompt Decoding Network for 3D Referring Expression Segmentation | Jan 9, 2025 | Decoder, Referring Expression | Code Available | 1 | 5 |
| Iterative Shrinking for Referring Expression Grounding Using Deep Reinforcement Learning | Mar 9, 2021 | Deep Reinforcement Learning, Referring Expression | Code Available | 1 | 5 |
| IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis | Mar 2, 2025 | Image Segmentation, Image-text matching | Code Available | 1 | 5 |
| LAVT: Language-Aware Vision Transformer for Referring Image Segmentation | Dec 4, 2021 | Decoder, Generalized Referring Expression Segmentation | Code Available | 1 | 5 |
| FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension | Sep 23, 2024 | Image Comprehension, Referring Expression | Code Available | 1 | 5 |
| Learning to Evaluate Performance of Multi-modal Semantic Localization | Sep 14, 2022 | Cross-Modal Retrieval, Referring Expression | Code Available | 1 | 5 |
| Image Segmentation Using Text and Image Prompts | Dec 18, 2021 | Decoder, Image Segmentation | Code Available | 1 | 5 |
| Referring Transformer: A One-step Approach to Multi-task Visual Grounding | Jun 6, 2021 | Decoder, Referring Expression | Code Available | 1 | 5 |
| Multi-branch Collaborative Learning Network for 3D Visual Grounding | Jul 7, 2024 | 3D visual grounding, Referring Expression | Code Available | 1 | 5 |
| LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension | Sep 18, 2024 | Referring Expression, Referring Expression Comprehension | Code Available | 1 | 5 |
| LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition | Feb 15, 2024 | Grounded Multimodal Named Entity Recognition, Multi-modal Named Entity Recognition | Code Available | 1 | 5 |
| A Fast and Accurate One-Stage Approach to Visual Grounding | Aug 18, 2019 | Referring Expression, Referring Expression Comprehension | Code Available | 1 | 5 |
| VL-BERT: Pre-training of Generic Visual-Linguistic Representations | Aug 22, 2019 | Image-text matching, Language Modelling | Code Available | 1 | 5 |