| VL-BERT: Pre-training of Generic Visual-Linguistic Representations | Aug 22, 2019 | Image-text matching, Language Modelling | Code Available | 1 |
| A Fast and Accurate One-Stage Approach to Visual Grounding | Aug 18, 2019 | Referring Expression, Referring Expression Comprehension | Code Available | 1 |
| Relationship-Embedded Representation Learning for Grounding Referring Expressions | Jun 11, 2019 | Referring Expression, Representation Learning | Code Available | 1 |
| Generating Easy-to-Understand Referring Expressions for Target Identifications | Nov 29, 2018 | Referring Expression | Code Available | 1 |
| Colors in Context: A Pragmatic Neural Model for Grounded Language Understanding | Mar 29, 2017 | Referring Expression | Code Available | 1 |
| Modeling Context in Referring Expressions | Jul 31, 2016 | Referring Expression, Referring expression generation | Code Available | 1 |
| Mask-aware Text-to-Image Retrieval: Referring Expression Segmentation Meets Cross-modal Retrieval | Jun 28, 2025 | Cross-Modal Retrieval, Image Captioning | Unverified | 0 |
| Detecting Referring Expressions in Visually Grounded Dialogue with Autoregressive Language Models | Jun 26, 2025 | Language Modeling, Language Modelling | Code Available | 0 |
| Referring Expression Instance Retrieval and A Strong End-to-End Baseline | Jun 23, 2025 | Image Retrieval, Referring Expression | Unverified | 0 |
| Gondola: Grounded Vision Language Planning for Generalizable Robotic Manipulation | Jun 12, 2025 | Referring Expression | Unverified | 0 |
| Synthetic Visual Genome | Jun 9, 2025 | Referring Expression, Referring Expression Comprehension | Unverified | 0 |
| Refer to Anything with Vision-Language Prompts | Jun 5, 2025 | Benchmarking, Generalized Referring Expression Segmentation | Unverified | 0 |
| From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes | Jun 5, 2025 | 3D visual grounding, Object | Unverified | 0 |
| Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning | Jun 4, 2025 | Object, Referring Expression | Unverified | 0 |
| RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions | Jun 3, 2025 | Referring Expression, Synthetic Data Generation | Unverified | 0 |
| Improving Contrastive Learning for Referring Expression Counting | May 28, 2025 | Contrastive Learning, Object Counting | Code Available | 0 |
| Deformable Attentive Visual Enhancement for Referring Segmentation Using Vision-Language Model | May 25, 2025 | cross-modal alignment, Image Segmentation | Unverified | 0 |
| WeakMCN: Multi-task Collaborative Network for Weakly Supervised Referring Expression Comprehension and Segmentation | May 24, 2025 | Contrastive Learning, Referring Expression | Code Available | 0 |
| Learning to Reason and Navigate: Parameter Efficient Action Planning with Large Language Models | May 12, 2025 | Navigate, Referring Expression | Unverified | 0 |
| RESAnything: Attribute Prompting for Arbitrary Referring Segmentation | May 3, 2025 | Attribute, Image Segmentation | Unverified | 0 |
| Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation | Apr 22, 2025 | Referring Expression, Referring expression generation | Code Available | 0 |
| LGD: Leveraging Generative Descriptions for Zero-Shot Referring Image Segmentation | Apr 20, 2025 | Attribute, Image Segmentation | Unverified | 0 |
| 3DResT: A Strong Baseline for Semi-Supervised 3D Referring Expression Segmentation | Apr 17, 2025 | Referring Expression, Referring Expression Segmentation | Unverified | 0 |
| Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities | Apr 2, 2025 | Descriptive, Large Language Model | Code Available | 0 |
| MB-ORES: A Multi-Branch Object Reasoner for Visual Grounding in Remote Sensing | Mar 31, 2025 | Object, object-detection | Code Available | 0 |
| Beyond Object Categories: Multi-Attribute Reference Understanding for Visual Grounding | Mar 25, 2025 | Attribute, Object | Unverified | 0 |
| GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing | Mar 16, 2025 | Change Detection, Image Captioning | Unverified | 0 |
| Cognitive Disentanglement for Referring Multi-Object Tracking | Mar 14, 2025 | Disentanglement, Multi-Object Tracking | Unverified | 0 |
| Exploring Spatial Language Grounding Through Referring Expressions | Feb 4, 2025 | Image Captioning, Negation | Unverified | 0 |
| Implicit Causality-biases in humans and LLMs as a tool for benchmarking LLM discourse capabilities | Jan 22, 2025 | Benchmarking, Referring Expression | Unverified | 0 |
| FLORA: Formal Language Model Enables Robust Training-free Zero-shot Object Referring Analysis | Jan 17, 2025 | Bayesian Inference, Language Modeling | Unverified | 0 |
| Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks | Jan 14, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Hierarchical Alignment-enhanced Adaptive Grounding Network for Generalized Referring Expression Comprehension | Jan 2, 2025 | Generalized Referring Expression Comprehension, Generalized Referring Expression Segmentation | Unverified | 0 |
| Task-aware Cross-modal Feature Refinement Transformer with Large Language Models for Visual Grounding | Jan 1, 2025 | Referring Expression, Referring Expression Comprehension | Unverified | 0 |
| DViN: Dynamic Visual Routing Network for Weakly Supervised Referring Expression Comprehension | Jan 1, 2025 | Descriptive, Referring Expression | Unverified | 0 |
| Harlequin: Color-driven Generation of Synthetic Data for Referring Expression Comprehension | Nov 22, 2024 | Referring Expression, Referring Expression Comprehension | Unverified | 0 |
| Instance-Aware Generalized Referring Expression Segmentation | Nov 22, 2024 | Generalized Referring Expression Segmentation, Object | Unverified | 0 |
| Finding NeMo: Negative-mined Mosaic Augmentation for Referring Image Segmentation | Nov 3, 2024 | Data Augmentation, Image Segmentation | Unverified | 0 |
| SegLLM: Multi-round Reasoning Segmentation | Oct 24, 2024 | Reasoning Segmentation, Referring Expression | Unverified | 0 |
| Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models | Oct 21, 2024 | Instruction Following, object-detection | Code Available | 0 |
| Grounding Language in Multi-Perspective Referential Communication | Oct 4, 2024 | Referring Expression, Referring expression generation | Code Available | 0 |
| Referring Expression Generation in Visually Grounded Dialogue with Discourse-aware Comprehension Guiding | Sep 9, 2024 | Image Retrieval, Referring Expression | Code Available | 0 |
| Make Graph-based Referring Expression Comprehension Great Again through Expression-guided Dynamic Gating and Regression | Sep 5, 2024 | Referring Expression, Referring Expression Comprehension | Unverified | 0 |
| A Lightweight Modular Framework for Low-Cost Open-Vocabulary Object Detection Training | Aug 20, 2024 | Autonomous Vehicles, Computational Efficiency | Code Available | 0 |
| Revisiting Multi-Modal LLM Evaluation | Aug 9, 2024 | Chart Understanding, Optical Character Recognition | Unverified | 0 |
| MaskInversion: Localized Embeddings via Optimization of Explainability Maps | Jul 29, 2024 | Image Generation, Referring Expression | Unverified | 0 |
| Look Hear: Gaze Prediction for Speech-directed Human Attention | Jul 28, 2024 | Decoder, Gaze Prediction | Unverified | 0 |
| Learning Visual Grounding from Generative Vision and Language Model | Jul 18, 2024 | Attribute, Language Modeling | Unverified | 0 |
| The Solution for the 5th GCAIAC Zero-shot Referring Expression Comprehension Challenge | Jul 6, 2024 | Referring Expression, Referring Expression Comprehension | Unverified | 0 |
| SafaRi: Adaptive Sequence Transformer for Weakly Supervised Referring Expression Segmentation | Jul 2, 2024 | Referring Expression, Referring Expression Segmentation | Unverified | 0 |