| Enriching the E2E dataset | Aug 1, 2021 | Referring Expression, Referring expression generation | Code Available | 0 | 5 |
| Understanding Synonymous Referring Expressions via Contrastive Features | Apr 20, 2021 | Object, Referring Expression | Code Available | 0 | 5 |
| Bring Adaptive Binding Prototypes to Generalized Referring Expression Segmentation | May 24, 2024 | Decoder, Generalized Referring Expression Segmentation | Code Available | 0 | 5 |
| Scene-Text Oriented Referring Expression Comprehension | Nov 4, 2022 | Object Localization, Referring Expression | Code Available | 0 | 5 |
| Searching for Ambiguous Objects in Videos using Relational Referring Expressions | Aug 3, 2019 | Deep Attention, Natural Language Visual Grounding | Code Available | 0 | 5 |
| OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework | Feb 7, 2022 | Image Captioning, image-classification | Code Available | 0 | 5 |
| Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation | May 24, 2021 | Referring Expression, Referring Expression Comprehension | Code Available | 0 | 5 |
| Enriching the WebNLG corpus | Nov 1, 2018 | Machine Translation, Referring Expression | Code Available | 0 | 5 |
| Entity-enhanced Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding | Jul 18, 2022 | Attribute, Referring Expression | Code Available | 0 | 5 |
| Enhancing Visual Grounding and Generalization: A Multi-Task Cycle Training Approach for Vision-Language Models | Nov 21, 2023 | Image Segmentation, Language Modelling | Code Available | 0 | 5 |
| Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation | Apr 22, 2025 | Referring Expression, Referring expression generation | Code Available | 0 | 5 |
| Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models | Nov 24, 2023 | All, Referring Expression | Code Available | 0 | 5 |
| Single-Stream Multi-Level Alignment for Vision-Language Pretraining | Mar 27, 2022 | Image-text Retrieval, Question Answering | Code Available | 0 | 5 |
| Modulating Bottom-Up and Top-Down Visual Processing via Language-Conditional Filters | Mar 28, 2020 | Colorization, Image Colorization | Code Available | 0 | 5 |
| Generation and Comprehension of Unambiguous Object Descriptions | Nov 7, 2015 | Image Captioning, Object | Code Available | 0 | 5 |
| Pento-DIARef: A Diagnostic Dataset for Learning the Incremental Algorithm for Referring Expression Generation from Examples | May 24, 2023 | Diagnostic, Referring Expression | Code Available | 0 | 5 |
| Give Me Something to Eat: Referring Expression Comprehension with Commonsense Knowledge | Jun 2, 2020 | 16k, Referring Expression | Code Available | 0 | 5 |
| Visual Referring Expression Recognition: What Do Systems Actually Learn? | May 30, 2018 | Referring Expression | Code Available | 0 | 5 |
| Learning To Segment Every Referring Object Point by Point | Jan 1, 2023 | Object, Referring Expression | Code Available | 0 | 5 |
| Whether you can locate or not? Interactive Referring Expression Generation | Aug 19, 2023 | Referring Expression, Referring Expression Comprehension | Code Available | 0 | 5 |
| Exploring Modulated Detection Transformer as a Tool for Action Recognition in Videos | Sep 21, 2022 | Action Detection, Action Recognition | Code Available | 0 | 5 |
| The WebNLG Challenge: Generating Text from RDF Data | Sep 1, 2017 | Referring Expression, Referring expression generation | Unverified | 0 | 0 |
| Toward Forgetting-Sensitive Referring Expression Generation for Integrated Robot Architectures | Jul 16, 2020 | Referring Expression, Referring expression generation | Unverified | 0 | 0 |
| Towards Situated Dialogue: Revisiting Referring Expression Generation | Oct 1, 2013 | Referring Expression, Referring expression generation | Unverified | 0 | 0 |
| Transcrib3D: 3D Referring Expression Resolution through Large Language Models | Apr 30, 2024 | Referring Expression | Unverified | 0 | 0 |
| Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks | Jun 17, 2022 | Depth Estimation, Image Generation | Unverified | 0 | 0 |
| UNITER: Learning UNiversal Image-TExt Representations | Sep 25, 2019 | Image-text matching, Image-text Retrieval | Unverified | 0 | 0 |
| Unpaired Referring Expression Grounding via Bidirectional Cross-Modal Matching | Jan 18, 2022 | Image-text matching, Referring Expression | Unverified | 0 | 0 |
| Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos | Mar 7, 2017 | Referring Expression | Unverified | 0 | 0 |
| Using Lexical Alignment and Referring Ability to Address Data Sparsity in Situated Dialog Reference Resolution | Oct 1, 2018 | Referring Expression | Unverified | 0 | 0 |
| Using Referring Expression Generation to Model Literary Style | Dec 1, 2021 | model, Referring Expression | Unverified | 0 | 0 |
| Utilizing Every Image Object for Semi-supervised Phrase Grounding | Nov 5, 2020 | Phrase Grounding, Referring Expression | Unverified | 0 | 0 |
| Variational Context: Exploiting Visual and Textual Context for Grounding Referring Expressions | Jul 8, 2019 | Multiple Instance Learning, Referring Expression | Unverified | 0 | 0 |
| Video Referring Expression Comprehension via Transformer with Content-aware Query | Oct 6, 2022 | cross-modal alignment, Referring Expression | Unverified | 0 | 0 |
| Video Referring Expression Comprehension via Transformer with Content-conditioned Query | Oct 25, 2023 | cross-modal alignment, Referring Expression | Unverified | 0 | 0 |
| Viewpoint-Aware Visual Grounding in 3D Scenes | Jan 1, 2024 | 3D visual grounding, Referring Expression | Unverified | 0 | 0 |
| Visual Question Answering based on Local-Scene-Aware Referring Expression Generation | Jan 22, 2021 | Question Answering, Referring Expression | Unverified | 0 | 0 |
| VLN BERT: A Recurrent Vision-and-Language BERT for Navigation | Jun 19, 2021 | Decision Making, Decoder | Unverified | 0 | 0 |
| VL-NMS: Breaking Proposal Bottlenecks in Two-Stage Visual-Language Matching | May 12, 2021 | Image-text matching, Referring Expression | Unverified | 0 | 0 |
| VQD: Visual Query Detection in Natural Scenes | Apr 4, 2019 | Referring Expression, Referring Expression Comprehension | Unverified | 0 | 0 |
| WaterVG: Waterway Visual Grounding based on Text-Guided Vision and mmWave Radar | Mar 19, 2024 | Autonomous Navigation, Referring Expression | Unverified | 0 | 0 |
| Weakly-supervised segmentation of referring expressions | May 10, 2022 | Image Segmentation, Referring Expression | Unverified | 0 | 0 |
| What can Neural Referential Form Selectors Learn? | Aug 15, 2021 | Form, Position | Unverified | 0 | 0 |
| Trainable Referring Expression Generation using Overspecification Preferences | Apr 12, 2017 | Referring Expression, Referring expression generation | Unverified | 0 | 0 |
| 3DResT: A Strong Baseline for Semi-Supervised 3D Referring Expression Segmentation | Apr 17, 2025 | Referring Expression, Referring Expression Segmentation | Unverified | 0 | 0 |
| A case study on context-bound referring expression generation | Oct 1, 2019 | Referring Expression, Referring expression generation | Unverified | 0 | 0 |
| A Commercial Perspective on Reference | Sep 1, 2017 | Referring Expression, Text Generation | Unverified | 0 | 0 |
| Adapting Descriptions of People to the Point of View of a Moving Observer | Nov 1, 2018 | Position, Referring Expression | Unverified | 0 | 0 |
| A Linguistic Perspective on Reference: Choosing a Feature Set for Generating Referring Expressions in Context | Dec 1, 2020 | Feature Importance, Form | Unverified | 0 | 0 |
| An Empirical Approach for Modeling Fuzzy Geographical Descriptors | Mar 30, 2017 | Referring Expression, Referring expression generation | Unverified | 0 | 0 |