| OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework | Feb 7, 2022 | Image Captioning, image-classification | Code Available | 0 | 5 |
| Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation | Apr 22, 2025 | Referring Expression, Referring expression generation | Code Available | 0 | 5 |
| Cross-Modal Self-Attention Network for Referring Image Segmentation | Apr 9, 2019 | Image Segmentation, Referring Expression | Code Available | 0 | 5 |
| Exploring Modulated Detection Transformer as a Tool for Action Recognition in Videos | Sep 21, 2022 | Action Detection, Action Recognition | Code Available | 0 | 5 |
| Visual Referring Expression Recognition: What Do Systems Actually Learn? | May 30, 2018 | Referring Expression | Code Available | 0 | 5 |
| Adversarial Robustness for Visual Grounding of Multimodal Large Language Models | May 16, 2024 | Adversarial Attack, Adversarial Robustness | Code Available | 0 | 5 |
| Single-Stream Multi-Level Alignment for Vision-Language Pretraining | Mar 27, 2022 | Image-text Retrieval, Question Answering | Code Available | 0 | 5 |
| Yes, this Way! Learning to Ground Referring Expressions into Actions with Intra-episodic Feedback from Supportive Teachers | May 22, 2023 | Referring Expression | Code Available | 0 | 5 |
| CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions | Jan 3, 2019 | Diagnostic, Image Segmentation | Code Available | 0 | 5 |
| Learning To Segment Every Referring Object Point by Point | Jan 1, 2023 | Object, Referring Expression | Code Available | 0 | 5 |
| A Joint Speaker-Listener-Reinforcer Model for Referring Expressions | Dec 30, 2016 | Referring Expression, Referring Expression Comprehension | Code Available | 0 | 5 |
| Continual Referring Expression Comprehension via Dual Modular Memorization | Nov 25, 2023 | Memorization, Referring Expression | Code Available | 0 | 5 |
| Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding | Aug 28, 2019 | Attribute, Referring Expression | Code Available | 0 | 5 |
| MAttNet: Modular Attention Network for Referring Expression Comprehension | Jan 24, 2018 | Generalized Referring Expression Segmentation, Referring Expression | Code Available | 0 | 5 |
| Reasoning About Pragmatics with Neural Listeners and Speakers | Apr 2, 2016 | Referring Expression, Text Generation | Code Available | 0 | 5 |
| Deconfounded Visual Grounding | Dec 31, 2021 | Referring Expression, Visual Grounding | Code Available | 0 | 5 |
| A Lightweight Modular Framework for Low-Cost Open-Vocabulary Object Detection Training | Aug 20, 2024 | Autonomous Vehicles, Computational Efficiency | Code Available | 0 | 5 |
| Grounding Referring Expressions in Images by Variational Context | Dec 5, 2017 | Multiple Instance Learning, Referring Expression | Code Available | 0 | 5 |
| Task-aware Cross-modal Feature Refinement Transformer with Large Language Models for Visual Grounding | Jan 1, 2025 | Referring Expression, Referring Expression Comprehension | Unverified | 0 | 0 |
| Text Augmented Spatial-aware Zero-shot Referring Image Segmentation | Oct 27, 2023 | Image Segmentation, Referring Expression | Unverified | 0 | 0 |
| Text-driven Affordance Learning from Egocentric Vision | Apr 3, 2024 | Referring Expression, Referring Expression Comprehension | Unverified | 0 | 0 |
| The Methodius Corpus of Rhetorical Discourse Structures and Generated Texts | May 1, 2016 | Referring Expression, Referring expression generation | Unverified | 0 | 0 |
| The Pipeline Model for Resolution of Anaphoric Reference and Resolution of Entity Reference | Nov 1, 2021 | coreference-resolution, Coreference Resolution | Unverified | 0 | 0 |
| The Solution for the 5th GCAIAC Zero-shot Referring Expression Comprehension Challenge | Jul 6, 2024 | Referring Expression, Referring Expression Comprehension | Unverified | 0 | 0 |
| The WebNLG Challenge: Generating Text from RDF Data | Sep 1, 2017 | Referring Expression, Referring expression generation | Unverified | 0 | 0 |
| Toward Forgetting-Sensitive Referring Expression Generation for Integrated Robot Architectures | Jul 16, 2020 | Referring Expression, Referring expression generation | Unverified | 0 | 0 |
| Towards Situated Dialogue: Revisiting Referring Expression Generation | Oct 1, 2013 | Referring Expression, Referring expression generation | Unverified | 0 | 0 |
| Transcrib3D: 3D Referring Expression Resolution through Large Language Models | Apr 30, 2024 | Referring Expression | Unverified | 0 | 0 |
| Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks | Jun 17, 2022 | Depth Estimation, Image Generation | Unverified | 0 | 0 |
| UNITER: Learning UNiversal Image-TExt Representations | Sep 25, 2019 | Image-text matching, Image-text Retrieval | Unverified | 0 | 0 |
| Unpaired Referring Expression Grounding via Bidirectional Cross-Modal Matching | Jan 18, 2022 | Image-text matching, Referring Expression | Unverified | 0 | 0 |
| Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos | Mar 7, 2017 | Referring Expression | Unverified | 0 | 0 |
| Using Lexical Alignment and Referring Ability to Address Data Sparsity in Situated Dialog Reference Resolution | Oct 1, 2018 | Referring Expression | Unverified | 0 | 0 |
| Using Referring Expression Generation to Model Literary Style | Dec 1, 2021 | model, Referring Expression | Unverified | 0 | 0 |
| Utilizing Every Image Object for Semi-supervised Phrase Grounding | Nov 5, 2020 | Phrase Grounding, Referring Expression | Unverified | 0 | 0 |
| Variational Context: Exploiting Visual and Textual Context for Grounding Referring Expressions | Jul 8, 2019 | Multiple Instance Learning, Referring Expression | Unverified | 0 | 0 |
| Video Referring Expression Comprehension via Transformer with Content-aware Query | Oct 6, 2022 | cross-modal alignment, Referring Expression | Unverified | 0 | 0 |
| Video Referring Expression Comprehension via Transformer with Content-conditioned Query | Oct 25, 2023 | cross-modal alignment, Referring Expression | Unverified | 0 | 0 |
| Viewpoint-Aware Visual Grounding in 3D Scenes | Jan 1, 2024 | 3D visual grounding, Referring Expression | Unverified | 0 | 0 |
| Visual Question Answering based on Local-Scene-Aware Referring Expression Generation | Jan 22, 2021 | Question Answering, Referring Expression | Unverified | 0 | 0 |
| VLN BERT: A Recurrent Vision-and-Language BERT for Navigation | Jun 19, 2021 | Decision Making, Decoder | Unverified | 0 | 0 |
| VL-NMS: Breaking Proposal Bottlenecks in Two-Stage Visual-Language Matching | May 12, 2021 | Image-text matching, Referring Expression | Unverified | 0 | 0 |
| VQD: Visual Query Detection in Natural Scenes | Apr 4, 2019 | Referring Expression, Referring Expression Comprehension | Unverified | 0 | 0 |
| WaterVG: Waterway Visual Grounding based on Text-Guided Vision and mmWave Radar | Mar 19, 2024 | Autonomous Navigation, Referring Expression | Unverified | 0 | 0 |
| Weakly-supervised segmentation of referring expressions | May 10, 2022 | Image Segmentation, Referring Expression | Unverified | 0 | 0 |
| What can Neural Referential Form Selectors Learn? | Aug 15, 2021 | Form, Position | Unverified | 0 | 0 |
| Trainable Referring Expression Generation using Overspecification Preferences | Apr 12, 2017 | Referring Expression, Referring expression generation | Unverified | 0 | 0 |
| 3DResT: A Strong Baseline for Semi-Supervised 3D Referring Expression Segmentation | Apr 17, 2025 | Referring Expression, Referring Expression Segmentation | Unverified | 0 | 0 |
| A case study on context-bound referring expression generation | Oct 1, 2019 | Referring Expression, Referring expression generation | Unverified | 0 | 0 |
| A Commercial Perspective on Reference | Sep 1, 2017 | Referring Expression, Text Generation | Unverified | 0 | 0 |