| Dynamic Graph Attention for Referring Expression Comprehension | Sep 18, 2019 | Graph Attention, Referring Expression | —Unverified | 0 |
| Dynamic Inference With Grounding Based Vision and Language Models | Jan 1, 2023 | Language Modelling, Referring Expression | —Unverified | 0 |
| DViN: Dynamic Visual Routing Network for Weakly Supervised Referring Expression Comprehension | Jan 1, 2025 | Descriptive, Referring Expression | —Unverified | 0 |
| Differentiated Relevances Embedding for Group-based Referring Expression Comprehension | Mar 12, 2022 | Attribute, Object | —Unverified | 0 |
| ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph | Jun 30, 2020 | Attribute, Prediction | —Unverified | 0 |
| Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input | Jun 25, 2023 | Diversity, Image-text Retrieval | —Unverified | 0 |
| Exploring Spatial Language Grounding Through Referring Expressions | Feb 4, 2025 | Image Captioning, Negation | —Unverified | 0 |
| FindIt: Generalized Localization with Natural Language Queries | Mar 31, 2022 | Natural Language Queries, Object | —Unverified | 0 |
| Switching Head-Tail Funnel UNITER for Dual Referring Expression Comprehension with Fetch-and-Carry Tasks | Jul 14, 2023 | Object, Referring Expression | —Unverified | 0 |
| FLORA: Formal Language Model Enables Robust Training-free Zero-shot Object Referring Analysis | Jan 17, 2025 | Bayesian Inference, Language Modeling | —Unverified | 0 |
| Deep Fragment Embeddings for Bidirectional Image Sentence Mapping | Jun 22, 2014 | Referring Expression Comprehension, Retrieval | —Unverified | 0 |
| CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding | Nov 6, 2023 | CoLA, Question Answering | —Unverified | 0 |
| Synthetic Visual Genome | Jun 9, 2025 | Referring Expression, Referring Expression Comprehension | —Unverified | 0 |
| GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing | Mar 16, 2025 | Change Detection, Image Captioning | —Unverified | 0 |
| Giving Commands to a Self-driving Car: A Multimodal Reasoner for Visual Grounding | Mar 19, 2020 | Object, Referring Expression Comprehension | —Unverified | 0 |
| Cops-Ref: A new Dataset and Task on Compositional Referring Expression Comprehension | Mar 1, 2020 | Referring Expression, Referring Expression Comprehension | —Unverified | 0 |
| Harlequin: Color-driven Generation of Synthetic Data for Referring Expression Comprehension | Nov 22, 2024 | Referring Expression, Referring Expression Comprehension | —Unverified | 0 |
| Hierarchical Alignment-enhanced Adaptive Grounding Network for Generalized Referring Expression Comprehension | Jan 2, 2025 | Generalized Referring Expression Comprehension, Generalized Referring Expression Segmentation | —Unverified | 0 |
| Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training | Mar 4, 2024 | Math, Phrase Grounding | —Unverified | 0 |
| Video Referring Expression Comprehension via Transformer with Content-conditioned Query | Oct 25, 2023 | cross-modal alignment, Referring Expression | —Unverified | 0 |
| Task-aware Cross-modal Feature Refinement Transformer with Large Language Models for Visual Grounding | Jan 1, 2025 | Referring Expression, Referring Expression Comprehension | —Unverified | 0 |
| Language-Guided 3D Object Detection in Point Cloud for Autonomous Driving | May 25, 2023 | 3D Object Detection, Autonomous Driving | —Unverified | 0 |
| Language-Mediated, Object-Centric Representation Learning | Dec 31, 2020 | Object, Object Discovery | —Unverified | 0 |
| Text-driven Affordance Learning from Egocentric Vision | Apr 3, 2024 | Referring Expression, Referring Expression Comprehension | —Unverified | 0 |
| Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection | Dec 4, 2023 | Image to text, object-detection | —Unverified | 0 |