| Language-Conditioned Graph Networks for Relational Reasoning | May 10, 2019 | Object, Referring Expression Comprehension | Code Available | 0 | 5 |
| Language-Conditioned Feature Pyramids for Visual Selection Tasks | Nov 1, 2020 | Referring Expression, Referring Expression Comprehension | Code Available | 0 | 5 |
| Language Adaptive Weight Generation for Multi-task Visual Grounding | Jun 6, 2023 | Referring Expression, Referring Expression Comprehension | Code Available | 0 | 5 |
| Collecting Visually-Grounded Dialogue with A Game Of Sorts | Sep 10, 2023 | Coreference Resolution, Image Retrieval | Code Available | 0 | 5 |
| HuBo-VLM: Unified Vision-Language Model designed for HUman roBOt interaction tasks | Aug 24, 2023 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| A Joint Speaker-Listener-Reinforcer Model for Referring Expressions | Dec 30, 2016 | Referring Expression, Referring Expression Comprehension | Code Available | 0 | 5 |
| Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models | Nov 24, 2023 | All, Referring Expression | Code Available | 0 | 5 |
| Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models | Oct 21, 2024 | Instruction Following, object-detection | Code Available | 0 | 5 |
| CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions | Jan 3, 2019 | Diagnostic, Image Segmentation | Code Available | 0 | 5 |
| Cosine meets Softmax: A tough-to-beat baseline for visual grounding | Sep 13, 2020 | Autonomous Driving, Metric Learning | Code Available | 0 | 5 |