| Webly Supervised Concept Expansion for General Purpose Vision Models | Feb 4, 2022 | Human-Object Interaction Detection, Image Retrieval | —Unverified | 0 | 0 |
| MUTATT: Visual-Textual Mutual Guidance for Referring Expression Comprehension | Mar 18, 2020 | Referring Expression, Referring Expression Comprehension | —Unverified | 0 | 0 |
| Neighbourhood Watch: Referring Expression Comprehension via Language-guided Graph Attention Networks | Dec 12, 2018 | Graph Attention, Object | —Unverified | 0 | 0 |
| Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks | Jun 17, 2022 | Depth Estimation, Image Generation | —Unverified | 0 | 0 |
| Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks | Jan 14, 2025 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| One for All: One-stage Referring Expression Comprehension with Dynamic Reasoning | Jul 31, 2022 | All, Referring Expression | —Unverified | 0 | 0 |
| Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos | Mar 23, 2021 | Referring Expression, Referring Expression Comprehension | —Unverified | 0 | 0 |
| Parallel Attention: A Unified Framework for Visual Object Discovery through Dialogs and Queries | Nov 17, 2017 | Object, Object Discovery | —Unverified | 0 | 0 |
| Playing Lottery Tickets with Vision and Language | Apr 23, 2021 | Image-text Retrieval, Question Answering | —Unverified | 0 | 0 |
| VQD: Visual Query Detection in Natural Scenes | Apr 4, 2019 | Referring Expression, Referring Expression Comprehension | —Unverified | 0 | 0 |
| PPGN: Phrase-Guided Proposal Generation Network For Referring Expression Comprehension | Dec 20, 2020 | Referring Expression, Referring Expression Comprehension | —Unverified | 0 | 0 |
| Proposal-free One-stage Referring Expression via Grid-Word Cross-Attention | May 5, 2021 | Question Answering, Referring Expression | —Unverified | 0 | 0 |
| PropTest: Automatic Property Testing for Improved Visual Programming | Mar 25, 2024 | Question Answering, Referring Expression | —Unverified | 0 | 0 |
| Real-Time Referring Expression Comprehension by Single-Stage Grounding Network | Dec 9, 2018 | Attribute, Referring Expression | —Unverified | 0 | 0 |
| ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension | Nov 16, 2021 | image-classification, Image Classification | —Unverified | 0 | 0 |
| UNITER: Learning UNiversal Image-TExt Representations | Sep 25, 2019 | Image-text matching, Image-text Retrieval | —Unverified | 0 | 0 |
| RefCLIP: A Universal Teacher for Weakly Supervised Referring Expression Comprehension | Jan 1, 2023 | Referring Expression, Referring Expression Comprehension | —Unverified | 0 | 0 |