| Exploring Modulated Detection Transformer as a Tool for Action Recognition in Videos | Sep 21, 2022 | Action DetectionAction Recognition | CodeCode Available | 0 | 5 |
| Natural Language Object Retrieval | Nov 13, 2015 | Image CaptioningImage Retrieval | CodeCode Available | 0 | 5 |
| Towards Language-guided Visual Recognition via Dynamic Convolutions | Oct 17, 2021 | Question AnsweringReferring Expression | CodeCode Available | 0 | 5 |
| Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation | May 24, 2021 | Referring ExpressionReferring Expression Comprehension | CodeCode Available | 0 | 5 |
| Continual Referring Expression Comprehension via Dual Modular Memorization | Nov 25, 2023 | MemorizationReferring Expression | CodeCode Available | 0 | 5 |
| AttnGrounder: Talking to Cars with Attention | Sep 11, 2020 | Referring Expression ComprehensionVisual Grounding | CodeCode Available | 0 | 5 |
| Revisiting Counterfactual Problems in Referring Expression Comprehension | Jan 1, 2024 | AttributeContrastive Learning | CodeCode Available | 0 | 5 |
| ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension | Nov 16, 2021 | image-classificationImage Classification | —Unverified | 0 | 0 |
| UNITER: Learning UNiversal Image-TExt Representations | Sep 25, 2019 | Image-text matchingImage-text Retrieval | —Unverified | 0 | 0 |
| RefCLIP: A Universal Teacher for Weakly Supervised Referring Expression Comprehension | Jan 1, 2023 | Referring ExpressionReferring Expression Comprehension | —Unverified | 0 | 0 |