| Learning Visual Grounding from Generative Vision and Language Model | Jul 18, 2024 | Attribute, Language Modeling | —Unverified | 0 | 0 |
| Lite-MDETR: A Lightweight Multi-Modal Detector | Jan 1, 2022 | object-detection, Object Detection | —Unverified | 0 | 0 |
| The Solution for the 5th GCAIAC Zero-shot Referring Expression Comprehension Challenge | Jul 6, 2024 | Referring Expression, Referring Expression Comprehension | —Unverified | 0 | 0 |
| Lyrics: Boosting Fine-grained Language-Vision Alignment and Comprehension via Semantic-aware Visual Objects | Dec 8, 2023 | Image Captioning, object-detection | —Unverified | 0 | 0 |
| M^2IST: Multi-Modal Interactive Side-Tuning for Efficient Referring Expression Comprehension | Jul 1, 2024 | GPU, Referring Expression | —Unverified | 0 | 0 |
| Make Graph-based Referring Expression Comprehension Great Again through Expression-guided Dynamic Gating and Regression | Sep 5, 2024 | Referring Expression, Referring Expression Comprehension | —Unverified | 0 | 0 |
| ArraMon: A Joint Navigation-Assembly Instruction Interpretation Task in Dynamic Environments | Nov 15, 2020 | Referring Expression, Referring Expression Comprehension | —Unverified | 0 | 0 |
| MaskInversion: Localized Embeddings via Optimization of Explainability Maps | Jul 29, 2024 | Image Generation, Referring Expression | —Unverified | 0 | 0 |
| A Real-Time Cross-modality Correlation Filtering Method for Referring Expression Comprehension | Sep 16, 2019 | Referring Expression, Referring Expression Comprehension | —Unverified | 0 | 0 |
| Compositional Zero-Shot Learning for Attribute-Based Object Reference in Human-Robot Interaction | Dec 21, 2023 | 16k, Attribute | —Unverified | 0 | 0 |