| DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding | Nov 28, 2022 | object-detection, Object Detection | Code Available | 1 |
| MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension | Sep 20, 2024 | cross-modal alignment, Referring Expression | Code Available | 1 |
| Multi-branch Collaborative Learning Network for 3D Visual Grounding | Jul 7, 2024 | 3D visual grounding, Referring Expression | Code Available | 1 |
| LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension | Sep 18, 2024 | Referring Expression, Referring Expression Comprehension | Code Available | 1 |
| An Open and Comprehensive Pipeline for Unified Object Grounding and Detection | Jan 4, 2024 | Described Object Detection, Phrase Grounding | Code Available | 1 |
| Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation | Mar 19, 2020 | Generalized Referring Expression Comprehension, Referring Expression | Code Available | 1 |
| PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models | May 23, 2022 | Language Modeling, Language Modelling | Code Available | 1 |
| Learning to Evaluate Performance of Multi-modal Semantic Localization | Sep 14, 2022 | Cross-Modal Retrieval, Referring Expression | Code Available | 1 |
| RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D | Aug 23, 2023 | Object, Object Tracking | Code Available | 1 |
| Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions | Nov 28, 2023 | Disentanglement, Referring Expression | Code Available | 1 |