| DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding | Nov 28, 2022 | object-detection, Object Detection | Code Available | 1 | 5 |
| MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding | Apr 26, 2021 | Generalized Referring Expression Comprehension, Phrase Grounding | Code Available | 1 | 5 |
| An Open and Comprehensive Pipeline for Unified Object Grounding and Detection | Jan 4, 2024 | Described Object Detection, Phrase Grounding | Code Available | 1 | 5 |
| LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension | Sep 18, 2024 | Referring Expression, Referring Expression Comprehension | Code Available | 1 | 5 |
| Talk2Car: Taking Control of Your Self-Driving Car | Sep 24, 2019 | Autonomous Driving, Object | Code Available | 1 | 5 |
| LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition | Feb 15, 2024 | Grounded Multimodal Named Entity Recognition, Multi-modal Named Entity Recognition | Code Available | 1 | 5 |
| Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds | Dec 16, 2021 | Object, object-detection | Code Available | 1 | 5 |
| Learning to Evaluate Performance of Multi-modal Semantic Localization | Sep 14, 2022 | Cross-Modal Retrieval, Referring Expression | Code Available | 1 | 5 |
| TransVG: End-to-End Visual Grounding with Transformers | Apr 17, 2021 | Referring Expression Comprehension, Visual Grounding | Code Available | 1 | 5 |
| Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions | Nov 28, 2023 | Disentanglement, Referring Expression | Code Available | 1 | 5 |