| Segment Anything Model for automated image data annotation: empirical studies using text prompts from Grounding DINO | Jun 27, 2024 | Image Segmentation, Medical Image Segmentation | Unverified | 0 |
| ScanFormer: Referring Expression Comprehension by Iteratively Scanning | Jun 26, 2024 | Informativeness, Referring Expression | Unverified | 0 |
| Adversarial Robustness for Visual Grounding of Multimodal Large Language Models | May 16, 2024 | Adversarial Attack, Adversarial Robustness | Code Available | 0 |
| Text-driven Affordance Learning from Egocentric Vision | Apr 3, 2024 | Referring Expression, Referring Expression Comprehension | Unverified | 0 |
| PropTest: Automatic Property Testing for Improved Visual Programming | Mar 25, 2024 | Question Answering, Referring Expression | Unverified | 0 |
| WaterVG: Waterway Visual Grounding based on Text-Guided Vision and mmWave Radar | Mar 19, 2024 | Autonomous Navigation, Referring Expression | Unverified | 0 |
| Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training | Mar 4, 2024 | Math, Phrase Grounding | Unverified | 0 |
| Revisiting Counterfactual Problems in Referring Expression Comprehension | Jan 1, 2024 | Attribute, Contrastive Learning | Code Available | 0 |
| Compositional Zero-Shot Learning for Attribute-Based Object Reference in Human-Robot Interaction | Dec 21, 2023 | 16k, Attribute | Unverified | 0 |
| Lyrics: Boosting Fine-grained Language-Vision Alignment and Comprehension via Semantic-aware Visual Objects | Dec 8, 2023 | Image Captioning, object-detection | Unverified | 0 |