| M^2IST: Multi-Modal Interactive Side-Tuning for Efficient Referring Expression Comprehension | Jul 1, 2024 | GPU, Referring Expression | Unverified | 0 |
| Segment Anything Model for automated image data annotation: empirical studies using text prompts from Grounding DINO | Jun 27, 2024 | Image Segmentation, Medical Image Segmentation | Unverified | 0 |
| ScanFormer: Referring Expression Comprehension by Iteratively Scanning | Jun 26, 2024 | Informativeness, Referring Expression | Unverified | 0 |
| GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane | May 27, 2024 | 3DGS, feature selection | Unverified | 0 |
| Bring Adaptive Binding Prototypes to Generalized Referring Expression Segmentation | May 24, 2024 | Decoder, Generalized Referring Expression Segmentation | Code Available | 0 |
| Adversarial Robustness for Visual Grounding of Multimodal Large Language Models | May 16, 2024 | Adversarial Attack, Adversarial Robustness | Code Available | 0 |
| Transcrib3D: 3D Referring Expression Resolution through Large Language Models | Apr 30, 2024 | Referring Expression | Unverified | 0 |
| Resilience through Scene Context in Visual Referring Expression Generation | Apr 18, 2024 | Referring Expression, Referring expression generation | Code Available | 0 |
| Text-driven Affordance Learning from Egocentric Vision | Apr 3, 2024 | Referring Expression, Referring Expression Comprehension | Unverified | 0 |
| SUGAR: Pre-training 3D Visual Representations for Robotics | Apr 1, 2024 | 3D Instance Segmentation, 3D Object Recognition | Unverified | 0 |
| PropTest: Automatic Property Testing for Improved Visual Programming | Mar 25, 2024 | Question Answering, Referring Expression | Unverified | 0 |
| WaterVG: Waterway Visual Grounding based on Text-Guided Vision and mmWave Radar | Mar 19, 2024 | Autonomous Navigation, Referring Expression | Unverified | 0 |
| Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training | Mar 4, 2024 | Math, Phrase Grounding | Unverified | 0 |
| Intrinsic Task-based Evaluation for Referring Expression Generation | Feb 12, 2024 | Referring Expression, Referring expression generation | Unverified | 0 |
| RESMatch: Referring Expression Segmentation in a Semi-Supervised Manner | Feb 8, 2024 | Image Segmentation, Pseudo Label | Unverified | 0 |
| Generalizable Entity Grounding via Assistance of Large Language Model | Feb 4, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Viewpoint-Aware Visual Grounding in 3D Scenes | Jan 1, 2024 | 3D visual grounding, Referring Expression | Unverified | 0 |
| Revisiting Counterfactual Problems in Referring Expression Comprehension | Jan 1, 2024 | Attribute, Contrastive Learning | Code Available | 0 |
| Compositional Zero-Shot Learning for Attribute-Based Object Reference in Human-Robot Interaction | Dec 21, 2023 | 16k, Attribute | Unverified | 0 |
| Localized Symbolic Knowledge Distillation for Visual Commonsense Models | Dec 8, 2023 | Image Description, Instruction Following | Code Available | 0 |
| Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection | Dec 4, 2023 | Image to text, object-detection | Unverified | 0 |
| InstructSeq: Unifying Vision Tasks with Instruction-conditioned Multi-modal Sequence Generation | Nov 30, 2023 | Image Captioning, Referring Expression | Code Available | 0 |
| Continual Referring Expression Comprehension via Dual Modular Memorization | Nov 25, 2023 | Memorization, Referring Expression | Code Available | 0 |
| Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models | Nov 24, 2023 | All, Referring Expression | Code Available | 0 |
| Enhancing Visual Grounding and Generalization: A Multi-Task Cycle Training Approach for Vision-Language Models | Nov 21, 2023 | Image Segmentation, Language Modelling | Code Available | 0 |
| CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding | Nov 6, 2023 | CoLA, Question Answering | Unverified | 0 |
| Towards Omni-supervised Referring Expression Segmentation | Nov 1, 2023 | Referring Expression, Referring Expression Segmentation | Code Available | 0 |
| Text Augmented Spatial-aware Zero-shot Referring Image Segmentation | Oct 27, 2023 | Image Segmentation, Referring Expression | Unverified | 0 |
| Video Referring Expression Comprehension via Transformer with Content-conditioned Query | Oct 25, 2023 | cross-modal alignment, Referring Expression | Unverified | 0 |
| Multi-modal Domain Adaptation for REG via Relation Transfer | Sep 23, 2023 | Domain Adaptation, image-classification | Unverified | 0 |
| CLIPUNetr: Assisting Human-robot Interface for Uncalibrated Visual Servoing Control with CLIP-driven Referring Expression Segmentation | Sep 17, 2023 | Decoder, Referring Expression | Unverified | 0 |
| Collecting Visually-Grounded Dialogue with A Game Of Sorts | Sep 10, 2023 | Coreference Resolution, Image Retrieval | Code Available | 0 |
| Whether you can locate or not? Interactive Referring Expression Generation | Aug 19, 2023 | Referring Expression, Referring Expression Comprehension | Code Available | 0 |
| 'What are you referring to?' Evaluating the Ability of Multi-Modal Dialogue Models to Process Clarificational Exchanges | Jul 28, 2023 | Referring Expression | Code Available | 0 |
| Switching Head-Tail Funnel UNITER for Dual Referring Expression Comprehension with Fetch-and-Carry Tasks | Jul 14, 2023 | Object, Referring Expression | Unverified | 0 |
| Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input | Jun 25, 2023 | Diversity, Image-text Retrieval | Unverified | 0 |
| Referring Expression Comprehension Using Language Adaptive Inference | Jun 6, 2023 | object-detection, Object Detection | Code Available | 0 |
| Language Adaptive Weight Generation for Multi-task Visual Grounding | Jun 6, 2023 | Referring Expression, Referring Expression Comprehension | Code Available | 0 |
| DisCLIP: Open-Vocabulary Referring Expression Generation | May 30, 2023 | Referring Expression, Referring expression generation | Unverified | 0 |
| Language-Guided 3D Object Detection in Point Cloud for Autonomous Driving | May 25, 2023 | 3D Object Detection, Autonomous Driving | Unverified | 0 |
| Pento-DIARef: A Diagnostic Dataset for Learning the Incremental Algorithm for Referring Expression Generation from Examples | May 24, 2023 | Diagnostic, Referring Expression | Code Available | 0 |
| Yes, this Way! Learning to Ground Referring Expressions into Actions with Intra-episodic Feedback from Supportive Teachers | May 22, 2023 | Referring Expression | Code Available | 0 |
| Meta Compositional Referring Expression Segmentation | Apr 10, 2023 | Meta-Learning, Referring Expression | Unverified | 0 |
| CK-Transformer: Commonsense Knowledge Enhanced Transformers for Referring Expression Comprehension | Feb 17, 2023 | Referring Expression, Referring Expression Comprehension | Code Available | 0 |
| Dynamic Inference With Grounding Based Vision and Language Models | Jan 1, 2023 | Language Modelling, Referring Expression | Unverified | 0 |
| Learning To Segment Every Referring Object Point by Point | Jan 1, 2023 | Object, Referring Expression | Code Available | 0 |
| RefCLIP: A Universal Teacher for Weakly Supervised Referring Expression Comprehension | Jan 1, 2023 | Referring Expression, Referring Expression Comprehension | Unverified | 0 |
| RefTeacher: A Strong Baseline for Semi-Supervised Referring Expression Comprehension | Jan 1, 2023 | Imitation Learning, Pseudo Label | Unverified | 0 |
| Fully and Weakly Supervised Referring Expression Segmentation with End-to-End Learning | Dec 17, 2022 | Position, Referring Expression | Unverified | 0 |
| A Unified Mutual Supervision Framework for Referring Expression Segmentation and Generation | Nov 15, 2022 | Reference Expression Generation, Referring Expression | Unverified | 0 |