| Cross-Modal Bidirectional Interaction Model for Referring Remote Sensing Image Segmentation | Oct 11, 2024 | Benchmarking, Image Segmentation | Code Available | 1 |
| Grounding Language in Multi-Perspective Referential Communication | Oct 4, 2024 | Referring Expression, Referring expression generation | Code Available | 0 |
| Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE | Sep 26, 2024 | image-classification, Image Classification | Code Available | 1 |
| FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension | Sep 23, 2024 | Image Comprehension, Referring Expression | Code Available | 1 |
| Exploring Fine-Grained Image-Text Alignment for Referring Remote Sensing Image Segmentation | Sep 20, 2024 | Image Segmentation, Referring Expression | Code Available | 1 |
| MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension | Sep 20, 2024 | cross-modal alignment, Referring Expression | Code Available | 1 |
| LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension | Sep 18, 2024 | Referring Expression, Referring Expression Comprehension | Code Available | 1 |
| Referring Expression Generation in Visually Grounded Dialogue with Discourse-aware Comprehension Guiding | Sep 9, 2024 | Image Retrieval, Referring Expression | Code Available | 0 |
| Make Graph-based Referring Expression Comprehension Great Again through Expression-guided Dynamic Gating and Regression | Sep 5, 2024 | Referring Expression, Referring Expression Comprehension | Unverified | 0 |
| SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation | Sep 1, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| A Lightweight Modular Framework for Low-Cost Open-Vocabulary Object Detection Training | Aug 20, 2024 | Autonomous Vehicles, Computational Efficiency | Code Available | 0 |
| Revisiting Multi-Modal LLM Evaluation | Aug 9, 2024 | Chart Understanding, Optical Character Recognition | Unverified | 0 |
| 3D-GRES: Generalized 3D Referring Expression Segmentation | Jul 30, 2024 | Object, Referring Expression | Code Available | 1 |
| MaskInversion: Localized Embeddings via Optimization of Explainability Maps | Jul 29, 2024 | Image Generation, Referring Expression | Unverified | 0 |
| Look Hear: Gaze Prediction for Speech-directed Human Attention | Jul 28, 2024 | Decoder, Gaze Prediction | Unverified | 0 |
| Learning Visual Grounding from Generative Vision and Language Model | Jul 18, 2024 | Attribute, Language Modeling | Unverified | 0 |
| Multi-branch Collaborative Learning Network for 3D Visual Grounding | Jul 7, 2024 | 3D visual grounding, Referring Expression | Code Available | 1 |
| The Solution for the 5th GCAIAC Zero-shot Referring Expression Comprehension Challenge | Jul 6, 2024 | Referring Expression, Referring Expression Comprehension | Unverified | 0 |
| Referring Atomic Video Action Recognition | Jul 2, 2024 | Action Localization, Action Recognition | Code Available | 1 |
| SafaRi:Adaptive Sequence Transformer for Weakly Supervised Referring Expression Segmentation | Jul 2, 2024 | Referring Expression, Referring Expression Segmentation | Unverified | 0 |
| M^2IST: Multi-Modal Interactive Side-Tuning for Efficient Referring Expression Comprehension | Jul 1, 2024 | GPU, Referring Expression | Unverified | 0 |
| EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model | Jun 28, 2024 | Interactive Segmentation, Language Modeling | Code Available | 3 |
| Segment Anything Model for automated image data annotation: empirical studies using text prompts from Grounding DINO | Jun 27, 2024 | Image Segmentation, Medical Image Segmentation | Unverified | 0 |
| ScanFormer: Referring Expression Comprehension by Iteratively Scanning | Jun 26, 2024 | Informativeness, Referring Expression | Unverified | 0 |
| Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models | Jun 24, 2024 | Referring Expression, Referring Expression Comprehension | Code Available | 2 |
| F-LMM: Grounding Frozen Large Multimodal Models | Jun 9, 2024 | General Knowledge, Instruction Following | Code Available | 2 |
| SAM as the Guide: Mastering Pseudo-Label Refinement in Semi-Supervised Referring Expression Segmentation | Jun 3, 2024 | Pseudo Label, Referring Expression | Code Available | 1 |
| GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane | May 27, 2024 | 3DGS, feature selection | Unverified | 0 |
| Bring Adaptive Binding Prototypes to Generalized Referring Expression Segmentation | May 24, 2024 | Decoder, Generalized Referring Expression Segmentation | Code Available | 0 |
| CoHD: A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation | May 24, 2024 | Generalized Referring Expression Segmentation, Object | Code Available | 1 |
| Talk2Radar: Bridging Natural Language with 4D mmWave Radar for 3D Referring Expression Comprehension | May 21, 2024 | 3D visual grounding, Referring Expression | Code Available | 1 |
| Adversarial Robustness for Visual Grounding of Multimodal Large Language Models | May 16, 2024 | Adversarial Attack, Adversarial Robustness | Code Available | 0 |
| Transcrib3D: 3D Referring Expression Resolution through Large Language Models | Apr 30, 2024 | Referring Expression | Unverified | 0 |
| Resilience through Scene Context in Visual Referring Expression Generation | Apr 18, 2024 | Referring Expression, Referring expression generation | Code Available | 0 |
| Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation | Apr 4, 2024 | Contrastive Learning, Referring Expression | Code Available | 2 |
| Text-driven Affordance Learning from Egocentric Vision | Apr 3, 2024 | Referring Expression, Referring Expression Comprehension | Unverified | 0 |
| SUGAR: Pre-training 3D Visual Representations for Robotics | Apr 1, 2024 | 3D Instance Segmentation, 3D Object Recognition | Unverified | 0 |
| PropTest: Automatic Property Testing for Improved Visual Programming | Mar 25, 2024 | Question Answering, Referring Expression | Unverified | 0 |
| Elysium: Exploring Object-level Perception in Videos via MLLM | Mar 25, 2024 | Object, Object Tracking | Code Available | 2 |
| PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model | Mar 21, 2024 | Decoder, Generalized Referring Expression Segmentation | Code Available | 3 |
| WaterVG: Waterway Visual Grounding based on Text-Guided Vision and mmWave Radar | Mar 19, 2024 | Autonomous Navigation, Referring Expression | Unverified | 0 |
| DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM | Mar 19, 2024 | Object, object-detection | Code Available | 1 |
| Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception | Mar 5, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training | Mar 4, 2024 | Math, Phrase Grounding | Unverified | 0 |
| LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition | Feb 15, 2024 | Grounded Multimodal Named Entity Recognition, Multi-modal Named Entity Recognition | Code Available | 1 |
| Intrinsic Task-based Evaluation for Referring Expression Generation | Feb 12, 2024 | Referring Expression, Referring expression generation | Unverified | 0 |
| RESMatch: Referring Expression Segmentation in a Semi-Supervised Manner | Feb 8, 2024 | Image Segmentation, Pseudo Label | Unverified | 0 |
| Generalizable Entity Grounding via Assistance of Large Language Model | Feb 4, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| An Open and Comprehensive Pipeline for Unified Object Grounding and Detection | Jan 4, 2024 | Described Object Detection, Phrase Grounding | Code Available | 1 |
| Revisiting Counterfactual Problems in Referring Expression Comprehension | Jan 1, 2024 | Attribute, Contrastive Learning | Code Available | 0 |