Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond Aug 24, 2023 Chart Question Answering FS-MEVQA
VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning May 17, 2025 2D Object Detection Object Counting
Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement Mar 9, 2025 Domain Generalization Object Detection
Multi-label Cluster Discrimination for Visual Representation Learning Jul 24, 2024 Contrastive Learning Image-text Retrieval
GLIPv2: Unifying Localization and Vision-Language Understanding Jun 12, 2022 2D Object Detection Contrastive Learning
RemoteSAM: Towards Segment Anything for Earth Observation May 23, 2025 Attribute Earth Observation
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model Jun 28, 2024 Interactive Segmentation Language Modeling
PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model Mar 21, 2024 Decoder Generalized Referring Expression Segmentation
UniVS: Unified and Universal Video Segmentation with Prompts as Queries Feb 28, 2024 Decoder Referring Expression Segmentation
General Object Foundation Model for Images and Videos at Scale Dec 14, 2023 Instance Segmentation Long-tail Video Object Segmentation
Tracking Anything with Decoupled Video Segmentation Sep 7, 2023 Open-Vocabulary Video Segmentation Open-World Video Segmentation
Universal Instance Perception as Object Discovery and Retrieval Mar 12, 2023 Described Object Detection Generalized Referring Expression Comprehension
Generalized Decoding for Pixel, Image, and Language Dec 21, 2022 Decoder Image Segmentation
GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding Mar 13, 2025 Diversity Language Modeling
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories Mar 11, 2025 Decision Making Interactive Segmentation
The Devil is in Temporal Token: High Quality Video Reasoning Segmentation Jan 15, 2025 Reasoning Segmentation Referring Expression Segmentation
Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation Jan 15, 2025 Image Segmentation Referring Expression Segmentation
HyperSeg: Towards Universal Visual Segmentation with Large Language Model Nov 26, 2024 Language Modeling Large Language Model
Text4Seg: Reimagining Image Segmentation as Text Generation Oct 13, 2024 Image Segmentation Referring Expression
SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation Sep 1, 2024 Language Modeling Language Modelling
F-LMM: Grounding Frozen Large Multimodal Models Jun 9, 2024 General Knowledge Instruction Following
Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation Apr 4, 2024 Contrastive Learning Referring Expression
Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation Jan 1, 2024 Descriptive Object
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces Dec 25, 2023 Image Segmentation Object
Universal Segmentation at Arbitrary Granularity with Language Instruction Dec 4, 2023 Referring Expression Segmentation Segmentation
NExT-Chat: An LMM for Chat, Detection and Segmentation Nov 8, 2023 Referring Expression Referring Expression Segmentation
GLaMM: Pixel Grounding Large Multimodal Model Nov 6, 2023 Conversational Question Answering Image Captioning
Hierarchical Open-vocabulary Universal Image Segmentation Jul 3, 2023 Image Comprehension Image Segmentation
Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic Jun 27, 2023 Image Captioning Referring Expression Segmentation
GRES: Generalized Referring Expression Segmentation Jun 1, 2023 Generalized Referring Expression Segmentation Referring Expression
Unleashing Text-to-Image Diffusion Models for Visual Perception Mar 3, 2023 Denoising Depth Estimation
VLT: Vision-Language Transformer and Query Generation for Referring Segmentation Oct 28, 2022 Referring Expression Segmentation Referring Video Object Segmentation
Language as Queries for Referring Video Object Segmentation Jan 3, 2022 Object Object Tracking
DeRIS: Decoupling Perception and Cognition for Enhanced Referring Image Segmentation through Loopback Synergy Jul 2, 2025 Data Augmentation Generalized Referring Expression Segmentation
PixFoundation: Are We Heading in the Right Direction with Pixel-level Vision Foundation Models? Feb 6, 2025 Question Answering Referring Expression
MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation Jan 23, 2025 Referring Expression Segmentation Referring Video Object Segmentation
Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints Jan 12, 2025 Image Segmentation Referring Expression
IPDN: Image-enhanced Prompt Decoding Network for 3D Referring Expression Segmentation Jan 9, 2025 Decoder Referring Expression
RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation Dec 3, 2024 Referring Expression Referring Expression Segmentation
MaskRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation Nov 28, 2024 Data Augmentation Image Segmentation
3D-GRES: Generalized 3D Referring Expression Segmentation Jul 30, 2024 Object Referring Expression
ViLLa: Video Reasoning Segmentation with Large Language Model Jul 18, 2024 Image Segmentation Language Modeling
SAM as the Guide: Mastering Pseudo-Label Refinement in Semi-Supervised Referring Expression Segmentation Jun 3, 2024 Pseudo Label Referring Expression
CoHD: A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation May 24, 2024 Generalized Referring Expression Segmentation Object
Temporally Consistent Referring Video Object Segmentation with Hybrid Memory Mar 28, 2024 HTR Object
Mask Grounding for Referring Image Segmentation Dec 19, 2023 cross-modal alignment Image Segmentation
GSVA: Generalized Segmentation via Multimodal Large Language Models Dec 15, 2023 Decoder Generalized Referring Expression Segmentation
Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation Dec 13, 2023 Descriptive Object
EVP: Enhanced Visual Perception using Inverse Multi-Attentive Feature Refinement and Regularized Image-Text Alignment Dec 13, 2023 Decoder Depth Estimation
3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation Aug 31, 2023 Navigate Referring Expression