Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond Aug 24, 2023 Chart Question Answering FS-MEVQA
Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement Mar 9, 2025 Domain Generalization Object Detection
Multi-label Cluster Discrimination for Visual Representation Learning Jul 24, 2024 Contrastive Learning Image-text Retrieval
GLIPv2: Unifying Localization and Vision-Language Understanding Jun 12, 2022 2D Object Detection Contrastive Learning
VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning May 17, 2025 2D Object Detection Object Counting
Tracking Anything with Decoupled Video Segmentation Sep 7, 2023 Open-Vocabulary Video Segmentation Open-World Video Segmentation
PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model Mar 21, 2024 Decoder Generalized Referring Expression Segmentation
Generalized Decoding for Pixel, Image, and Language Dec 21, 2022 Decoder Image Segmentation
UniVS: Unified and Universal Video Segmentation with Prompts as Queries Feb 28, 2024 Decoder Referring Expression Segmentation
Universal Instance Perception as Object Discovery and Retrieval Mar 12, 2023 Described Object Detection Generalized Referring Expression Comprehension
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model Jun 28, 2024 Interactive Segmentation Language Modeling
General Object Foundation Model for Images and Videos at Scale Dec 14, 2023 Instance Segmentation Long-tail Video Object Segmentation
RemoteSAM: Towards Segment Anything for Earth Observation May 23, 2025 Attribute Earth Observation
GLaMM: Pixel Grounding Large Multimodal Model Nov 6, 2023 Conversational Question Answering Image Captioning
GRES: Generalized Referring Expression Segmentation Jun 1, 2023 Generalized Referring Expression Segmentation Referring Expression
Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation Jan 1, 2024 Descriptive Object
Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation Apr 4, 2024 Contrastive Learning Referring Expression
Language as Queries for Referring Video Object Segmentation Jan 3, 2022 Object Object Tracking
Unleashing Text-to-Image Diffusion Models for Visual Perception Mar 3, 2023 Denoising Depth Estimation
Hierarchical Open-vocabulary Universal Image Segmentation Jul 3, 2023 Image Comprehension Image Segmentation
Universal Segmentation at Arbitrary Granularity with Language Instruction Dec 4, 2023 Referring Expression Segmentation Segmentation
VLT: Vision-Language Transformer and Query Generation for Referring Segmentation Oct 28, 2022 Referring Expression Segmentation Referring Video Object Segmentation
NExT-Chat: An LMM for Chat, Detection and Segmentation Nov 8, 2023 Referring Expression Referring Expression Segmentation
GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding Mar 13, 2025 Diversity Language Modeling
Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation Jan 15, 2025 Image Segmentation Referring Expression Segmentation
The Devil is in Temporal Token: High Quality Video Reasoning Segmentation Jan 15, 2025 Reasoning Segmentation Referring Expression Segmentation
Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic Jun 27, 2023 Image Captioning Referring Expression Segmentation
Text4Seg: Reimagining Image Segmentation as Text Generation Oct 13, 2024 Image Segmentation Referring Expression
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories Mar 11, 2025 Decision Making Interactive Segmentation
HyperSeg: Towards Universal Visual Segmentation with Large Language Model Nov 26, 2024 Language Modeling Large Language Model
SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation Sep 1, 2024 Language Modeling Language Modelling
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces Dec 25, 2023 Image Segmentation Object
F-LMM: Grounding Frozen Large Multimodal Models Jun 9, 2024 General Knowledge Instruction Following
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation Dec 4, 2021 Decoder Generalized Referring Expression Segmentation
End-to-End Referring Video Object Segmentation with Multimodal Transformers Nov 29, 2021 Inductive Bias Instance Segmentation
3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation Aug 31, 2023 Navigate Referring Expression
Local-Global Context Aware Transformer for Language-Guided Video Segmentation Mar 18, 2022 Referring Expression Segmentation Referring Video Object Segmentation
PixFoundation: Are We Heading in the Right Direction with Pixel-level Vision Foundation Models? Feb 6, 2025 Question Answering Referring Expression
PhraseCut: Language-based Image Segmentation in the Wild Aug 3, 2020 Attribute Diversity
PolyFormer: Referring Image Segmentation as Sequential Polygon Generation Feb 14, 2023 Decoder Image Segmentation
DeRIS: Decoupling Perception and Cognition for Enhanced Referring Image Segmentation through Loopback Synergy Jul 2, 2025 Data Augmentation Generalized Referring Expression Segmentation
LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation Jun 14, 2023 Referring Expression Segmentation Referring Video Object Segmentation
OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation Jul 18, 2023 Referring Expression Segmentation Referring Video Object Segmentation
Image Segmentation Using Text and Image Prompts Dec 18, 2021 Decoder Image Segmentation
Towards Robust Referring Video Object Segmentation with Cyclic Relational Consensus Jul 4, 2022 Referring Expression Segmentation Referring Video Object Segmentation
CoHD: A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation May 24, 2024 Generalized Referring Expression Segmentation Object
GSVA: Generalized Segmentation via Multimodal Large Language Models Dec 15, 2023 Decoder Generalized Referring Expression Segmentation
Advancing Referring Expression Segmentation Beyond Single Image May 21, 2023 Co-Salient Object Detection Object
Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation Jul 21, 2023 Decoder Image Segmentation
Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation Mar 19, 2020 Generalized Referring Expression Comprehension Referring Expression