| HRSeg: High-Resolution Visual Perception and Enhancement for Reasoning Segmentation | Jul 17, 2025 | Reasoning Segmentation, World Knowledge | Unverified | 0 |
| Seg-R1: Segmentation Can Be Surprisingly Simple with Reinforcement Learning | Jun 27, 2025 | Foreground Segmentation, object-detection | Code Available | 2 |
| MedSeg-R: Reasoning Segmentation in Medical Images with Multimodal Large Language Models | Jun 12, 2025 | Image Segmentation, Medical Diagnosis | Unverified | 0 |
| Decoupling the Image Perception and Multimodal Reasoning for Reasoning Segmentation with Digital Twin Representations | Jun 9, 2025 | Large Language Model, Multimodal Reasoning | Unverified | 0 |
| OpenMaskDINO3D: Reasoning 3D Segmentation via Large Language Model | Jun 5, 2025 | Instance Segmentation, Language Modeling | Code Available | 1 |
| RSVP: Reasoning Segmentation via Visual Prompting and Multi-modal Chain-of-Thought | Jun 4, 2025 | Multimodal Reasoning, Reasoning Segmentation | Unverified | 0 |
| PixelThink: Towards Efficient Chain-of-Pixel Reasoning | May 29, 2025 | Reasoning Segmentation, reinforcement-learning | Unverified | 0 |
| Reasoning Segmentation for Images and Videos: A Survey | May 24, 2025 | Reasoning Segmentation, Survey | Unverified | 0 |
| RVTBench: A Benchmark for Visual Reasoning Tasks | May 17, 2025 | Reasoning Segmentation, Visual Question Answering (VQA) | Code Available | 0 |
| PRS-Med: Position Reasoning Segmentation with Vision-Language Model in Medical Imaging | May 17, 2025 | Image Segmentation, Language Modeling | Unverified | 0 |
| VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning | May 17, 2025 | 2D Object Detection, Object Counting | Code Available | 4 |
| LISAT: Language-Instructed Segmentation Assistant for Satellite Imagery | May 5, 2025 | Reasoning Segmentation, Segmentation | Unverified | 0 |
| SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding | Apr 17, 2025 | Image Generation, Large Language Model | Code Available | 1 |
| LVLM_CSP: Accelerating Large Vision Language Models via Clustering, Scattering, and Pruning for Reasoning Segmentation | Apr 15, 2025 | Image Captioning, Question Answering | Unverified | 0 |
| MediSee: Reasoning-based Pixel-level Perception in Medical Images | Apr 15, 2025 | Logical Reasoning, Reasoning Segmentation | Unverified | 0 |
| Online Reasoning Video Segmentation with Just-in-Time Digital Twins | Mar 27, 2025 | Reasoning Segmentation, Video Segmentation | Unverified | 0 |
| Operating Room Workflow Analysis via Reasoning Segmentation over Digital Twins | Mar 26, 2025 | Large Language Model, Reasoning Segmentation | Unverified | 0 |
| MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation | Mar 23, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation | Mar 18, 2025 | Reasoning Segmentation, Video Editing | Unverified | 0 |
| MMR: A Large-scale Benchmark Dataset for Multi-target and Multi-granularity Reasoning Segmentation | Mar 18, 2025 | Object, Reasoning Segmentation | Code Available | 1 |
| Unveiling the Invisible: Reasoning Complex Occlusions Amodally with AURA | Mar 13, 2025 | Dataset Generation, Reasoning Segmentation | Unverified | 0 |
| Think Before You Segment: High-Quality Reasoning Segmentation with GPT Chain of Thoughts | Mar 10, 2025 | Reasoning Segmentation, Segmentation | Unverified | 0 |
| Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement | Mar 9, 2025 | Domain Generalization, Object Detection | Code Available | 4 |
| UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface | Mar 3, 2025 | Instance Segmentation, Reasoning Segmentation | Code Available | 3 |
| Pixel-Level Reasoning Segmentation via Multi-turn Conversations | Feb 13, 2025 | Reasoning Segmentation, Segmentation | Code Available | 0 |
| The Devil is in Temporal Token: High Quality Video Reasoning Segmentation | Jan 15, 2025 | Reasoning Segmentation, Referring Expression Segmentation | Code Available | 2 |
| HyperSeg: Hybrid Segmentation Assistant with Fine-grained Visual Perceiver | Jan 1, 2025 | Reasoning Segmentation, Segmentation | Code Available | 2 |
| POPEN: Preference-Based Optimization and Ensemble for LVLM-Based Reasoning Segmentation | Jan 1, 2025 | Hallucination, Reasoning Segmentation | Unverified | 0 |
| PRIMA: Multi-Image Vision-Language Models for Reasoning Segmentation | Dec 19, 2024 | Reasoning Segmentation | Unverified | 0 |
| InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models | Dec 18, 2024 | Reasoning Segmentation, Segmentation | Code Available | 2 |
| HyperSeg: Towards Universal Visual Segmentation with Large Language Model | Nov 26, 2024 | Language Modeling, Large Language Model | Code Available | 2 |
| Multimodal 3D Reasoning Segmentation with Complex Scenes | Nov 21, 2024 | Reasoning Segmentation, Scene Understanding | Unverified | 0 |
| Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level | Nov 15, 2024 | Benchmarking, counterfactual | Unverified | 0 |
| SegLLM: Multi-round Reasoning Segmentation | Oct 24, 2024 | Reasoning Segmentation, Referring Expression | Unverified | 0 |
| One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos | Sep 29, 2024 | All, Image Segmentation | Code Available | 2 |
| Instruction-guided Multi-Granularity Segmentation and Captioning with Large Multimodal Model | Sep 20, 2024 | Image Captioning, Panoptic Segmentation | Code Available | 1 |
| Visual Agents as Fast and Slow Thinkers | Aug 16, 2024 | Question Answering, Reasoning Segmentation | Code Available | 1 |
| One Framework to Rule Them All: Unifying Multimodal Tasks with LLM Neural-Tuning | Aug 6, 2024 | All, Image Captioning | Unverified | 0 |
| An Efficient and Effective Transformer Decoder-Based Framework for Multi-Task Visual Grounding | Aug 2, 2024 | Decoder, Reasoning Segmentation | Code Available | 1 |
| ViLLa: Video Reasoning Segmentation with Large Language Model | Jul 18, 2024 | Image Segmentation, Language Modeling | Code Available | 1 |
| VISA: Reasoning Video Object Segmentation via Large Language Models | Jul 16, 2024 | Decoder, Object | Code Available | 3 |
| Reasoning3D -- Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models | May 29, 2024 | 3D Instance Segmentation, 3D Semantic Segmentation | Unverified | 0 |
| Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model | May 27, 2024 | Decoder, Language Modeling | Code Available | 2 |
| LLM-Seg: Bridging Image Segmentation and Large Language Model Reasoning | Apr 12, 2024 | Image Segmentation, Language Modeling | Code Available | 2 |
| CoReS: Orchestrating the Dance of Reasoning and Segmentation | Apr 8, 2024 | Reasoning Segmentation, Segmentation | Code Available | 1 |
| Empowering Segmentation Ability to Multi-modal Large Language Models | Mar 21, 2024 | Dialogue Generation, Reasoning Segmentation | Code Available | 0 |
| LISA++: An Improved Baseline for Reasoning Segmentation with Large Language Model | Dec 28, 2023 | Instance Segmentation, Language Modeling | Code Available | 4 |
| FoodLMM: A Versatile Food Assistant using Large Multi-modal Model | Dec 22, 2023 | Food Recognition, Multi-Task Learning | Unverified | 0 |
| PixelLM: Pixel Reasoning with Large Multimodal Model | Dec 4, 2023 | Decoder, model | Code Available | 2 |
| Beyond Segmentation: Road Network Generation with Multi-Modal LLMs | Oct 15, 2023 | Autonomous Navigation, Language Modeling | Unverified | 0 |