| VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning | May 17, 2025 | 2D Object DetectionObject Counting | CodeCode Available | 4 |
| Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement | Mar 9, 2025 | Domain GeneralizationObject Detection | CodeCode Available | 4 |
| LISA++: An Improved Baseline for Reasoning Segmentation with Large Language Model | Dec 28, 2023 | Instance SegmentationLanguage Modeling | CodeCode Available | 4 |
| LISA: Reasoning Segmentation via Large Language Model | Aug 1, 2023 | Language ModelingLanguage Modelling | CodeCode Available | 4 |
| UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface | Mar 3, 2025 | Instance SegmentationReasoning Segmentation | CodeCode Available | 3 |
| VISA: Reasoning Video Object Segmentation via Large Language Models | Jul 16, 2024 | DecoderObject | CodeCode Available | 3 |
| Seg-R1: Segmentation Can Be Surprisingly Simple with Reinforcement Learning | Jun 27, 2025 | Foreground Segmentationobject-detection | CodeCode Available | 2 |
| The Devil is in Temporal Token: High Quality Video Reasoning Segmentation | Jan 15, 2025 | Reasoning SegmentationReferring Expression Segmentation | CodeCode Available | 2 |
| HyperSeg: Hybrid Segmentation Assistant with Fine-grained Visual Perceiver | Jan 1, 2025 | Reasoning SegmentationSegmentation | CodeCode Available | 2 |
| InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models | Dec 18, 2024 | Reasoning SegmentationSegmentation | CodeCode Available | 2 |
| HyperSeg: Towards Universal Visual Segmentation with Large Language Model | Nov 26, 2024 | Language ModelingLarge Language Model | CodeCode Available | 2 |
| One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos | Sep 29, 2024 | AllImage Segmentation | CodeCode Available | 2 |
| Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model | May 27, 2024 | DecoderLanguage Modeling | CodeCode Available | 2 |
| LLM-Seg: Bridging Image Segmentation and Large Language Model Reasoning | Apr 12, 2024 | Image SegmentationLanguage Modeling | CodeCode Available | 2 |
| PixelLM: Pixel Reasoning with Large Multimodal Model | Dec 4, 2023 | Decodermodel | CodeCode Available | 2 |
| OpenMaskDINO3D : Reasoning 3D Segmentation via Large Language Model | Jun 5, 2025 | Instance SegmentationLanguage Modeling | CodeCode Available | 1 |
| SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding | Apr 17, 2025 | Image GenerationLarge Language Model | CodeCode Available | 1 |
| MMR: A Large-scale Benchmark Dataset for Multi-target and Multi-granularity Reasoning Segmentation | Mar 18, 2025 | ObjectReasoning Segmentation | CodeCode Available | 1 |
| Instruction-guided Multi-Granularity Segmentation and Captioning with Large Multimodal Model | Sep 20, 2024 | Image CaptioningPanoptic Segmentation | CodeCode Available | 1 |
| Visual Agents as Fast and Slow Thinkers | Aug 16, 2024 | Question AnsweringReasoning Segmentation | CodeCode Available | 1 |
| An Efficient and Effective Transformer Decoder-Based Framework for Multi-Task Visual Grounding | Aug 2, 2024 | DecoderReasoning Segmentation | CodeCode Available | 1 |
| ViLLa: Video Reasoning Segmentation with Large Language Model | Jul 18, 2024 | Image SegmentationLanguage Modeling | CodeCode Available | 1 |
| CoReS: Orchestrating the Dance of Reasoning and Segmentation | Apr 8, 2024 | Reasoning SegmentationSegmentation | CodeCode Available | 1 |
| HRSeg: High-Resolution Visual Perception and Enhancement for Reasoning Segmentation | Jul 17, 2025 | Reasoning SegmentationWorld Knowledge | —Unverified | 0 |
| MedSeg-R: Reasoning Segmentation in Medical Images with Multimodal Large Language Models | Jun 12, 2025 | Image SegmentationMedical Diagnosis | —Unverified | 0 |