| An Efficient and Effective Transformer Decoder-Based Framework for Multi-Task Visual Grounding | Aug 2, 2024 | Decoder, Reasoning Segmentation | Code Available | 1 |
| ViLLa: Video Reasoning Segmentation with Large Language Model | Jul 18, 2024 | Image Segmentation, Language Modeling | Code Available | 1 |
| CoReS: Orchestrating the Dance of Reasoning and Segmentation | Apr 8, 2024 | Reasoning Segmentation, Segmentation | Code Available | 1 |
| HRSeg: High-Resolution Visual Perception and Enhancement for Reasoning Segmentation | Jul 17, 2025 | Reasoning Segmentation, World Knowledge | Unverified | 0 |
| MedSeg-R: Reasoning Segmentation in Medical Images with Multimodal Large Language Models | Jun 12, 2025 | Image Segmentation, Medical Diagnosis | Unverified | 0 |
| Decoupling the Image Perception and Multimodal Reasoning for Reasoning Segmentation with Digital Twin Representations | Jun 9, 2025 | Large Language Model, Multimodal Reasoning | Unverified | 0 |
| RSVP: Reasoning Segmentation via Visual Prompting and Multi-modal Chain-of-Thought | Jun 4, 2025 | Multimodal Reasoning, Reasoning Segmentation | Unverified | 0 |
| PixelThink: Towards Efficient Chain-of-Pixel Reasoning | May 29, 2025 | Reasoning Segmentation, reinforcement-learning | Unverified | 0 |
| Reasoning Segmentation for Images and Videos: A Survey | May 24, 2025 | Reasoning Segmentation, Survey | Unverified | 0 |
| RVTBench: A Benchmark for Visual Reasoning Tasks | May 17, 2025 | Reasoning Segmentation, Visual Question Answering (VQA) | Code Available | 0 |