| Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model | May 27, 2024 | Decoder, Language Modeling | Code Available | 2 | 5 |
| PixelLM: Pixel Reasoning with Large Multimodal Model | Dec 4, 2023 | Decoder, model | Code Available | 2 | 5 |
| One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos | Sep 29, 2024 | All, Image Segmentation | Code Available | 2 | 5 |
| LLM-Seg: Bridging Image Segmentation and Large Language Model Reasoning | Apr 12, 2024 | Image Segmentation, Language Modeling | Code Available | 2 | 5 |
| The Devil is in Temporal Token: High Quality Video Reasoning Segmentation | Jan 15, 2025 | Reasoning Segmentation, Referring Expression Segmentation | Code Available | 2 | 5 |
| Instruction-guided Multi-Granularity Segmentation and Captioning with Large Multimodal Model | Sep 20, 2024 | Image Captioning, Panoptic Segmentation | Code Available | 1 | 5 |
| CoReS: Orchestrating the Dance of Reasoning and Segmentation | Apr 8, 2024 | Reasoning Segmentation, Segmentation | Code Available | 1 | 5 |
| An Efficient and Effective Transformer Decoder-Based Framework for Multi-Task Visual Grounding | Aug 2, 2024 | Decoder, Reasoning Segmentation | Code Available | 1 | 5 |
| OpenMaskDINO3D : Reasoning 3D Segmentation via Large Language Model | Jun 5, 2025 | Instance Segmentation, Language Modeling | Code Available | 1 | 5 |
| SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding | Apr 17, 2025 | Image Generation, Large Language Model | Code Available | 1 | 5 |