| SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding | Apr 17, 2025 | Image Generation, Large Language Model | Code Available | 1 | 5 |
| ViLLa: Video Reasoning Segmentation with Large Language Model | Jul 18, 2024 | Image Segmentation, Language Modeling | Code Available | 1 | 5 |
| Visual Agents as Fast and Slow Thinkers | Aug 16, 2024 | Question Answering, Reasoning Segmentation | Code Available | 1 | 5 |
| Empowering Segmentation Ability to Multi-modal Large Language Models | Mar 21, 2024 | Dialogue Generation, Reasoning Segmentation | Code Available | 0 | 5 |
| Pixel-Level Reasoning Segmentation via Multi-turn Conversations | Feb 13, 2025 | Reasoning Segmentation, Segmentation | Code Available | 0 | 5 |
| RVTBench: A Benchmark for Visual Reasoning Tasks | May 17, 2025 | Reasoning Segmentation, Visual Question Answering (VQA) | Code Available | 0 | 5 |
| Operating Room Workflow Analysis via Reasoning Segmentation over Digital Twins | Mar 26, 2025 | Large Language Model, Reasoning Segmentation | Unverified | 0 | 0 |
| Transferring Foundation Models for Generalizable Robotic Manipulation | Jun 9, 2023 | Imitation Learning, Object | Unverified | 0 | 0 |
| LVLM_CSP: Accelerating Large Vision Language Models via Clustering, Scattering, and Pruning for Reasoning Segmentation | Apr 15, 2025 | Image Captioning, Question Answering | Unverified | 0 | 0 |
| PixelThink: Towards Efficient Chain-of-Pixel Reasoning | May 29, 2025 | Reasoning Segmentation, reinforcement-learning | Unverified | 0 | 0 |