| SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding | Apr 17, 2025 | Image Generation, Large Language Model | Code Available | 1 |
| ViLLa: Video Reasoning Segmentation with Large Language Model | Jul 18, 2024 | Image Segmentation, Language Modeling | Code Available | 1 |
| Visual Agents as Fast and Slow Thinkers | Aug 16, 2024 | Question Answering, Reasoning Segmentation | Code Available | 1 |
| Multimodal 3D Reasoning Segmentation with Complex Scenes | Nov 21, 2024 | Reasoning Segmentation, Scene Understanding | — Unverified | 0 |
| Online Reasoning Video Segmentation with Just-in-Time Digital Twins | Mar 27, 2025 | Reasoning Segmentation, Video Segmentation | — Unverified | 0 |
| One Framework to Rule Them All: Unifying Multimodal Tasks with LLM Neural-Tuning | Aug 6, 2024 | All, Image Captioning | — Unverified | 0 |
| Operating Room Workflow Analysis via Reasoning Segmentation over Digital Twins | Mar 26, 2025 | Large Language Model, Reasoning Segmentation | — Unverified | 0 |
| Transferring Foundation Models for Generalizable Robotic Manipulation | Jun 9, 2023 | Imitation Learning, Object | — Unverified | 0 |
| VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation | Mar 18, 2025 | Reasoning Segmentation, Video Editing | — Unverified | 0 |
| Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level | Nov 15, 2024 | Benchmarking, counterfactual | — Unverified | 0 |