| MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking | Apr 9, 2025 | Autonomous Driving, Language Modeling | Code Available | 0 |
| Cross-modal RAG: Sub-dimensional Retrieval-Augmented Text-to-Image Generation | May 28, 2025 | Image Generation, Language Modeling | Code Available | 0 |
| Towards Real Zero-Shot Camouflaged Object Segmentation without Camouflaged Annotations | Oct 22, 2024 | Camouflaged Object Segmentation, Large Language Model | Code Available | 0 |
| MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios | Dec 27, 2024 | Autonomous Driving, Language Modeling | Code Available | 0 |
| Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities | Apr 2, 2025 | Descriptive, Large Language Model | Code Available | 0 |
| Layout Generation Agents with Large Language Models | May 13, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| TRINS: Towards Multimodal Language Models that Can Read | Jun 10, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| MIP-GAF: A MLLM-annotated Benchmark for Most Important Person Localization and Group Context Understanding | Sep 10, 2024 | Benchmarking, Language Modeling | Code Available | 0 |
| MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO | May 19, 2025 | Decoder, Image Generation | Code Available | 0 |
| Dynamic Pyramid Network for Efficient Multimodal Large Language Model | Mar 26, 2025 | Language Modeling, Language Modelling | Code Available | 0 |