| New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration | Feb 27, 2025 | Image ComprehensionReferring Expression | CodeCode Available | 1 |
| ArtGPT-4: Towards Artistic-understanding Large Vision-Language Models with Enhanced Adapter | May 12, 2023 | Image ComprehensionLanguage Modelling | CodeCode Available | 1 |
| RegionBLIP: A Unified Multi-modal Pre-training Framework for Holistic and Regional Comprehension | Aug 3, 2023 | Image Comprehension | CodeCode Available | 1 |
| Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs | Jul 31, 2024 | HallucinationImage Comprehension | CodeCode Available | 1 |
| RSUniVLM: A Unified Vision Language Model for Remote Sensing via Granularity-oriented Mixture of Experts | Dec 7, 2024 | Change DetectionImage Comprehension | CodeCode Available | 1 |
| FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension | Sep 23, 2024 | Image ComprehensionReferring Expression | CodeCode Available | 1 |
| EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM | Dec 12, 2024 | Image ComprehensionImage Generation | —Unverified | 0 |
| GeoLocator: a location-integrated large multimodal model for inferring geo-privacy | Nov 21, 2023 | Image Comprehension | —Unverified | 0 |
| Alleviating Hallucination in Large Vision-Language Models with Active Retrieval Augmentation | Aug 1, 2024 | HallucinationImage Comprehension | —Unverified | 0 |
| CSVQA: A Chinese Multimodal Benchmark for Evaluating STEM Reasoning Capabilities of VLMs | May 30, 2025 | DiagnosticImage Comprehension | —Unverified | 0 |