| RegionBLIP: A Unified Multi-modal Pre-training Framework for Holistic and Regional Comprehension | Aug 3, 2023 | Image Comprehension | Code Available | 1 | 5 |
| ArtGPT-4: Towards Artistic-understanding Large Vision-Language Models with Enhanced Adapter | May 12, 2023 | Image Comprehension, Language Modelling | Code Available | 1 | 5 |
| Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs | Jul 31, 2024 | Hallucination, Image Comprehension | Code Available | 1 | 5 |
| FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension | Sep 23, 2024 | Image Comprehension, Referring Expression | Code Available | 1 | 5 |
| RSUniVLM: A Unified Vision Language Model for Remote Sensing via Granularity-oriented Mixture of Experts | Dec 7, 2024 | Change Detection, Image Comprehension | Code Available | 1 | 5 |
| New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration | Feb 27, 2025 | Image Comprehension, Referring Expression | Code Available | 1 | 5 |
| MM-MATH: Advancing Multimodal Math Evaluation with Process Evaluation and Fine-grained Classification | Apr 7, 2024 | Image Comprehension, Math | Code Available | 0 | 5 |
| InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition | Sep 26, 2023 | Articles, Image Comprehension | Code Available | 0 | 5 |
| FTII-Bench: A Comprehensive Multimodal Benchmark for Flow Text with Image Insertion | Oct 16, 2024 | Articles, Image Comprehension | Code Available | 0 | 5 |
| CLIC: Contrastive Learning Framework for Unsupervised Image Complexity Representation | Nov 19, 2024 | Attribute, Contrastive Learning | Code Available | 0 | 5 |