| InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition | Sep 26, 2023 | Articles, Image Comprehension | Code Available | 0 | 5 |
| MM-MATH: Advancing Multimodal Math Evaluation with Process Evaluation and Fine-grained Classification | Apr 7, 2024 | Image Comprehension, Math | Code Available | 0 | 5 |
| MIRe: Enhancing Multimodal Queries Representation via Fusion-Free Modality Interaction for Multimodal Retrieval | Nov 13, 2024 | Image Comprehension, Information Retrieval | Code Available | 0 | 5 |
| Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens | Sep 15, 2023 | Image Comprehension, Language Modeling | Unverified | 0 | 0 |
| Unveiling Glitches: A Deep Dive into Image Encoding Bugs within CLIP | Jun 30, 2024 | Hallucination, Image Comprehension | Unverified | 0 | 0 |
| What Large Language Models Bring to Text-rich VQA? | Nov 13, 2023 | Image Comprehension, Optical Character Recognition (OCR) | Unverified | 0 | 0 |
| Multiplane Prior Guided Few-Shot Aerial Scene Rendering | Jun 7, 2024 | Image Comprehension, NeRF | Unverified | 0 | 0 |
| An End-to-End OCR Text Re-organization Sequence Learning for Rich-text Detail Image Comprehension | Aug 1, 2020 | Decoder, global-optimization | Unverified | 0 | 0 |
| Aquila: A Hierarchically Aligned Visual-Language Model for Enhanced Remote Sensing Image Comprehension | Nov 9, 2024 | Image Comprehension, Language Modeling | Unverified | 0 | 0 |
| GeoLocator: a location-integrated large multimodal model for inferring geo-privacy | Nov 21, 2023 | Image Comprehension | Unverified | 0 | 0 |