| Chain of Images for Intuitively Reasoning | Nov 9, 2023 | Common Sense Reasoning, Language Modelling | Code Available | 1 |
| Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V | Oct 29, 2023 | Diagnostic, Language Modeling | Code Available | 1 |
| CXR-LLAVA: a multimodal large language model for interpreting chest X-ray images | Oct 22, 2023 | Diagnostic, Language Modeling | Code Available | 1 |
| UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model | Oct 8, 2023 | Decoder, Language Modeling | Code Available | 1 |
| FinVis-GPT: A Multimodal Large Language Model for Financial Chart Analysis | Jul 31, 2023 | Language Modeling, Language Modelling | Code Available | 1 |
| Kosmos-2: Grounding Multimodal Large Language Models to the World | Jun 26, 2023 | Image Captioning, In-Context Learning | Code Available | 1 |
| LMEye: An Interactive Perception Network for Large Language Models | May 5, 2023 | Language Modelling, Large Language Model | Code Available | 1 |
| KptLLM++: Towards Generic Keypoint Comprehension with Large Language Model | Jul 15, 2025 | Keypoint Detection, Language Modeling | Unverified | 0 |
| LRMR: LLM-Driven Relational Multi-node Ranking for Lymph Node Metastasis Assessment in Rectal Cancer | Jul 15, 2025 | Diagnostic, Large Language Model | Unverified | 0 |
| MFGDiffusion: Mask-Guided Smoke Synthesis for Enhanced Forest Fire Detection | Jul 15, 2025 | Fire Detection, Image Generation | Code Available | 0 |