| Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V | Oct 29, 2023 | DiagnosticLanguage Modeling | CodeCode Available | 1 |
| CXR-LLAVA: a multimodal large language model for interpreting chest X-ray images | Oct 22, 2023 | DiagnosticLanguage Modeling | CodeCode Available | 1 |
| Multimodal Large Language Model for Visual Navigation | Oct 12, 2023 | Language ModelingLanguage Modelling | —Unverified | 0 |
| Ferret: Refer and Ground Anything Anywhere at Any Granularity | Oct 11, 2023 | HallucinationLanguage Modeling | CodeCode Available | 5 |
| UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model | Oct 8, 2023 | DecoderLanguage Modeling | CodeCode Available | 1 |
| Comics for Everyone: Generating Accessible Text Descriptions for Comic Strips | Oct 1, 2023 | Language ModelingLanguage Modelling | —Unverified | 0 |
| Investigating the Catastrophic Forgetting in Multimodal Large Language Models | Sep 19, 2023 | image-classificationImage Classification | —Unverified | 0 |
| Imaginations of WALL-E : Reconstructing Experiences with an Imagination-Inspired Module for Advanced AI Systems | Aug 20, 2023 | Emotion RecognitionLanguage Modelling | —Unverified | 0 |
| FinVis-GPT: A Multimodal Large Language Model for Financial Chart Analysis | Jul 31, 2023 | Language ModelingLanguage Modelling | CodeCode Available | 1 |
| ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning | Jul 18, 2023 | Instruction FollowingLanguage Modeling | —Unverified | 0 |