| mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding | Jul 4, 2023 | Document Understanding, Language Modeling | — Unverified | 0 |
| Kosmos-2: Grounding Multimodal Large Language Models to the World | Jun 26, 2023 | Image Captioning, In-Context Learning | Code Available | 1 |
| MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models | Jun 23, 2023 | Benchmarking, Language Modeling | Code Available | 2 |
| A Survey on Multimodal Large Language Models | Jun 23, 2023 | Hallucination, In-Context Learning | — Unverified | 0 |
| Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks | Jun 7, 2023 | Cross-Modal Retrieval, Language Modelling | Code Available | 2 |
| LMEye: An Interactive Perception Network for Large Language Models | May 5, 2023 | Language Modelling, Large Language Model | Code Available | 1 |
| Language Is Not All You Need: Aligning Perception with Language Models | Feb 27, 2023 | All, Image Captioning | — Unverified | 0 |