| MiniCPM-V: A GPT-4V Level MLLM on Your Phone | Aug 3, 2024 | HallucinationMultiple-choice | CodeCode Available | 12 | 5 |
| HealthBench: Evaluating Large Language Models Towards Improved Human Health | May 13, 2025 | Instruction FollowingMultiple-choice | CodeCode Available | 7 | 5 |
| MMBench: Is Your Multi-modal Model an All-around Player? | Jul 12, 2023 | AllInstruction Following | CodeCode Available | 5 | 5 |
| LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks | Dec 19, 2024 | 8kIn-Context Learning | CodeCode Available | 5 | 5 |
| VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | Jun 11, 2024 | Multiple-choiceQuestion Answering | CodeCode Available | 5 | 5 |
| Flamingo: a Visual Language Model for Few-Shot Learning | Apr 29, 2022 | Few-Shot LearningGenerative Visual Question Answering | CodeCode Available | 4 | 5 |
| BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text | Mar 27, 2024 | ArticlesLanguage Modeling | CodeCode Available | 4 | 5 |
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | Jan 30, 2023 | Generative Visual Question AnsweringImage Captioning | CodeCode Available | 4 | 5 |
| I Think, Therefore I am: Benchmarking Awareness of Large Language Models Using AwareBench | Jan 31, 2024 | BenchmarkingMultiple-choice | CodeCode Available | 4 | 5 |
| MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens | Apr 4, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 4 | 5 |