| MedExpQA: Multilingual Benchmarking of Large Language Models for Medical Question Answering | Apr 8, 2024 | Benchmarking, Medical Question Answering | —Unverified | 0 | 0 |
| Knowledge-guided Contextual Gene Set Analysis Using Large Language Models | Jun 4, 2025 | Benchmarking | —Unverified | 0 | 0 |
| MedGPTEval: A Dataset and Benchmark to Evaluate Responses of Large Language Models in Medicine | May 12, 2023 | Benchmarking | —Unverified | 0 | 0 |
| MedGUIDE: Benchmarking Clinical Decision-Making in Large Language Models | May 16, 2025 | Benchmarking, Decision Making | —Unverified | 0 | 0 |
| MediaEval 2018: Predicting Media Memorability Task | Jul 3, 2018 | Benchmarking, Memorization | —Unverified | 0 | 0 |
| Benchmarking Large Language Models for Handwritten Text Recognition | Mar 19, 2025 | Benchmarking, Handwritten Text Recognition | —Unverified | 0 | 0 |
| MedMeshCNN -- Enabling MeshCNN for Medical Surface Models | Sep 10, 2020 | Benchmarking, Segmentation | —Unverified | 0 | 0 |
| Benchmarking large language models for materials synthesis: the case of atomic layer deposition | Dec 13, 2024 | Benchmarking, Hallucination | —Unverified | 0 | 0 |
| Benchmarking Large Language Models for Conversational Question Answering in Multi-instructional Documents | Oct 1, 2024 | Benchmarking, Conversational Question Answering | —Unverified | 0 | 0 |
| MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding | Jan 30, 2025 | Benchmarking, Decision Making | —Unverified | 0 | 0 |