| A Foundational Multimodal Vision Language AI Assistant for Human Pathology | Dec 13, 2023 | Decision Making, Diagnostic | —Unverified | 0 | 0 |
| PerCul: A Story-Driven Cultural Evaluation of LLMs in Persian | Feb 11, 2025 | Multiple-choice | —Unverified | 0 | 0 |
| Performance of ChatGPT-3.5 and GPT-4 on the United States Medical Licensing Examination With and Without Distractions | Sep 12, 2023 | Multiple-choice, Sentence | —Unverified | 0 | 0 |
| Performance of leading large language models in May 2025 in Membership of the Royal College of General Practitioners-style examination questions: a cross-sectional analysis | Jun 3, 2025 | Multiple-choice | —Unverified | 0 | 0 |
| PersianMedQA: Language-Centric Evaluation of LLMs in the Persian Medical Domain | May 30, 2025 | Instruction Following, Multiple-choice | —Unverified | 0 | 0 |
| Personalised Feedback Framework for Online Education Programmes Using Generative AI | Oct 14, 2024 | Benchmarking, Management | —Unverified | 0 | 0 |
| PhysUniBench: An Undergraduate-Level Physics Reasoning Benchmark for Multimodal Models | Jun 21, 2025 | Mathematical Reasoning, Multiple-choice | —Unverified | 0 | 0 |
| Vision-Language Models Do Not Understand Negation | Jan 16, 2025 | Multiple-choice, Negation | —Unverified | 0 | 0 |
| Predicting Item Survival for Multiple Choice Questions in a High-Stakes Medical Exam | May 1, 2020 | Information Retrieval, Multiple-choice | —Unverified | 0 | 0 |
| Predicting the Difficulty and Response Time of Multiple Choice Questions Using Transfer Learning | Jul 1, 2020 | Multiple-choice, Transfer Learning | —Unverified | 0 | 0 |