| R2MED: A Benchmark for Reasoning-Driven Medical Retrieval | May 20, 2025 | Diagnostic, Re-Ranking | Code Available | 1 |
| DiagnosisArena: Benchmarking Diagnostic Reasoning for Large Language Models | May 20, 2025 | Benchmarking, Diagnostic | Code Available | 1 |
| MedCaseReasoning: Evaluating and learning diagnostic reasoning from clinical case reports | May 16, 2025 | Diagnostic, Math | Code Available | 1 |
| Phare: A Safety Probe for Large Language Models | May 16, 2025 | Diagnostic, Hallucination | Code Available | 1 |
| Examining Deployment and Refinement of the VIOLA-AI Intracranial Hemorrhage Model Using an Interactive NeoMedSys Platform | May 14, 2025 | Diagnostic, Sensitivity | Code Available | 1 |
| Hallucination-Aware Multimodal Benchmark for Gastrointestinal Image Analysis with Large Vision-Language Models | May 11, 2025 | Descriptive, Diagnostic | Code Available | 1 |
| MM-Skin: Enhancing Dermatology Vision-Language Model with an Image-Text Dataset Derived from Textbooks | May 9, 2025 | Diagnostic, Instruction Following | Code Available | 1 |
| VideoPath-LLaVA: Pathology Diagnostic Reasoning Through Video Instruction Tuning | May 7, 2025 | Decision Making, Diagnostic | Code Available | 1 |
| UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation | Apr 30, 2025 | Diagnostic, Large Language Model | Code Available | 1 |
| ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification | Apr 29, 2025 | Diagnostic, Question Answering | Code Available | 1 |