| MedQA-CS: Benchmarking Large Language Models Clinical Skills Using an AI-SCE Framework | Oct 2, 2024 | Benchmarking, Instruction Following | Code Available | 1 |
| O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning | Jan 11, 2025 | Decision Making, Diagnostic | Code Available | 1 |
| Large Language Models Encode Clinical Knowledge | Dec 26, 2022 | Clinical Knowledge, MedQA | Code Available | 1 |
| MedCaseReasoning: Evaluating and learning diagnostic reasoning from clinical case reports | May 16, 2025 | Diagnostic, Math | Code Available | 1 |
| Kformer: Knowledge Injection in Transformer Feed-Forward Layers | Jan 15, 2022 | Language Modelling, Medical Question Answering | Code Available | 1 |
| Can large language models reason about medical questions? | Jul 17, 2022 | MedQA, Multiple-choice | Code Available | 1 |
| FiTs: Fine-grained Two-stage Training for Knowledge-aware Question Answering | Feb 23, 2023 | Knowledge Graphs, Medical Question Answering | Code Available | 1 |
| Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks | May 28, 2023 | MedQA, Memorization | Code Available | 1 |
| Clinical Camel: An Open Expert-Level Medical Language Model with Dialogue-Based Knowledge Encoding | May 19, 2023 | GPU, Language Modeling | Code Available | 1 |
| MediQ: Question-Asking LLMs and a Benchmark for Reliable Interactive Clinical Reasoning | Jun 3, 2024 | Diagnostic, MedQA | Code Available | 1 |