| Performance of ChatGPT-3.5 and GPT-4 on the United States Medical Licensing Examination With and Without Distractions | Sep 12, 2023 | Multiple-choice, Sentence | Unverified | 0 |
| INCEPTNET: Precise And Early Disease Detection Application For Medical Images Analyses | Sep 5, 2023 | Cell Detection, Lesion Segmentation | Code Available | 0 |
| An Automatic Evaluation Framework for Multi-turn Medical Consultations Capabilities of Large Language Models | Sep 5, 2023 | Multiple-choice | Unverified | 0 |
| Generalised Winograd Schema and its Contextuality | Aug 31, 2023 | Coreference Resolution | Unverified | 0 |
| Spoken Language Intelligence of Large Language Models for Language Learning | Aug 28, 2023 | Language Acquisition, Multiple-choice | Code Available | 0 |
| Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions | Aug 22, 2023 | Multiple-choice, Sensitivity | Unverified | 0 |
| A Comparative Study of Open-Source Large Language Models, GPT-4 and Claude 2: Multiple-Choice Test Taking in Nephrology | Aug 9, 2023 | Multiple-choice | Unverified | 0 |
| Automated Distractor and Feedback Generation for Math Multiple-choice Questions via In-context Learning | Aug 7, 2023 | In-Context Learning, Math | Code Available | 0 |
| ChatGPT for GTFS: Benchmarking LLMs on GTFS Understanding and Retrieval | Aug 4, 2023 | Benchmarking, Information Retrieval | Code Available | 0 |
| ReCoMIF: Reading comprehension based multi-source information fusion network for Chinese spoken language understanding | Aug 1, 2023 | Intent Detection, Multiple-choice | Code Available | 0 |