Large Language Models for Outpatient Referral: Problem Definition, Benchmarking and Challenges

2025-03-11 · Code Available

Xiaoxiao Liu, Qingying Xiao, Junying Chen, Xiangyi Feng, Xiangbo Wu, Bairui Zhang, Xiang Wan, Jian Chang, Guangjun Yu, Yan Hu, Benyou Wang

Code

github.com/FreedomIntelligence/IOR-Bench
Official · ★ 1

Abstract

Large language models (LLMs) are increasingly applied to outpatient referral tasks across healthcare systems. However, there is a lack of standardized evaluation criteria to assess their effectiveness, particularly in dynamic, interactive scenarios. In this study, we systematically examine the capabilities and limitations of LLMs in managing tasks within Intelligent Outpatient Referral (IOR) systems and propose a comprehensive evaluation framework specifically designed for such systems. This framework comprises two core tasks: static evaluation, which assesses performance on predefined outpatient referrals, and dynamic evaluation, which assesses the ability to refine outpatient referral recommendations through iterative dialogues. Our findings suggest that LLMs offer limited advantages over BERT-like models, but show promise in asking effective questions during interactive dialogues.

Tasks

Benchmarking
