DocLens: Multi-aspect Fine-grained Evaluation for Medical Text Generation

2023-11-16

Yiqing Xie, Sheng Zhang, Hao Cheng, PengFei Liu, Zelalem Gero, Cliff Wong, Tristan Naumann, Hoifung Poon, Carolyn Rose

Code

github.com/yiqingxyq/doclens (official, in paper, ★ 22)
github.com/veronicium/doclens (official, in paper, ★ 22)

Abstract

Medical text generation aims to assist with administrative work and highlight salient information to support decision-making. To reflect the specific requirements of medical text, in this paper, we propose a set of metrics to evaluate the completeness, conciseness, and attribution of the generated text at a fine-grained level. The metrics can be computed by various types of evaluators including instruction-following (both proprietary and open-source) and supervised entailment models. We demonstrate the effectiveness of the resulting framework, DocLens, with three evaluators on three tasks: clinical note generation, radiology report summarization, and patient question summarization. A comprehensive human study shows that DocLens exhibits substantially higher agreement with the judgments of medical experts than existing metrics. The results also highlight the need to improve open-source evaluators and suggest potential directions.

Tasks

Decision Making, Instruction Following, Text Generation
