SOTAVerified

Towards Summarizing Healthcare Questions in Low-Resource Setting

2022-10-01COLING 2022Unverified0· sign in to hype

Shweta Yadav, Cornelia Caragea

Unverified — Be the first to reproduce this paper.

Reproduce

Abstract

The current advancement in abstractive document summarization depends to a large extent on a considerable amount of human-annotated datasets. However, the creation of large-scale datasets is often not feasible in closed domains, such as medical and healthcare domains, where human annotation requires domain expertise. This paper presents a novel data selection strategy to generate diverse and semantic questions in a low-resource setting with the aim to summarize healthcare questions. Our method exploits the concept of guided semantic-overlap and diversity-based objective functions to optimally select the informative and diverse set of synthetic samples for data augmentation. Our extensive experiments on benchmark healthcare question summarization datasets demonstrate the effectiveness of our proposed data selection strategy by achieving new state-of-the-art results. Our human evaluation shows that our method generates diverse, fluent, and informative summarized questions.

Tasks

Reproductions