emrQA: A Large Corpus for Question Answering on Electronic Medical Records

2018-09-03 · EMNLP 2018 · Code Available

Anusri Pampari, Preethi Raghavan, Jennifer Liang, Jian Peng


Code

github.com/panushri25/emrQA
Official · In paper · none · ★ 153
github.com/xiangyue9607/CliniRC
tf · ★ 18
github.com/YIKUAN8/Clinical-Longformer
none · ★ 0

Abstract

We propose a novel methodology to generate domain-specific large-scale question answering (QA) datasets by re-purposing existing annotations for other NLP tasks. We demonstrate an instance of this methodology in generating a large-scale QA dataset for electronic medical records by leveraging existing expert annotations on clinical notes for various NLP tasks from the community shared i2b2 datasets. The resulting corpus (emrQA) has 1 million question-logical form and 400,000+ question-answer evidence pairs. We characterize the dataset and explore its learning potential by training baseline models for question to logical form and question to answer mapping.
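
The re-purposing idea in the abstract can be illustrated with a minimal sketch, assuming a template-filling approach: each existing annotation is combined with question templates to yield a question, a logical form, and an answer with its evidence. The `RelationAnnotation` class, the `TEMPLATES` table, the logical-form strings, and `generate_examples` are all hypothetical names made up for illustration; they are not taken from the emrQA code or the i2b2 data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationAnnotation:
    """Hypothetical stand-in for one expert annotation on a clinical note."""
    relation: str   # e.g. "treats"
    problem: str    # annotated medical problem
    treatment: str  # annotated treatment
    evidence: str   # note sentence that supports the relation


# Hypothetical templates: (question, logical form, field holding the answer).
TEMPLATES = {
    "treats": [
        ("What treatment was given for {problem}?",
         "TreatmentFor(problem={problem})",
         "treatment"),
        ("Why is the patient on {treatment}?",
         "ProblemTreatedBy(treatment={treatment})",
         "problem"),
    ],
}


def generate_examples(annotations):
    """Fill every matching template with the entities of each annotation."""
    examples = []
    for ann in annotations:
        fields = {"problem": ann.problem, "treatment": ann.treatment}
        # Relations without templates simply produce no examples.
        for question, logical_form, answer_field in TEMPLATES.get(ann.relation, []):
            examples.append({
                "question": question.format(**fields),
                "logical_form": logical_form.format(**fields),
                "answer": fields[answer_field],
                "evidence": ann.evidence,
            })
    return examples


if __name__ == "__main__":
    note_annotations = [
        RelationAnnotation(
            relation="treats",
            problem="hypertension",
            treatment="lisinopril",
            evidence="Started lisinopril for hypertension.",
        )
    ]
    for example in generate_examples(note_annotations):
        print(example)
```

In this sketch a single annotation produces several question-logical form pairs and question-answer evidence pairs, which is one way a modest set of expert annotations could scale to a large corpus.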

Tasks

Form Question Answering
