Document retrieval and question answering in medical documents. A large-scale corpus challenge.
2017-09-01 · RANLP 2017
Curea Eric
Abstract
When applied to large datasets, information retrieval works by isolating a subset of documents from the larger dataset and then proceeding with low-level processing of the text. This isolation is usually carried out by adding index terms to each document in the collection. In this paper we deal with automatic document classification and index-term detection applied to large-scale medical corpora. In our methodology we employ a linear classifier, and we test our results on the BioASQ training corpus, a collection of 12 million MeSH-indexed medical abstracts. We cover term indexing, result retrieval, and result ranking based on distributed word representations.
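As a rough illustration of the retrieval and ranking steps described in the abstract, the sketch below first isolates documents that share an index term with the query and then ranks them by cosine similarity of averaged word vectors. It is not the paper's implementation: the word vectors, documents, and function names (`embed`, `cosine`, `retrieve_and_rank`) are hypothetical, the index terms are assumed to have been assigned already (for example by a linear classifier), and averaging word vectors is only one simple way to use distributed word representations.

```python
import math

# Toy 3-dimensional word vectors (hypothetical values, for illustration only).
TOY_VECTORS = {
    "heart": [0.9, 0.1, 0.0],
    "cardiac": [0.8, 0.2, 0.0],
    "attack": [0.6, 0.3, 0.1],
    "insulin": [0.0, 0.9, 0.1],
    "diabetes": [0.1, 0.8, 0.1],
    "therapy": [0.3, 0.3, 0.4],
}


def embed(text):
    """Average the vectors of the known words in text; None if none are known."""
    vecs = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_and_rank(query, query_terms, docs):
    """Keep documents sharing an index term with the query, then rank them.

    docs is a list of (doc_id, index_terms, text) tuples, where index_terms
    is a set of already-assigned index terms.
    """
    q_vec = embed(query)
    scored = []
    for doc_id, index_terms, text in docs:
        if not (query_terms & index_terms):
            continue  # retrieval step: skip documents with no shared index term
        d_vec = embed(text)
        if q_vec is None or d_vec is None:
            score = 0.0
        else:
            score = cosine(q_vec, d_vec)
        scored.append((doc_id, score))
    # ranking step: highest similarity first
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical mini-collection; the index terms are real MeSH headings,
# but the documents themselves are made up.
DOCS = [
    ("d1", {"Myocardial Infarction"}, "cardiac attack therapy"),
    ("d2", {"Diabetes Mellitus"}, "insulin therapy diabetes"),
    ("d3", {"Myocardial Infarction", "Diabetes Mellitus"}, "diabetes insulin heart"),
]

ranking = retrieve_and_rank("heart attack", {"Myocardial Infarction"}, DOCS)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

Filtering by index term before scoring keeps the similarity computation confined to a small candidate set, which is the reason index terms matter when the collection holds millions of abstracts.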