Document retrieval and question answering in medical documents. A large-scale corpus challenge.

2017-09-01 · RANLP 2017

Curea Eric


Abstract

Whenever it is applied to large datasets, information retrieval works by isolating a subset of documents from the larger dataset and then proceeding with low-level processing of the text. This is usually done by adding index terms to each document in the collection. In this paper we deal with automatic document classification and index-term detection applied to large-scale medical corpora. In our methodology we employ a linear classifier, and we test our results on the BioASQ training corpus, a collection of 12 million MeSH-indexed medical abstracts. We cover term indexing, result retrieval, and result ranking based on distributed word representations.
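The abstract names two steps: assigning index terms to documents with a linear classifier, and ranking retrieved results using distributed word representations. The abstract does not name a specific classifier or embedding model, so the following is only a minimal Python sketch under stated assumptions. A one-vs-rest perceptron over bag-of-words tokens stands in for the linear classifier, and documents are ranked by cosine similarity between averaged word vectors. The label names, toy documents, two-dimensional embeddings, and all function names (`tokenize`, `OneVsRestPerceptron`, `rank`) are hypothetical; a real setup would use MeSH headings from the BioASQ data and pretrained word vectors.

```python
import math
from collections import defaultdict


def tokenize(text):
    # Hypothetical preprocessing: lower-case and split on whitespace.
    return text.lower().split()


class OneVsRestPerceptron:
    """One binary perceptron (a linear classifier) per index term over
    bag-of-words features. Assumption: a stand-in for the unspecified
    linear classifier mentioned in the abstract."""

    def __init__(self, labels, epochs=10):
        self.labels = list(labels)
        self.epochs = epochs
        self.weights = {lab: defaultdict(float) for lab in self.labels}
        self.bias = {lab: 0.0 for lab in self.labels}

    def _score(self, label, tokens):
        w = self.weights[label]
        return self.bias[label] + sum(w.get(t, 0.0) for t in tokens)

    def fit(self, docs, doc_labels):
        for _ in range(self.epochs):
            for text, gold in zip(docs, doc_labels):
                tokens = tokenize(text)
                for lab in self.labels:
                    target = 1 if lab in gold else -1
                    predicted = 1 if self._score(lab, tokens) > 0 else -1
                    if predicted != target:
                        # Perceptron update: move weights toward the target sign.
                        for t in tokens:
                            self.weights[lab][t] += target
                        self.bias[lab] += target
        return self

    def predict(self, text):
        # Assign every index term whose linear score is positive.
        tokens = tokenize(text)
        return {lab for lab in self.labels if self._score(lab, tokens) > 0}


def mean_vector(tokens, embeddings):
    # Average the vectors of known tokens; None if no token has a vector.
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank(query, docs, embeddings):
    # Return document indices ordered by decreasing cosine similarity.
    q = mean_vector(tokenize(query), embeddings)
    scored = []
    for idx, text in enumerate(docs):
        d = mean_vector(tokenize(text), embeddings)
        score = cosine(q, d) if q is not None and d is not None else 0.0
        scored.append((score, idx))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored]


# Hypothetical toy data, not from the BioASQ corpus.
DOCS = [
    "insulin lowers blood glucose",
    "aspirin reduces fever and pain",
    "glucose metabolism in diabetes",
]
DOC_LABELS = [{"Diabetes Mellitus"}, {"Analgesics"}, {"Diabetes Mellitus"}]
EMBEDDINGS = {
    "insulin": [1.0, 0.0], "glucose": [0.9, 0.1], "diabetes": [1.0, 0.1],
    "aspirin": [0.0, 1.0], "fever": [0.1, 0.9], "pain": [0.0, 1.0],
}

if __name__ == "__main__":
    model = OneVsRestPerceptron(["Diabetes Mellitus", "Analgesics"]).fit(DOCS, DOC_LABELS)
    print(model.predict("glucose levels in diabetes"))  # → {'Diabetes Mellitus'}
    print(rank("diabetes glucose", DOCS, EMBEDDINGS))  # → [2, 0, 1]
```

A perceptron is used here only because it is the simplest linear classifier to write without dependencies. At the scale of 12 million abstracts, a regularized linear model over sparse features would be a more typical choice, and averaging word vectors is just one of several ways to rank documents with word embeddings.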

Tasks

Document Classification, General Classification, Information Retrieval, Question Answering, Retrieval
