BioReddit: Word Embeddings for User-Generated Biomedical NLP

2019-11-01 · WS 2019

Marco Basaldella, Nigel Collier

Abstract

Word embeddings, in their various forms and iterations, have changed the natural language processing research landscape in recent years. The biomedical text processing field is no stranger to this revolution; however, scholars in the field have largely trained their embeddings on scientific documents only, even when working on user-generated data. In this paper we show that training embeddings on a corpus of user-generated text collected from medical forums heavily influences performance on downstream tasks, outperforming embeddings trained either on general-purpose data or on scientific papers when applied to user-generated content.