Equipping Educational Applications with Domain Knowledge

2019-08-01 · WS 2019

Tarek Sakakini, Hongyu Gong, Jong Yoon Lee, Robert Schloss, JinJun Xiong, Suma Bhat


Abstract

One of the challenges of building natural language processing (NLP) applications for education is finding a large domain-specific corpus for the subject of interest (e.g., history or science). To address this challenge, we propose a tool, Dexter, that extracts a subject-specific corpus from a heterogeneous corpus, such as Wikipedia, by relying on a small seed corpus and distributed document representations. We empirically show that the generated corpus improves language modeling, word embedding estimation, and, consequently, distractor generation, yielding better performance than a general-domain corpus, a heuristically constructed domain-specific corpus, and a corpus generated by a popular system, BootCaT.
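The abstract does not spell out how Dexter uses the seed corpus, so the following is only a minimal sketch of one plausible reading: embed every document, average the seed embeddings into a centroid, and keep the candidates whose cosine similarity to that centroid clears a threshold. The function names, the averaged word-vector embedding, the toy two-dimensional vectors, and the threshold value are all assumptions made for illustration and are not taken from the paper.

```python
import math


def embed(text, word_vectors):
    """Average the vectors of known words (a stand-in for a real document embedding)."""
    vecs = [word_vectors[w] for w in text.lower().split() if w in word_vectors]
    if not vecs:
        return None  # no known words, so no representation
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_in_domain(seed_docs, candidate_docs, word_vectors, threshold=0.9):
    """Keep candidates whose embedding is close to the seed centroid (hypothetical rule)."""
    seed_vecs = [v for v in (embed(d, word_vectors) for d in seed_docs) if v is not None]
    if not seed_vecs:
        return []
    dim = len(seed_vecs[0])
    centroid = [sum(v[i] for v in seed_vecs) / len(seed_vecs) for i in range(dim)]
    selected = []
    for doc in candidate_docs:
        vec = embed(doc, word_vectors)
        if vec is not None and cosine(vec, centroid) >= threshold:
            selected.append(doc)
    return selected


# Toy 2-d vectors: first axis is roughly "science", second roughly "history".
toy_vectors = {
    "cell": [1.0, 0.0], "enzyme": [0.9, 0.1], "protein": [0.95, 0.05],
    "treaty": [0.0, 1.0], "empire": [0.1, 0.9], "war": [0.05, 0.95],
}
seed = ["cell enzyme", "protein cell"]
candidates = ["enzyme protein", "treaty empire war", "cell war"]
picked = select_in_domain(seed, candidates, toy_vectors)
print(picked)
```

In practice the toy vectors would be replaced by learned document representations, and the threshold would be tuned on held-out in-domain text; both choices are left open here.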

Tasks

Distractor Generation, Language Modeling, Word Embeddings
