Authorship Attribution Using a Neural Network Language Model

2016-02-17

Zhenhao Ge, Yufang Sun, Mark J. T. Smith


Code

github.com/zge/authorship-attribution
Official implementation (linked in the paper)

Abstract

In practice, training language models for individual authors is often expensive because of limited data resources. In such cases, Neural Network Language Models (NNLMs) generally outperform traditional non-parametric N-gram models. Here we investigate the performance of a feed-forward NNLM on an authorship attribution problem with a moderate author set size and relatively limited data. We also consider how text topics affect performance. Compared with a well-constructed N-gram baseline with Kneser-Ney smoothing, the proposed method achieves a reduction in perplexity of nearly 2.5% and increases author classification accuracy by 3.43% on average, given as few as 5 test sentences. The performance is very competitive with the state of the art in terms of accuracy and demand on test data. The source code, preprocessed datasets, and a detailed description of the methodology and results are available at https://github.com/zge/authorship-attribution.
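The abstract does not spell out the decision rule, so the following is only a minimal sketch of one common way to use per-author language models for attribution: train one model per author and assign the test sentences to the author whose model gives the lowest perplexity. The add-alpha unigram model here is a deliberately simple stand-in, not the paper's feed-forward NNLM or its Kneser-Ney baseline, and the function names (`train_unigram`, `perplexity`, `attribute`) and the toy data are hypothetical.

```python
import math
from collections import Counter

UNK = "<unk>"  # assumed placeholder token for out-of-vocabulary words


def train_unigram(sentences, vocab, alpha=1.0):
    """Add-alpha smoothed unigram model (a stand-in, NOT the paper's NNLM)."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def perplexity(model, sentences):
    """Per-word perplexity: exp of the average negative log-probability."""
    words = [w for s in sentences for w in s.split()]
    log_prob = sum(math.log(model.get(w, model[UNK])) for w in words)
    return math.exp(-log_prob / len(words))


def attribute(models, test_sentences):
    """Return the author whose model yields the lowest perplexity."""
    return min(models, key=lambda a: perplexity(models[a], test_sentences))


# Toy training data, invented purely for illustration.
train = {
    "author_a": ["the sea was calm", "the sea was grey"],
    "author_b": ["stocks fell sharply today", "stocks rose today"],
}
vocab = {w for ss in train.values() for s in ss for w in s.split()} | {UNK}
models = {author: train_unigram(ss, vocab) for author, ss in train.items()}

test_sentences = ["the sea was rough"]
predicted = attribute(models, test_sentences)
print(predicted)
```

Under this rule, a lower perplexity for an author's model means that model finds the test text less surprising, which is why a perplexity reduction and an accuracy gain can plausibly move together.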

Tasks

Authorship Attribution, Language Modeling
