How to Use less Features and Reach Better Performance in Author Gender Identification

2014-05-01 · LREC 2014

Juan Soler Company, Leo Wanner

Abstract

Over the last few years, author profiling in general and author gender identification in particular have become a popular research area due to their potentially attractive applications, which range from forensic investigations to online marketing studies. However, nearly all state-of-the-art works in the area still depend heavily on the datasets they were trained and tested on, since they draw heavily on content features, mostly a large number of recurrent words or combinations of words extracted from the training sets. We show that, using a small number of features that mainly depend on the structure of the texts, we can outperform other approaches that depend mainly on the content of the texts and that use a huge number of features to identify whether the author of a text is a man or a woman. Our system has been tested against a dataset constructed for our work as well as against two datasets that were previously used in other papers.
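To make the distinction between structure-based and content-based features concrete, here is a minimal Python sketch of what a small structural feature set might look like. The abstract does not list the features the authors actually use, so the function name `structural_features` and the four features below are hypothetical illustrations, not the paper's method.

```python
import re
import string


def structural_features(text: str) -> dict[str, float]:
    """Compute a few content-independent features of a text.

    Hypothetical feature set for illustration only; the paper's
    actual features are not given in the abstract.
    """
    # Paragraphs are separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Naive sentence split: ., ! or ? followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    # Words are runs of ASCII letters and apostrophes (a simplification).
    words = re.findall(r"[A-Za-z']+", text)
    n_chars = max(len(text), 1)
    n_punct = sum(ch in string.punctuation for ch in text)
    return {
        "n_paragraphs": float(len(paragraphs)),
        "words_per_sentence": len(words) / max(len(sentences), 1),
        "chars_per_word": sum(len(w) for w in words) / max(len(words), 1),
        "punctuation_ratio": n_punct / n_chars,
    }


sample = "I went home. It was late!\n\nThen I slept."
features = structural_features(sample)
```

None of these values depends on which words occur in the text, only on how the text is laid out. That is the property the abstract credits for reducing dependence on the training dataset. A content-based approach would instead build one feature per recurrent word or word combination, which quickly yields thousands of dimensions.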
