
Momresp: A Bayesian Model for Multi-Annotator Document Labeling

LREC 2014 · 2014-05-01

Paul Felt, Robbie Haertel, Eric Ringger, Kevin Seppi


Abstract

Data annotation in modern practice often involves multiple, imperfect human annotators. Multiple annotations can be used to infer estimates of the ground-truth labels and to estimate individual annotator error characteristics (or reliability). We introduce MomResp, a model that incorporates information from both natural data clusters and annotations from multiple annotators to infer ground-truth labels and annotator reliability for the document classification task. We implement this model and show dramatic improvements over majority vote when annotations are scarce and of low quality, as well as when annotators disagree consistently. Because MomResp predictions are subject to label switching, we introduce a solution that finds nearly optimal predicted class reassignments in a variety of settings using only information available to the model at inference time. Although MomResp does not perform well in annotation-rich situations, we show evidence suggesting how this shortcoming may be overcome in future work.
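The label-switching problem mentioned in the abstract arises because a clustering-based model can recover the correct partition of documents while assigning arbitrary class indices to the clusters. As an illustration only (not the paper's actual algorithm), one simple reassignment strategy consistent with "using only information available at inference time" is to search over permutations of the class indices and keep the one whose relabeled predictions agree most often with the annotators' majority-vote labels; the function names and data layout below are hypothetical:

```python
from collections import Counter
from itertools import permutations

def majority_vote(annotations):
    """Majority label per document; annotations[d] is the list of
    labels the annotators gave document d (hypothetical format)."""
    return [Counter(a).most_common(1)[0][0] for a in annotations]

def best_relabeling(predicted, annotations, num_classes):
    """Try every permutation of class indices and return the relabeled
    predictions that agree most often with the majority-vote labels.
    Exhaustive search is only feasible for small num_classes (K! perms)."""
    votes = majority_vote(annotations)
    best_perm, best_agree = None, -1
    for perm in permutations(range(num_classes)):
        agree = sum(perm[p] == v for p, v in zip(predicted, votes))
        if agree > best_agree:
            best_perm, best_agree = perm, agree
    return [best_perm[p] for p in predicted]
```

For example, if a two-class model predicts `[1, 1, 0, 0]` but the majority votes are `[0, 0, 1, 1]`, the search selects the swapped permutation and returns `[0, 0, 1, 1]`. For larger label sets, the same objective can be optimized exactly with the Hungarian algorithm instead of brute force.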
