ezCoref: A Scalable Approach for Collecting Crowdsourced Annotations for Coreference Resolution

2022-01-16 · ACL ARR January 2022

Anonymous

Abstract

Large-scale, high-quality corpora are critical for advancing research in coreference resolution. Coreference annotation is typically time-consuming and expensive, since researchers generally hire expert annotators and train them with an extensive set of guidelines. Crowdsourcing is a promising alternative, but coreference includes complex semantic phenomena that are difficult to explain to untrained crowdworkers, and the clustering structure is difficult to manipulate in a user interface. To address these challenges, we develop and release ezCoref, an easy-to-use coreference annotation tool and annotation methodology that facilitates crowdsourced data collection across multiple domains, currently in English. Instead of teaching crowdworkers how to handle non-trivial cases (e.g., near-identity coreferences), ezCoref provides only a minimal set of guidelines sufficient for understanding the basics of the task. To validate this decision, we deploy ezCoref on Mechanical Turk to re-annotate 240 passages from seven existing English coreference datasets across seven domains, achieving an average rate of 2530 tokens per hour per annotator. This paper is the first to compare the quality of crowdsourced coreference annotations against those of experts, and to identify where their behavior differs, in order to facilitate future annotation efforts. We show that it is possible to collect coreference annotations of reasonable quality in a fraction of the time it would traditionally require.
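
To give a sense of what the reported rate of 2530 tokens per hour means in practice, the following minimal sketch estimates the annotator-hours needed for a corpus of a given size. Only the rate comes from the abstract; the function name, the 100,000-token corpus size, and the `annotators_per_passage` redundancy factor are hypothetical values chosen for illustration.

```python
# Average throughput reported in the abstract (tokens per hour, one annotator).
TOKENS_PER_HOUR = 2530


def annotation_hours(num_tokens: int, annotators_per_passage: int = 1) -> float:
    """Estimate the total annotator-hours needed to annotate `num_tokens`.

    `annotators_per_passage` is a hypothetical redundancy factor (how many
    crowdworkers label each passage); it is an assumption, not a figure
    taken from the paper.
    """
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    if annotators_per_passage < 1:
        raise ValueError("annotators_per_passage must be at least 1")
    return num_tokens * annotators_per_passage / TOKENS_PER_HOUR


if __name__ == "__main__":
    # Hypothetical 100,000-token corpus, annotated once.
    print(f"{annotation_hours(100_000):.1f} annotator-hours")  # 39.5 annotator-hours
```

The estimate is linear in both corpus size and redundancy, so it ignores setup costs such as worker training and quality checks.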
