Variation in Coreference Strategies across Genres and Production Media
Berfin Aktaş, Manfred Stede
Code (official): github.com/berfingit/coreference-variation
Abstract
In response to (i) inconclusive results in the literature as to the properties of coreference chains in written versus spoken language, and (ii) a general lack of work on automatic coreference resolution for both spoken language and social media, we undertake a corpus study involving the various genre sections of OntoNotes, the Switchboard corpus, and a corpus of Twitter conversations. Using a set of measures that have previously been applied individually to different data sets, we find fairly clear patterns of "behavior" for the different genres/media. Besides their role in psycholinguistic investigation (why do we employ different coreference strategies when we write or speak?) and in the placement of Twitter on the spoken-written continuum, we see our results as a contribution to genre- and media-specific coreference resolution.