EDTC: A Corpus for Discourse-Level Topic Chain Parsing

2021-11-01 · Findings (EMNLP) 2021 · Code Available

Longyin Zhang, Xin Tan, Fang Kong, Guodong Zhou

Abstract

Discourse analysis has long been known to be fundamental in natural language processing. In this research, we present our insight on discourse-level topic chain (DTC) parsing which aims at discovering new topics and investigating how these topics evolve over time within an article. To address the lack of data, we contribute a new discourse corpus with DTC-style dependency graphs annotated upon news articles. In particular, we ensure the high reliability of the corpus by utilizing a two-step annotation strategy to build the data and filtering out the annotations with low confidence scores. Based on the annotated corpus, we introduce a simple yet robust system for automatic discourse-level topic chain parsing.
