Challenges to Open-Domain Constituency Parsing

2022-05-01 · Findings (ACL) 2022

Sen Yang, Leyang Cui, Ruoxi Ning, Di Wu, Yue Zhang

Abstract

Neural constituency parsers have reached practical performance on news-domain benchmarks. However, their ability to generalize to other domains remains weak. Existing findings on cross-domain constituency parsing are based on only a limited number of domains. To address this, we manually annotate a high-quality constituency treebank covering five domains. We analyze challenges to open-domain constituency parsing using a set of linguistic features on various strong constituency parsers. Primarily, we find that 1) BERT significantly increases parsers’ cross-domain performance by reducing their sensitivity to domain-variant features; and 2) challenges to open-domain constituency parsing are not explained by single metrics such as unigram distribution and OOV rate, but arise from complex features, including cross-domain lexical and constituent structure variations.
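
For readers unfamiliar with the two single metrics named in the abstract, the following is a minimal sketch of how an OOV rate and a unigram-distribution distance between a source and a target domain might be computed. The abstract does not give the exact definitions used in the paper, so the function names, the token-level OOV definition, and the choice of Jensen-Shannon divergence are all assumptions made for illustration. Both functions assume non-empty, already tokenized input.

```python
from collections import Counter
import math


def oov_rate(source_tokens, target_tokens):
    """Fraction of target tokens whose word type never occurs in the source.

    Assumption: token-level OOV rate; the paper may define it differently.
    """
    vocab = set(source_tokens)
    unseen = sum(1 for tok in target_tokens if tok not in vocab)
    return unseen / len(target_tokens)


def unigram_js_divergence(source_tokens, target_tokens):
    """Jensen-Shannon divergence (log base 2, range [0, 1]) between the
    unigram distributions of two token lists.

    Assumption: JS divergence is one common choice, not necessarily the
    measure used in the paper.
    """
    p, q = Counter(source_tokens), Counter(target_tokens)
    n_p, n_q = sum(p.values()), sum(q.values())
    js = 0.0
    for word in set(p) | set(q):
        pw, qw = p[word] / n_p, q[word] / n_q
        mw = 0.5 * (pw + qw)  # mixture distribution
        if pw > 0:
            js += 0.5 * pw * math.log2(pw / mw)
        if qw > 0:
            js += 0.5 * qw * math.log2(qw / mw)
    return js


# Toy sentences, invented for illustration only.
news = "the market fell sharply after the report".split()
forum = "lol the update broke my phone again".split()

print(oov_rate(news, forum))
print(unigram_js_divergence(news, forum))
```

Both values are purely lexical summaries of a domain gap. The abstract's second finding is that such summaries alone do not account for the difficulty of open-domain parsing, which also depends on constituent structure variation.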
