Converting the Sinica Treebank of Mandarin Chinese to Universal Dependencies

2022-06-01 · LREC (LAW) 2022

Yu-Ming Hsieh, Yueh-Yin Shih, Wei-Yun Ma

Abstract

This paper describes the conversion of the Sinica Treebank, one of the major Mandarin Chinese treebanks, to Universal Dependencies (UD). The conversion is rule-based, and the process involves POS tag mapping, head adjustment in line with the UD scheme, and dependency conversion. Linguistic insights into Mandarin Chinese are also discussed along with the conversion. The resulting corpus, the UD Chinese Sinica Treebank, contains more than fifty thousand trees annotated according to the UD scheme. The dataset can be downloaded at https://github.com/ckiplab/ud.
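The abstract names three stages: POS tag mapping, head adjustment, and dependency conversion. The Python sketch below illustrates the general technique (head-driven conversion of a head-marked phrase-structure tree into dependencies) under loudly stated assumptions. It is not the authors' converter. The `POS_MAP` table, the `Node` representation, the example tags and role labels, and the single head-adjustment rule (an adposition-headed PP hands headship to its complement, in the spirit of UD's content-word-as-head principle) are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# HYPOTHETICAL, partial mapping from CKIP/Sinica-style POS tags to UD UPOS.
# Illustrative only; this is not the mapping used to build the treebank.
POS_MAP = {
    "Na": "NOUN", "Nb": "PROPN", "Nc": "NOUN", "Nh": "PRON",
    "Neu": "NUM", "VA": "VERB", "VC": "VERB", "D": "ADV", "P": "ADP",
}


@dataclass
class Node:
    role: str                      # role label; the head child is marked "Head"
    label: str                     # phrase label (e.g. "NP") or POS tag on leaves
    word: Optional[str] = None     # set only on leaves
    children: List["Node"] = field(default_factory=list)
    index: int = 0                 # 1-based token position (leaves only)


def leaves(node: Node) -> List[Node]:
    """Return the leaves of `node` in left-to-right order."""
    if node.word is not None:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def upos_tags(root: Node) -> List[str]:
    """Step 1 (sketch): map each leaf's tag to UPOS, falling back to X."""
    return [POS_MAP.get(leaf.label, "X") for leaf in leaves(root)]


def head_child(node: Node) -> Node:
    """Step 2 (sketch): choose the child that supplies the lexical head.

    Assumes every phrase has exactly one child with role "Head". As one
    ASSUMED UD-style adjustment, a PP headed by an adposition (P) passes
    headship to its last non-head child, so the adposition ends up as a
    dependent of its complement noun.
    """
    head = next(c for c in node.children if c.role == "Head")
    if node.label == "PP" and head.label == "P":
        others = [c for c in node.children if c is not head]
        if others:
            return others[-1]
    return head


def lexical_head(node: Node) -> Node:
    """Follow head children down to a leaf."""
    while node.word is None:
        node = head_child(node)
    return node


def to_dependencies(root: Node) -> Dict[int, int]:
    """Step 3 (sketch): map each token index to its head index (0 = root).

    Within every phrase, the lexical heads of the non-head children
    attach to the lexical head of the phrase itself.
    """
    heads = {lexical_head(root).index: 0}
    stack = [root]
    while stack:
        node = stack.pop()
        if node.word is not None:
            continue
        h = lexical_head(node)
        for child in node.children:
            d = lexical_head(child)
            if d is not h:
                heads[d.index] = h.index
            stack.append(child)
    return heads


# Hand-built example, "他 在 學校 讀書" ('he studies at school').
# Tags and role labels are chosen for illustration only.
tree = Node("ROOT", "S", children=[
    Node("agent", "NP", children=[Node("Head", "Nh", "他", index=1)]),
    Node("location", "PP", children=[
        Node("Head", "P", "在", index=2),
        Node("DUMMY", "NP", children=[Node("Head", "Nc", "學校", index=3)]),
    ]),
    Node("Head", "VA", "讀書", index=4),
])

print(upos_tags(tree))                       # ['PRON', 'ADP', 'NOUN', 'VERB']
print(sorted(to_dependencies(tree).items())) # [(1, 4), (2, 3), (3, 4), (4, 0)]
```

In this toy run the adposition 在 attaches to 學校 rather than heading the PP, which is the kind of head adjustment UD calls for; a real conversion also has to assign dependency relation labels and cover far more constructions than this single rule.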