BCCWJ-DepPara: A Syntactic Annotation Treebank on the 'Balanced Corpus of Contemporary Written Japanese'
2016-12-01 · WS 2016
Masayuki Asahara, Yuji Matsumoto
Abstract
Paratactic syntactic structures are difficult to represent in syntactic dependency trees. We therefore propose an annotation schema for syntactic dependency annotation of Japanese in which coordinate structures are separated from, and overlaid on, bunsetsu-based (base phrase unit) dependencies. The schema represents nested coordinate structures, non-constituent conjuncts, and forward sharing as sets of regions. The annotation was performed on the core data of the 'Balanced Corpus of Contemporary Written Japanese', which comprises about one million words and 1,980 samples from six registers, including newspapers, books, magazines, and web texts.
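
To make the idea of overlaying coordination on a dependency layer concrete, the following is a minimal sketch, not the treebank's actual file format. It assumes a sentence is a list of bunsetsu with one head index each, and that each coordinate structure is stored separately as a set of regions (inclusive spans of bunsetsu indices). The names `Span`, `Coordination`, `Sentence`, and `is_nested`, and the romanized toy sentence, are all hypothetical and do not come from the paper or the corpus.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical representation: an inclusive (start, end) range of bunsetsu indices.
Span = Tuple[int, int]


@dataclass
class Coordination:
    """One coordinate structure, stored as a set of conjunct regions."""
    conjuncts: List[Span]


@dataclass
class Sentence:
    bunsetsu: List[str]
    heads: List[int]  # head index per bunsetsu; -1 marks the root
    # Coordination is kept apart from `heads`, as an overlay on the same indices.
    coordinations: List[Coordination] = field(default_factory=list)


def is_nested(inner: Coordination, outer: Coordination) -> bool:
    """True if every conjunct of `inner` lies inside a single conjunct of `outer`."""
    lo = min(start for start, _ in inner.conjuncts)
    hi = max(end for _, end in inner.conjuncts)
    return any(start <= lo and hi <= end for start, end in outer.conjuncts)


# Invented toy sentence: "(ringo to mikan) ka banana wo taberu".
inner = Coordination(conjuncts=[(0, 0), (1, 1)])  # ringo / mikan
outer = Coordination(conjuncts=[(0, 1), (2, 2)])  # (ringo to mikan) / banana
sent = Sentence(
    bunsetsu=["ringo to", "mikan ka", "banana wo", "taberu"],
    heads=[1, 2, 3, -1],  # assumed head-final chain, for illustration only
    coordinations=[inner, outer],
)
```

Because the regions live in their own list, the dependency heads stay a plain tree while nested coordination is expressed by one region containing another; here `is_nested(inner, outer)` holds and the reverse does not.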