A Dependency Treebank of the Chinese Buddhist Canon

2016-05-01 · LREC 2016

Tak-sum Wong, John Lee

Abstract

We present a dependency treebank of the Chinese Buddhist Canon, which contains 1,514 texts with about 50 million Chinese characters. The treebank was created by an automatic parser trained on a smaller treebank, containing four manually annotated sutras (Lee and Kong, 2014). We report results on word segmentation, part-of-speech tagging and dependency parsing, and discuss challenges posed by the processing of medieval Chinese. In a case study, we exploit the treebank to examine verbs frequently associated with Buddha, and to analyze usage patterns of quotative verbs in direct speech. Our results suggest that certain quotative verbs imply status differences between the speaker and the listener.
