'BonTen' -- Corpus Concordance System for 'NINJAL Web Japanese Corpus'

2016-12-01 · COLING 2016

Masayuki Asahara, Kazuya Kawahara, Yuya Takei, Hideto Masuoka, Yasuko Ohba, Yuki Torii, Toru Morii, Yuki Tanaka, Kikuo Maekawa, Sachi Kato, Hikari Konishi

Abstract

The National Institute for Japanese Language and Linguistics, Japan (NINJAL) has undertaken a corpus compilation project to construct a ten-billion-word web corpus for linguistic research. The project is divided into four parts: page collection, linguistic analysis, development of the corpus concordance system, and preservation. This article presents the corpus concordance system, named 'BonTen', which enables the ten-billion-word corpus to be queried by string, by a sequence of morphological information, or by a subtree of the syntactic dependency structure.
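
The three query modes named in the abstract can be illustrated with a toy sketch. This is not BonTen's implementation or interface: the `Token` class, the simplified tagset, the token-level heads, and the three `match_*` functions are all hypothetical names introduced here, and a linear scan over one sentence stands in for whatever indexing a ten-billion-word system would need.

```python
from dataclasses import dataclass


@dataclass
class Token:
    surface: str  # surface form of the morpheme
    pos: str      # part-of-speech tag (toy tagset, an assumption)
    head: int     # index of the syntactic head, -1 for the root


# Toy sentence "猫が魚を食べる" with hand-assigned, token-level heads (an assumption).
SENTENCE = [
    Token("猫", "NOUN", 4),
    Token("が", "PART", 0),
    Token("魚", "NOUN", 4),
    Token("を", "PART", 2),
    Token("食べる", "VERB", -1),
]


def match_string(tokens, query):
    """String query: substring search over the concatenated surface text."""
    return query in "".join(t.surface for t in tokens)


def match_pos_sequence(tokens, pattern):
    """Morphological query: start indices of a contiguous POS-tag sequence."""
    n = len(pattern)
    tags = [t.pos for t in tokens]
    return [i for i in range(len(tags) - n + 1) if tags[i:i + n] == list(pattern)]


def match_dependency(tokens, child_pos, head_pos):
    """Dependency query: (child, head) index pairs forming a one-edge subtree."""
    return [
        (i, t.head)
        for i, t in enumerate(tokens)
        if t.head >= 0 and t.pos == child_pos and tokens[t.head].pos == head_pos
    ]


print(match_string(SENTENCE, "魚を"))
print(match_pos_sequence(SENTENCE, ["NOUN", "PART"]))
print(match_dependency(SENTENCE, "NOUN", "VERB"))
```

Each function answers one kind of query against the same annotated sentence, which is the sense in which a single concordance system can serve string, morphological, and dependency-subtree searches.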
