Sub-character Neural Language Modelling in Japanese

2017-09-01 · WS 2017

Viet Nguyen, Julian Brooke, Timothy Baldwin


Abstract

In East Asian languages such as Japanese and Chinese, the semantics of a character are (somewhat) reflected in its sub-character elements. This paper examines the effect of using sub-characters for language modelling in Japanese. This is achieved by decomposing characters according to a range of character decomposition datasets, and training a neural language model over variously decomposed character representations. Our results indicate that language modelling can be improved through the inclusion of sub-characters, though this result depends on a good choice of decomposition dataset and the appropriate granularity of decomposition.
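
To make the decomposition step concrete, here is a minimal sketch of turning a character sequence into a sub-character sequence at a chosen granularity. It is an illustration under stated assumptions, not the authors' pipeline: the table `TOY_DECOMPOSITIONS`, the function names, and the use of a recursion `depth` to stand in for "granularity" are all hypothetical, and the paper's actual decomposition datasets are not reproduced here.

```python
# Toy decomposition table (hypothetical, hand-written for illustration;
# it is NOT one of the decomposition datasets used in the paper).
TOY_DECOMPOSITIONS = {
    "明": ["日", "月"],
    "休": ["亻", "木"],
    "語": ["言", "吾"],
    "吾": ["五", "口"],
}


def decompose(char, table, depth=1):
    """Expand `char` into sub-characters, recursing at most `depth` levels.

    Characters missing from `table` (kana, punctuation, atomic components)
    are returned unchanged. `depth` is an assumed proxy for granularity.
    """
    if depth <= 0 or char not in table:
        return [char]
    parts = []
    for part in table[char]:
        parts.extend(decompose(part, table, depth - 1))
    return parts


def to_subcharacter_sequence(text, table, depth=1):
    """Map a string to the token sequence a language model would be trained on."""
    tokens = []
    for char in text:
        tokens.extend(decompose(char, table, depth))
    return tokens


if __name__ == "__main__":
    for depth in (0, 1, 2):
        print(depth, to_subcharacter_sequence("日本語", TOY_DECOMPOSITIONS, depth))
```

A deeper setting yields longer sequences over a smaller vocabulary, which is one way to picture the trade-off the abstract describes between the decomposition dataset and the granularity of decomposition.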

Tasks

Language Modelling
