Pinyin-bert: A new solution to Chinese pinyin to character conversion task

2021-11-16 · ACL ARR November 2021

Anonymous

Abstract

Pinyin-to-character (P2C) conversion is the key task of the Input Method Engine (IME) in commercial input software for Asian languages such as Chinese, Japanese, and Thai. The dominant technique is the N-gram language model combined with smoothing. However, the N-gram model's low capacity limits its performance. Following the trend toward deep learning, this paper adopts the powerful BERT network architecture and proposes Pinyin-bert to solve the P2C task, achieving a substantial performance improvement over the N-gram model. Furthermore, we combine Pinyin-bert with the N-gram model under a Markov model framework, which improves performance further. Lastly, we design a way to incorporate an external lexicon into Pinyin-bert so that it can adapt to out-of-domain input.
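
To make the N-gram baseline mentioned in the abstract concrete, the following is a minimal sketch of P2C decoding with a bigram language model, add-one smoothing, and Viterbi search. It is not the paper's implementation and says nothing about how Pinyin-bert itself works: the candidate lexicon, the toy corpus, and all function names are hypothetical, and add-one smoothing is only one of many smoothing techniques an IME might use.

```python
import math
from collections import defaultdict

# Hypothetical toy lexicon: pinyin syllable -> candidate characters.
CANDIDATES = {
    "ni": ["你", "泥"],
    "hao": ["好", "号"],
}

# Hypothetical toy corpus used to estimate bigram counts.
CORPUS = ["你好", "你好", "号码"]


def train_bigram(corpus):
    """Count how often each character is a context and each (prev, cur) pair."""
    context = defaultdict(int)
    bigram = defaultdict(int)
    for sentence in corpus:
        chars = ["<s>"] + list(sentence)  # <s> marks the sentence start
        for prev, cur in zip(chars, chars[1:]):
            context[prev] += 1
            bigram[(prev, cur)] += 1
    return context, bigram


def log_prob(prev, cur, context, bigram, vocab_size):
    """Add-one (Laplace) smoothed log P(cur | prev)."""
    return math.log((bigram[(prev, cur)] + 1) / (context[prev] + vocab_size))


def viterbi(pinyins, context, bigram, vocab_size):
    """Return the highest-scoring character sequence for a pinyin sequence."""
    # best maps the last character to (log score, path so far).
    best = {"<s>": (0.0, [])}
    for syllable in pinyins:
        nxt = {}
        for cur in CANDIDATES[syllable]:
            score, path = max(
                (
                    (s + log_prob(prev, cur, context, bigram, vocab_size), p)
                    for prev, (s, p) in best.items()
                ),
                key=lambda t: t[0],
            )
            nxt[cur] = (score, path + [cur])
        best = nxt
    return "".join(max(best.values(), key=lambda t: t[0])[1])


context, bigram = train_bigram(CORPUS)
# Vocabulary: every character seen in the corpus or the lexicon.
vocab = set("".join(CORPUS)) | {c for cs in CANDIDATES.values() for c in cs}
result = viterbi(["ni", "hao"], context, bigram, len(vocab))
print(result)
```

With this toy data the decoder prefers the candidates that co-occur in the corpus. A model of this kind only sees a one-character context, which illustrates the capacity limit the abstract attributes to N-gram models.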
