Log-linear Models for Uyghur Segmentation in Spoken Language Translation

2017-09-01 · RANLP 2017

Chenggang Mi, Yating Yang, Rui Dong, Xi Zhou, Lei Wang, Xiao Li, Tonghai Jiang

Abstract

To alleviate data sparsity in spoken Uyghur machine translation, we propose a log-linear morphological segmentation approach. Instead of learning a model only from a monolingual annotated corpus, this approach optimizes Uyghur segmentation for spoken language translation using both bilingual and monolingual corpora. Our approach relies on several features: a traditional conditional random field (CRF) feature, a bilingual word alignment feature, and a monolingual suffix-word co-occurrence feature. Experimental results show that our proposed segmentation model for Uyghur spoken language translation achieves an improvement of 1.6 BLEU points over the state-of-the-art baseline.
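As a rough illustration of how a log-linear model can combine the three feature types named in the abstract, the sketch below scores candidate segmentations of one word as p(s | w) ∝ exp(Σ λ_i · h_i(s, w)) and picks the highest-scoring one. This is not the authors' implementation: the feature names, feature values, weights, and the placeholder segmentation strings are all hypothetical, and the abstract does not say how the features are computed or how the weights are tuned.

```python
import math


def log_linear_probs(candidates, weights):
    """Return normalized log-linear probabilities over candidate segmentations.

    candidates: dict mapping a segmentation string to a dict of feature values.
    weights: dict mapping feature names to weights (hypothetical values).
    """
    # Weighted sum of feature values for each candidate.
    scores = {
        seg: sum(weights[name] * value for name, value in feats.items())
        for seg, feats in candidates.items()
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    max_score = max(scores.values())
    exp_scores = {seg: math.exp(s - max_score) for seg, s in scores.items()}
    z = sum(exp_scores.values())
    return {seg: e / z for seg, e in exp_scores.items()}


def best_segmentation(candidates, weights):
    """Pick the candidate segmentation with the highest probability."""
    probs = log_linear_probs(candidates, weights)
    return max(probs, key=probs.get)


# Hypothetical weights, one per feature type mentioned in the abstract.
weights = {"crf": 1.0, "alignment": 0.5, "cooccurrence": 0.5}

# Placeholder candidates with made-up log-domain feature values.
candidates = {
    "stem+suffix1+suffix2": {"crf": -1.2, "alignment": -0.4, "cooccurrence": -0.7},
    "stem+suffix12": {"crf": -0.9, "alignment": -1.5, "cooccurrence": -1.1},
}

probs = log_linear_probs(candidates, weights)
best = best_segmentation(candidates, weights)
```

In this toy setup the CRF feature alone prefers the second candidate, but the alignment and co-occurrence features shift the decision to the first, which is the kind of interaction a log-linear combination is meant to capture.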

Tasks

Machine Translation, Segmentation, Translation, Word Alignment
