Similar Southeast Asian Languages: Corpus-Based Case Study on Thai-Laotian and Malay-Indonesian

2016-12-01 · WS 2016

Chenchen Ding, Masao Utiyama, Eiichiro Sumita

Abstract

This paper illustrates the similarity between Thai and Laotian, and between Malay and Indonesian, based on an investigation of raw parallel data from the Asian Language Treebank. The cross-lingual similarity is investigated and demonstrated using metrics of token correspondence and token order, computed with several standard statistical machine translation techniques. The similarity shown in this study suggests the possibility of harmonized annotation and processing of these language pairs in future development.
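The abstract does not specify how token correspondence and token order are measured. As an illustrative sketch only, and not the authors' method, one common way to quantify both from a word alignment (for example, one produced by an SMT word aligner) is to take the share of one-to-one alignment links as a correspondence measure and Kendall's tau (the tau-a variant) over aligned target positions as an order measure. The function names and the (source index, target index) link format are hypothetical assumptions.

```python
from collections import Counter
from itertools import combinations


def one_to_one_ratio(alignment):
    """Fraction of links whose source and target tokens each have exactly one link.

    alignment: list of (src_index, tgt_index) pairs (hypothetical input format).
    """
    if not alignment:
        return 0.0
    # Count how many links touch each source and each target position.
    src_deg = Counter(s for s, _ in alignment)
    tgt_deg = Counter(t for _, t in alignment)
    one_to_one = sum(1 for s, t in alignment if src_deg[s] == 1 and tgt_deg[t] == 1)
    return one_to_one / len(alignment)


def kendall_tau(alignment):
    """Kendall's tau-a between source order and target order of aligned tokens.

    Returns 1.0 for a perfectly monotone alignment and -1.0 for a fully reversed one.
    Tied target positions count as neither concordant nor discordant.
    """
    # Sort links by source position, then read off the target positions.
    tgt = [t for _, t in sorted(alignment)]
    if len(tgt) < 2:
        return 1.0
    concordant = discordant = 0
    for a, b in combinations(tgt, 2):
        if a < b:
            concordant += 1
        elif a > b:
            discordant += 1
    pairs = len(tgt) * (len(tgt) - 1) // 2
    return (concordant - discordant) / pairs


if __name__ == "__main__":
    monotone = [(0, 0), (1, 1), (2, 2)]
    reversed_order = [(0, 2), (1, 1), (2, 0)]
    print(one_to_one_ratio(monotone), kendall_tau(monotone))
    print(one_to_one_ratio(reversed_order), kendall_tau(reversed_order))
```

For the monotone alignment both functions return 1.0; reversing the target order leaves the one-to-one ratio at 1.0 but drives kendall_tau to -1.0. Under these assumptions, values near 1.0 on both measures would indicate a pair that behaves close to word-for-word translation.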