Similar Southeast Asian Languages: Corpus-Based Case Study on Thai-Laotian and Malay-Indonesian
2016-12-01 · WS 2016
Chenchen Ding, Masao Utiyama, Eiichiro Sumita
Abstract
This paper illustrates the similarity between Thai and Laotian, and between Malay and Indonesian, based on an investigation of raw parallel data from the Asian Language Treebank. Cross-lingual similarity is investigated and demonstrated using metrics of token correspondence and token order, computed with several standard statistical machine translation techniques. The similarity shown in this study suggests the possibility of harmonious annotation and processing of these language pairs in future development.
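As a rough illustration of what a token-order metric can look like, the sketch below computes Kendall's tau over word-alignment links for a single sentence pair. The abstract does not say which metrics the authors use, so the choice of Kendall's tau, the function name `kendall_tau`, and the `(source_index, target_index)` input format are all assumptions made for this example, not the paper's method.

```python
from itertools import combinations


def kendall_tau(alignment):
    """Order agreement for one sentence pair (hypothetical helper).

    alignment: iterable of (source_index, target_index) links, for example
    produced by a word aligner. Returns a value in [-1, 1], where 1 means
    the aligned tokens appear in the same order in both sentences.
    """
    links = sorted(set(alignment))
    concordant = 0
    discordant = 0
    for (s1, t1), (s2, t2) in combinations(links, 2):
        if s1 == s2 or t1 == t2:
            # Links sharing a token carry no relative-order information.
            continue
        if (s1 - s2) * (t1 - t2) > 0:
            concordant += 1
        else:
            discordant += 1
    total = concordant + discordant
    # Assumption: with no comparable pairs, treat the pair as monotone.
    return (concordant - discordant) / total if total else 1.0


if __name__ == "__main__":
    monotone = [(0, 0), (1, 1), (2, 2)]
    one_swap = [(0, 0), (1, 2), (2, 1)]
    print(kendall_tau(monotone))
    print(kendall_tau(one_swap))
```

Averaging this score over all sentence pairs in a parallel corpus would give one corpus-level indicator of how closely the word order of two languages agrees. Values near 1 would be consistent with the kind of similarity the abstract describes.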