DSL Shared Task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation-Maximization and Chunk-based Language Model
2016-12-01, WS 2016
Ondřej Herman, Vít Suchomel, Vít Baisa, Pavel Rychlý
Abstract
In this paper we investigate two approaches to the discrimination of similar languages: an expectation-maximization algorithm for estimating the conditional probability P(word|language), and byte-level language models similar to compression-based language modelling methods. These methods reached accuracies of 86.6% and 88.3%, respectively, on set A of the DSL Shared Task 2016 competition.
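To make the idea of byte-level language discrimination concrete, the following is a minimal sketch, not the authors' chunk-based model. It assumes a fixed-order byte n-gram model with add-one smoothing as a stand-in, and picks the language whose model assigns the highest log-probability to the input. The names `ByteNgramModel` and `classify`, the order of 3, and the smoothing scheme are all illustrative assumptions that do not come from the paper.

```python
import math
from collections import Counter


class ByteNgramModel:
    """Fixed-order byte n-gram model with add-one smoothing.

    Illustrative stand-in only: the paper's chunk-based model is not
    specified in the abstract, so this is an assumed simplification.
    """

    def __init__(self, order=3):
        self.order = order
        self.ngrams = Counter()    # counts of context + next byte
        self.contexts = Counter()  # counts of the context alone

    def _padded(self, text):
        # Work on UTF-8 bytes and pad the start with NUL bytes so the
        # first real byte also has a full-length context.
        return b"\x00" * (self.order - 1) + text.encode("utf-8")

    def train(self, text):
        data = self._padded(text)
        for i in range(self.order - 1, len(data)):
            context = data[i - self.order + 1:i]
            self.ngrams[context + data[i:i + 1]] += 1
            self.contexts[context] += 1

    def log_prob(self, text):
        data = self._padded(text)
        total = 0.0
        for i in range(self.order - 1, len(data)):
            context = data[i - self.order + 1:i]
            count = self.ngrams[context + data[i:i + 1]]
            # Add-one smoothing over the 256 possible byte values.
            total += math.log((count + 1) / (self.contexts[context] + 256))
        return total


def classify(models, text):
    """Return the label whose model gives the text the highest log-probability."""
    return max(models, key=lambda label: models[label].log_prob(text))
```

In use, one model would be trained per language on that language's training text, and `classify` would be called with a dictionary mapping language labels to trained models. Working on bytes rather than words avoids tokenization and handles any script uniformly, at the cost of needing more context to capture word-level cues.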