
DSL Shared Task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation-Maximization and Chunk-based Language Model

2016-12-01 · WS 2016

Ondřej Herman, Vít Suchomel, Vít Baisa, Pavel Rychlý


Abstract

In this paper we investigate two approaches to the discrimination of similar languages: an expectation-maximization algorithm for estimating the conditional probability P(word|language), and byte-level language models similar to compression-based language modelling methods. These methods reached accuracies of 86.6% and 88.3%, respectively, on set A of the DSL Shared Task 2016 competition.
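
To make the idea of a byte-level language model concrete, the following is a minimal sketch of how such a model might be used to pick a language: train one model per language, then choose the language whose model assigns the input the highest log-probability. This is not the authors' implementation. The fixed n-gram order, the add-one smoothing, the names `ByteNgramModel` and `classify`, and the toy training sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


class ByteNgramModel:
    """Byte-level n-gram model with add-one smoothing (illustrative only)."""

    def __init__(self, order=3):
        self.order = order          # assumed fixed n-gram order
        self.ngrams = Counter()     # counts of context + next byte
        self.contexts = Counter()   # counts of the context alone

    def _pad(self, text):
        # Prefix with NUL bytes so the first bytes also have a full context.
        return b"\x00" * (self.order - 1) + text.encode("utf-8")

    def train(self, text):
        data = self._pad(text)
        for i in range(self.order - 1, len(data)):
            context = data[i - self.order + 1:i]
            self.ngrams[context + data[i:i + 1]] += 1
            self.contexts[context] += 1

    def log_prob(self, text):
        data = self._pad(text)
        total = 0.0
        for i in range(self.order - 1, len(data)):
            context = data[i - self.order + 1:i]
            # Add-one smoothing over the 256 possible byte values.
            numerator = self.ngrams[context + data[i:i + 1]] + 1
            denominator = self.contexts[context] + 256
            total += math.log(numerator / denominator)
        return total


def classify(models, text):
    """Return the language whose model gives the text the highest score."""
    return max(models, key=lambda lang: models[lang].log_prob(text))


# Toy training data (hypothetical, far too small for real use).
models = {"en": ByteNgramModel(), "de": ByteNgramModel()}
models["en"].train("the quick brown fox jumps over the lazy dog")
models["de"].train("der schnelle braune fuchs springt über den faulen hund")

print(classify(models, "the lazy dog"))
print(classify(models, "faulen hund"))
```

Working on bytes rather than words avoids tokenization and handles any script, which is one reason such models resemble compression-based approaches. A practical system would need far more training data and a better smoothing scheme than the add-one estimate assumed here.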
