emLam -- a Hungarian Language Modeling baseline

2017-01-26

Dávid Márk Nemeskey

Abstract

This paper aims to make up for the lack of documented baselines for Hungarian language modeling. Various approaches are evaluated on three publicly available Hungarian corpora. Perplexity values comparable to models of similar-sized English corpora are reported. A new, freely downloadable Hungarian benchmark corpus is introduced.
