Imputing typological values via phylogenetic inference
2020-11-01 · EMNLP (SIGTYP) 2020 · Code Available
Gerhard Jäger
Code
- github.com/gerhardjaeger/emnlp2020 (official)
Abstract
This paper describes a workflow to impute missing values in a typological database, a subset of the World Atlas of Language Structures (WALS). Using a world-wide phylogeny derived from lexical data, the model assumes a phylogenetic continuous-time Markov chain governing the evolution of typological values. Data imputation is performed via maximum likelihood estimation on the basis of this model. As a back-off model for languages whose phylogenetic position is unknown, a k-nearest neighbor classification based on geographic distance is performed.
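The geographic back-off step can be illustrated with a minimal sketch. It assumes great-circle (haversine) distance and an unweighted majority vote among the k nearest languages with a known value. The function names, the choice of k, the coordinates, and the feature values are all hypothetical; the paper's actual implementation may differ.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def knn_impute(target, known, k=3):
    """Impute a feature value by majority vote among the k geographically
    nearest languages whose value is known.

    target: (lat, lon) of the language with the missing value
    known:  list of ((lat, lon), value) pairs
    """
    nearest = sorted(known, key=lambda item: haversine_km(target, item[0]))[:k]
    votes = Counter(value for _, value in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: locations paired with an illustrative word-order value.
known = [
    ((48.1, 11.6), "SVO"),
    ((52.5, 13.4), "SVO"),
    ((35.7, 139.7), "SOV"),
    ((37.6, 127.0), "SOV"),
    ((41.0, 29.0), "SOV"),
]

print(knn_impute((50.0, 10.0), known, k=3))  # prints "SVO"
```

In this toy example the three nearest neighbors of the target carry the values SVO, SVO, and SOV, so the vote returns SVO. A distance-weighted vote would be an equally plausible variant.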