Imputing typological values via phylogenetic inference

2020-11-01 · EMNLP (SIGTYP) 2020 · Code available

Gerhard Jäger


Abstract

This paper describes a workflow to impute missing values in a typological database, a subset of the World Atlas of Language Structures (WALS). Using a world-wide phylogeny derived from lexical data, the model assumes a phylogenetic continuous-time Markov chain governing the evolution of typological values. Data imputation is performed via Maximum Likelihood estimation on the basis of this model. As a back-off model for languages whose phylogenetic position is unknown, a k-nearest neighbor classification based on geographic distance is performed.
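To illustrate the back-off step mentioned in the abstract, the following is a minimal sketch of k-nearest-neighbor imputation by geographic distance. It is not the paper's code: the function names `haversine_km` and `knn_impute`, the use of great-circle (haversine) distance, the majority vote, and the default `k=3` are all assumptions made for illustration, and the coordinates and feature values in the usage note are invented.

```python
import math
from collections import Counter


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km (assumed constant)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def knn_impute(target, known, k=3):
    """Impute a feature value for `target` by majority vote among neighbors.

    target: (lat, lon) of the language whose value is missing.
    known:  list of (lat, lon, value) for languages with an observed value.
    k:      number of nearest neighbors that vote (hypothetical default).
    """
    if not known:
        raise ValueError("need at least one language with an observed value")
    # Sort the observed languages by distance to the target and keep the k closest.
    nearest = sorted(
        known,
        key=lambda row: haversine_km(target[0], target[1], row[0], row[1]),
    )[:k]
    # Majority vote over the neighbors' values.
    votes = Counter(value for _, _, value in nearest)
    return votes.most_common(1)[0][0]
```

As a usage example with made-up data, calling `knn_impute((0.0, 0.0), [(0.0, 1.0, "SOV"), (1.0, 0.0, "SOV"), (0.0, 2.0, "SVO"), (50.0, 50.0, "SVO")], k=3)` lets the three closest languages vote, two of which carry the value "SOV". A distance-weighted vote would be a natural variant; the abstract does not say which scheme is used.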
