75 Languages, 1 Model: Parsing Universal Dependencies Universally

2019-04-03 · IJCNLP 2019

Dan Kondratyuk, Milan Straka


Code

github.com/hyperparticle/udify (official, in paper; PyTorch; ★ 0)
github.com/ahmetustun/udapter (PyTorch; ★ 31)

Abstract

We present UDify, a multilingual multi-task model capable of accurately predicting universal part-of-speech, morphological features, lemmas, and dependency trees simultaneously for all 124 Universal Dependencies treebanks across 75 languages. By leveraging a multilingual BERT self-attention model pretrained on 104 languages, we found that fine-tuning it on all datasets concatenated together with simple softmax classifiers for each UD task can result in state-of-the-art UPOS, UFeats, Lemmas, UAS, and LAS scores, without requiring any recurrent or language-specific components. We evaluate UDify for multilingual learning, showing that low-resource languages benefit the most from cross-linguistic annotations. We also evaluate for zero-shot learning, with results suggesting that multilingual training provides strong UD predictions even for languages that neither UDify nor BERT have ever been trained on. Code for UDify is available at https://github.com/hyperparticle/udify.
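The abstract's idea of attaching a simple softmax classifier per task to one shared encoder can be illustrated with a toy sketch. This is not the UDify implementation: the random `token_vector` stands in for a contextual embedding from the pretrained encoder, the label sets are tiny placeholders, and the names `SoftmaxHead`, `HIDDEN`, and `lemma_rule` are hypothetical. Dependency arc prediction is left out; only token-level tagging heads are shown.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class SoftmaxHead:
    """One linear layer followed by softmax (hypothetical per-task classifier)."""

    def __init__(self, hidden_size, labels, rng):
        self.labels = labels
        # Small random weights; a real model would learn these during fine-tuning.
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)] for _ in labels
        ]
        self.bias = [0.0] * len(labels)

    def predict(self, token_vector):
        logits = [
            sum(w * x for w, x in zip(row, token_vector)) + b
            for row, b in zip(self.weights, self.bias)
        ]
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        return self.labels[best], probs


rng = random.Random(0)
HIDDEN = 8  # toy size, chosen only for illustration

# One head per task, all reading the same shared token representation.
heads = {
    "upos": SoftmaxHead(HIDDEN, ["NOUN", "VERB", "DET"], rng),
    "feats": SoftmaxHead(HIDDEN, ["Number=Sing", "Number=Plur", "_"], rng),
    "lemma_rule": SoftmaxHead(HIDDEN, ["keep", "strip-s"], rng),
}

# Stand-in for the encoder output of a single token (assumption, not real BERT output).
token_vector = [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN)]

predictions = {task: head.predict(token_vector) for task, head in heads.items()}
for task, (label, probs) in predictions.items():
    print(task, label, [round(p, 3) for p in probs])
```

Because every head reads the same vector, adding a task only adds one small classifier, which is the sense in which the abstract says no language-specific components are required.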

Tasks

Dependency Parsing, model, Zero-Shot Learning

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
French GSD	UDify	LAS	91.45	—	Unverified
ParTUT	UDify	LAS	88.06	—	Unverified
Sequoia Treebank	UDify	LAS	90.05	—	Unverified
Spoken Corpus	UDify	LAS	80.01	—	Unverified
Universal Dependencies	UDify	LAS	80.43	—	Unverified
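
For readers unfamiliar with the metric in the table, LAS (labeled attachment score) is the percentage of tokens whose predicted head and dependency label both match the gold annotation; UAS checks the head only. The sketch below is a simplified illustration that assumes gold and predicted tokens are already aligned one to one. The function name `attachment_scores` and the toy sentence are made up for this example, and an official evaluation script would also need to handle tokenization mismatches, which this does not.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) as percentages.

    Each token is a (head_index, dependency_label) pair, with head 0 as root.
    Assumes both sequences cover the same tokens in the same order.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of tokens")
    if not gold:
        raise ValueError("cannot score an empty sentence")

    # UAS: only the head index has to match.
    head_hits = sum(1 for (g_head, _), (p_head, _) in zip(gold, predicted)
                    if g_head == p_head)
    # LAS: head index and label both have to match.
    full_hits = sum(1 for g, p in zip(gold, predicted) if g == p)

    total = len(gold)
    return 100.0 * head_hits / total, 100.0 * full_hits / total


# Toy four-token sentence (invented for illustration).
gold = [(2, "det"), (3, "nsubj"), (0, "root"), (3, "obj")]
pred = [(2, "det"), (3, "obj"), (0, "root"), (2, "obj")]

uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")  # UAS=75.00 LAS=50.00
```

In the toy example three of four heads are correct (UAS 75), but one of those three carries the wrong label, so only two tokens count toward LAS (50).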
