Composing Byte-Pair Encodings for Morphological Sequence Classification
2020-12-01 · UDW (COLING) 2020
Adam Ek, Jean-Philippe Bernardy
Code
- github.com/adamlek/ud-morphological-tagging (official, in paper, PyTorch)
Abstract
Byte-pair encoding is a method for splitting a word into sub-word tokens; a language model then assigns a contextual representation to each of these tokens separately. In this paper, we evaluate four different methods of composing such sub-word representations into word representations. We evaluate the methods on morphological sequence classification, the task of predicting the grammatical features of a word. Our experiments reveal that using an RNN to compute word representations is consistently more effective than the other methods tested, across a sample of eight languages with different typologies and varying numbers of byte-pair tokens per word.
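To make the composition step concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract names only the RNN method, so the mean-pooling baseline, the function names (`group_by_word`, `compose_mean`, `make_rnn_composer`), the toy vectors, and the randomly initialised, untrained weights are all assumptions made for illustration. The sketch groups sub-word vectors by the word they belong to, then reduces each group to a single word vector.

```python
import math
import random


def group_by_word(token_vecs, word_ids):
    """Group sub-word vectors by the index of the word they came from."""
    groups = {}
    for vec, wid in zip(token_vecs, word_ids):
        groups.setdefault(wid, []).append(vec)
    return [groups[wid] for wid in sorted(groups)]


def compose_mean(vecs):
    """Illustrative baseline (an assumption): average the sub-word vectors."""
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def make_rnn_composer(dim, hidden, seed=0):
    """Build a toy Elman RNN with random, untrained weights (hypothetical).

    The returned function reads the sub-word vectors left to right and
    returns the final hidden state as the word representation.
    """
    rng = random.Random(seed)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(hidden)]
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(hidden)]

    def compose(vecs):
        h = [0.0] * hidden
        for x in vecs:
            # h_t = tanh(W_in x_t + W_h h_{t-1})
            h = [
                math.tanh(
                    sum(w_in[j][i] * x[i] for i in range(dim))
                    + sum(w_h[j][k] * h[k] for k in range(hidden))
                )
                for j in range(hidden)
            ]
        return h

    return compose


# Toy input: three sub-word vectors; the first two belong to word 0.
token_vecs = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
word_ids = [0, 0, 1]

words = group_by_word(token_vecs, word_ids)
mean_reprs = [compose_mean(w) for w in words]

rnn = make_rnn_composer(dim=2, hidden=3)
rnn_reprs = [rnn(w) for w in words]
```

Mean pooling ignores the order of the sub-word tokens, whereas the recurrent composer reads them in sequence, so it can in principle distinguish a stem followed by a suffix from the reverse. In a real tagger the RNN weights would be trained jointly with the classifier rather than left at their random initial values.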