Composing Byte-Pair Encodings for Morphological Sequence Classification

2020-12-01 · UDW (COLING) 2020 · Code Available

Adam Ek, Jean-Philippe Bernardy

Abstract

Byte-pair encoding is a method for splitting a word into sub-word tokens; a language model then assigns a contextual representation to each of these tokens separately. In this paper, we evaluate four different methods of composing such sub-word representations into word representations. We evaluate the methods on morphological sequence classification, the task of predicting the grammatical features of a word. Our experiments reveal that using an RNN to compute word representations is consistently more effective than the other methods tested, across a sample of eight languages with different typologies and varying numbers of byte-pair tokens per word.
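
To make the idea of composing sub-word representations concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not name the four methods evaluated, so the three shown here (element-wise sum, element-wise mean, and the final hidden state of a simple tanh RNN) are illustrative assumptions. All function names, the toy vectors, and the RNN weights are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def compose_sum(subwords: List[Vector]) -> Vector:
    """Add the sub-word vectors dimension by dimension (order-insensitive)."""
    return [sum(dims) for dims in zip(*subwords)]


def compose_mean(subwords: List[Vector]) -> Vector:
    """Average the sub-word vectors dimension by dimension (order-insensitive)."""
    n = len(subwords)
    return [total / n for total in compose_sum(subwords)]


def compose_rnn(
    subwords: List[Vector],
    w_in: List[Vector],
    w_rec: List[Vector],
    bias: Vector,
) -> Vector:
    """Run a simple tanh RNN over the sub-word vectors, left to right,
    and return the final hidden state as the word representation."""
    hidden = [0.0] * len(bias)
    for x in subwords:
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * hi for w, hi in zip(w_rec[j], hidden))
                + bias[j]
            )
            for j in range(len(bias))
        ]
    return hidden


# Hypothetical 2-dimensional contextual vectors for a word that the
# tokenizer split into three byte-pair tokens.
subword_vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# Hypothetical fixed RNN parameters; in practice these would be learned.
W_IN = [[0.5, 0.0], [0.0, 0.5]]
W_REC = [[0.1, 0.0], [0.0, 0.1]]
BIAS = [0.0, 0.0]

word_sum = compose_sum(subword_vectors)
word_mean = compose_mean(subword_vectors)
word_rnn = compose_rnn(subword_vectors, W_IN, W_REC, BIAS)
word_rnn_reversed = compose_rnn(subword_vectors[::-1], W_IN, W_REC, BIAS)
```

In this sketch, sum and mean ignore the order of the sub-word tokens, whereas the RNN's final state depends on it: `word_rnn` and `word_rnn_reversed` differ. Whether that order sensitivity accounts for the result reported in the abstract is not something the abstract itself states.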
