
On Position Embeddings in BERT

2021-01-01 · ICLR 2021

Benyou Wang, Lifeng Shang, Christina Lioma, Xin Jiang, Hao Yang, Qun Liu, Jakob Grue Simonsen


Abstract

Various Position Embeddings (PEs) have been proposed in Transformer-based architectures (e.g., BERT) to model word order. These are empirically driven and perform well, but no formal framework exists to systematically study them. To address this, we present three expected properties of PEs that capture word distance in vector space: translation invariance, monotonicity, and symmetry. These properties formally capture the behaviour of PEs and allow us to reinterpret sinusoidal PEs in a principled way. An empirical evaluation of seven PEs (and their combinations) for classification and span prediction shows that fully-learnable absolute PEs perform better in classification, while relative PEs perform better in span prediction. We contribute the first formal analysis of desired properties for PEs and a principled discussion of their connection to typical downstream tasks.
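As a concrete illustration of the three properties, the sketch below builds standard sinusoidal PEs (the common Vaswani et al. formulation, not necessarily the paper's exact notation) and inspects pairwise dot-product similarities between positions: symmetry and translation invariance hold exactly for this formulation, while monotonicity holds only approximately at small offsets. The function name `sinusoidal_pe` and the chosen dimensions are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sinusoidal_pe(num_positions: int, d_model: int) -> np.ndarray:
    """Standard sinusoidal position embeddings (illustrative sketch)."""
    positions = np.arange(num_positions)[:, None]            # (P, 1)
    dims = np.arange(0, d_model, 2)[None, :]                  # (1, D/2)
    angles = positions / np.power(10000.0, dims / d_model)    # (P, D/2)
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_pe(num_positions=64, d_model=128)
sim = pe @ pe.T  # dot-product similarity between every pair of positions

# Symmetry: sim(i, j) == sim(j, i).
assert np.allclose(sim, sim.T)

# Translation invariance: sim(i, j) depends only on the offset i - j,
# so each diagonal of the similarity matrix is (numerically) constant.
for offset in range(1, 64):
    diag = np.diagonal(sim, offset=offset)
    assert np.allclose(diag, diag[0])

# Monotonicity is only approximate here: similarity decays near offset 0
# but oscillates at larger offsets.
print([round(float(sim[0, j]), 2) for j in range(6)])
```

The exact translation invariance follows from the identity sin(iω)sin(jω) + cos(iω)cos(jω) = cos((i − j)ω), which is why the similarity depends only on the offset between positions.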
