Transformers are Universal Predictors
2023-07-15
Sourya Basu, Moulik Choraria, Lav R. Varshney
Abstract
We establish limits of the Transformer architecture for language modeling and show that it has a universal prediction property in an information-theoretic sense. We further analyze performance in non-asymptotic data regimes to understand the role of various components of the Transformer architecture, especially in the context of data-efficient training. We validate our theoretical analysis with experiments on both synthetic and real datasets.