FNetAR: Mixing Tokens with Autoregressive Fourier Transforms

2021-07-22 · Code Available

Tim Lou, Michael Park, Mohammad Ramezanali, Vincent Tang


Code

github.com/MindCode-4/code-3/tree/main/fnet
mindspore

Abstract

In this note we examine the autoregressive generalization of the FNet algorithm, in which self-attention layers from the standard Transformer architecture are substituted with a trivial sparse-uniform sampling procedure based on Fourier transforms. Using the WikiText-103 benchmark, we demonstrate that FNetAR retains state-of-the-art performance (25.8 ppl) on the task of causal language modeling compared to a Transformer-XL baseline (24.2 ppl) with only half the number of self-attention layers, thus providing further evidence for the superfluity of deep neural networks with heavily compounded attention mechanisms. The autoregressive Fourier transform could likely be used for parameter reduction on most Transformer-based time-series prediction models.
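To make the idea of autoregressive Fourier token mixing concrete, here is a minimal NumPy sketch. It is an illustration only and is not taken from the paper or the linked repository: the function name `causal_fourier_mix`, the lower-triangular masking of the sequence DFT matrix, and the choice to keep only the real part (as in the original FNet) are all assumptions. The point it shows is that a Fourier-style mixing layer can be made causal, so that the output at position t depends only on tokens at positions up to t.

```python
import numpy as np

def causal_fourier_mix(x):
    """Hypothetical causal Fourier mixing layer (illustrative sketch only).

    x: real array of shape (seq_len, d_model).
    Returns a real array of the same shape in which row t depends
    only on rows 0..t of the input.
    """
    n, _ = x.shape
    # FNet-style mixing along the hidden dimension: a per-token DFT.
    h = np.fft.fft(x, axis=-1)
    # DFT matrix over sequence positions, W[t, s] = exp(-2*pi*i*t*s/n).
    pos = np.arange(n)
    dft = np.exp(-2j * np.pi * np.outer(pos, pos) / n)
    # Assumption: enforce causality by zeroing every entry with s > t.
    causal_dft = np.tril(dft)
    # Keep the real part, as the original FNet does.
    return (causal_dft @ h).real

# Example: perturbing later tokens leaves earlier outputs unchanged.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 4))
mixed = causal_fourier_mix(tokens)
```

The layer has no learned parameters, which is consistent with the parameter reduction the abstract points to. The explicit matrix product costs O(n^2) in sequence length and is used here for clarity rather than speed.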

Tasks

Language Modelling, Time Series, Time Series Analysis, Time Series Prediction

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
WikiText-103	FNetAR Medium	Test perplexity	25.81	—	Unverified
