Star-Transformer

2019-02-25 · NAACL 2019 · Code Available

Qipeng Guo, Xipeng Qiu, Pengfei Liu, Yunfan Shao, Xiangyang Xue, Zheng Zhang

Code

github.com/fastnlp/fastNLP
In paper · PyTorch

Abstract

Although the Transformer has achieved great success on many NLP tasks, its heavy structure with fully-connected attention connections leads to a dependence on large training data. In this paper, we present Star-Transformer, a lightweight alternative obtained by careful sparsification. To reduce model complexity, we replace the fully-connected structure with a star-shaped topology, in which every two non-adjacent nodes are connected through a shared relay node. Thus, complexity is reduced from quadratic to linear, while preserving the capacity to capture both local composition and long-range dependency. Experiments on four tasks (22 datasets) show that Star-Transformer achieves significant improvements over the standard Transformer on modestly sized datasets.
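
The quadratic-to-linear claim can be illustrated by counting connections. The sketch below is not the authors' implementation: it only enumerates the links of a fully-connected layout and of a star-shaped one, assuming that adjacent tokens are linked directly in a simple chain and that every token has one link to the relay. The function names and the `RELAY` label are hypothetical, introduced for illustration.

```python
RELAY = "relay"  # hypothetical label for the shared relay node


def full_attention_edges(n):
    """Undirected token pairs in a fully-connected layout: n*(n-1)/2 links."""
    return {(i, j) for i in range(n) for j in range(i + 1, n)}


def star_edges(n):
    """Links in a star-shaped layout (assumed: chain of neighbors + one relay)."""
    local = {(i, i + 1) for i in range(n - 1)}  # adjacent tokens linked directly
    radial = {(i, RELAY) for i in range(n)}     # every token linked to the relay
    return local | radial                       # (n - 1) + n = 2n - 1 links


def hops(a, b):
    """Shortest path length between tokens a and b in the star layout.

    Adjacent tokens are one hop apart; any other pair is two hops apart,
    going token -> relay -> token.
    """
    return min(abs(a - b), 2)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, len(full_attention_edges(n)), len(star_edges(n)))
```

Under these assumptions the link count grows as 2n - 1 rather than n(n - 1)/2, while any two tokens remain at most two hops apart, which is the intuition behind keeping long-range dependencies reachable through the relay.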

Tasks

Named Entity Recognition (NER)
Natural Language Inference
Sentiment Analysis
Text Classification

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
SNLI	Star-Transformer (no cross sentence attention)	% Test Accuracy	86	—	Unverified
