HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
Hanrui Wang, Zhanghao Wu, Zhijian Liu, Han Cai, Ligeng Zhu, Chuang Gan, Song Han
Code
- github.com/mit-han-lab/hardware-aware-transformers (official, in paper, PyTorch, ★ 337)
- github.com/Luccadoremi/Model-Compression-DAQ (PyTorch, ★ 4)
- github.com/aaditkapoor/PDFExtract (no framework listed, ★ 1)
- github.com/mlatsjsu/PDFextract (no framework listed, ★ 0)
Abstract
Transformers are ubiquitous in Natural Language Processing (NLP) tasks, but they are difficult to deploy on hardware due to their intensive computation. To enable low-latency inference on resource-constrained hardware platforms, we propose to design Hardware-Aware Transformers (HAT) with neural architecture search. We first construct a large design space with arbitrary encoder-decoder attention and heterogeneous layers. Then we train a SuperTransformer that covers all candidates in the design space and efficiently produces many SubTransformers with weight sharing. Finally, we perform an evolutionary search with a hardware latency constraint to find a specialized SubTransformer dedicated to running fast on the target hardware. Extensive experiments on four machine translation tasks demonstrate that HAT can discover efficient models for different hardware (CPU, GPU, IoT device). When running the WMT'14 translation task on Raspberry Pi-4, HAT achieves a 3× speedup and 3.7× smaller size over the baseline Transformer, and a 2.7× speedup and 3.6× smaller size over the Evolved Transformer with 12,041× less search cost and no performance loss. HAT code is available at https://github.com/mit-han-lab/hardware-aware-transformers.git
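To make the search loop concrete, below is a minimal Python sketch of an evolutionary search under a hard hardware-latency constraint, in the spirit of the abstract. The design space ranges, the helper names (`sample_config`, `mutate`, `crossover`), and the caller-supplied callables `eval_loss` and `predict_latency` are all illustrative assumptions, not the paper's actual implementation; in HAT, `eval_loss` would score a SubTransformer using weights inherited from the SuperTransformer (no training from scratch), and `predict_latency` would come from a latency predictor trained on measurements from the target device.

```python
import copy
import random

# Hypothetical design space, loosely following the paper's description:
# heterogeneous layers plus arbitrary encoder-decoder attention (how many
# of the last encoder layers each decoder layer attends to). Illustrative only.
DESIGN_SPACE = {
    "decoder_layers": [1, 2, 3, 4, 5, 6],
    "embed_dim": [512, 640],
    "ffn_dim": [1024, 2048, 3072],
    "num_heads": [4, 8],
    "arbitrary_ende_attn": [1, 2, 3],  # attend to last 1 / 2 / 3 encoder layers
}

def sample_config(rng):
    """Sample one SubTransformer configuration from the design space."""
    return {k: rng.choice(v) for k, v in DESIGN_SPACE.items()}

def mutate(cfg, rng, prob=0.3):
    """Resample each architectural choice with probability `prob`."""
    child = copy.deepcopy(cfg)
    for k, v in DESIGN_SPACE.items():
        if rng.random() < prob:
            child[k] = rng.choice(v)
    return child

def crossover(a, b, rng):
    """Uniform crossover: each choice is taken from one of the two parents."""
    return {k: rng.choice([a[k], b[k]]) for k in DESIGN_SPACE}

def evolutionary_search(eval_loss, predict_latency, latency_limit_ms,
                        iterations=30, population=125, parents=25,
                        mutate_n=50, crossover_n=50, seed=0):
    """Evolve SubTransformer configs; keep only those under the latency limit.

    `eval_loss(cfg)` and `predict_latency(cfg)` are assumed callables
    supplied by the caller (lower loss is better; latency is in ms).
    """
    rng = random.Random(seed)

    def sample_legal():
        # Rejection-sample until the candidate meets the latency constraint.
        while True:
            cfg = sample_config(rng)
            if predict_latency(cfg) <= latency_limit_ms:
                return cfg

    pop = [sample_legal() for _ in range(population)]
    for _ in range(iterations):
        # Keep the best-scoring configs as parents for the next generation.
        scored = sorted(pop, key=eval_loss)[:parents]
        children = []
        while len(children) < mutate_n:
            child = mutate(rng.choice(scored), rng)
            if predict_latency(child) <= latency_limit_ms:
                children.append(child)
        while len(children) < mutate_n + crossover_n:
            child = crossover(rng.choice(scored), rng.choice(scored), rng)
            if predict_latency(child) <= latency_limit_ms:
                children.append(child)
        pop = scored + children
    return min(pop, key=eval_loss)
```

Because candidates are scored with inherited SuperTransformer weights rather than trained from scratch, each evaluation is cheap, which is what makes the reported search-cost reduction over Evolved Transformer plausible; the latency constraint is enforced here by simple rejection sampling of offspring.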
Tasks
- Machine Translation
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| WMT'14 English-French | Hardware-Aware Transformer (HAT) | BLEU | 41.8 | — | Unverified |
| WMT'14 English-German | Hardware-Aware Transformer (HAT) | BLEU | 28.4 | — | Unverified |