A Time Series is Worth 64 Words: Long-term Forecasting with Transformers
Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, Jayant Kalagnanam
Code Available
- github.com/yuqinie98/patchtst (official, in paper), PyTorch, ★ 2,489
- github.com/timeseriesAI/tsai, PyTorch, ★ 6,019
- github.com/thuml/iTransformer, PyTorch, ★ 2,050
- github.com/WenjieDu/PyPOTS, PyTorch, ★ 1,970
- github.com/etna-team/etna, PyTorch, ★ 193
- github.com/romilbert/samformer, TensorFlow, ★ 191
- github.com/arclab-mit/sw-driver-forecaster, PyTorch, ★ 3
- github.com/MindCode-4/code-2/tree/main/patchtst, MindSpore, ★ 0
Abstract
We propose an efficient design of Transformer-based models for multivariate time series forecasting and self-supervised representation learning. It is based on two key components: (i) segmentation of time series into subseries-level patches, which serve as input tokens to the Transformer; (ii) channel-independence, where each channel contains a single univariate time series and all series share the same embedding and Transformer weights. The patching design has a three-fold benefit: local semantic information is retained in the embedding; computation and memory usage of the attention maps are quadratically reduced given the same look-back window; and the model can attend to a longer history. Our channel-independent patch time series Transformer (PatchTST) can significantly improve long-term forecasting accuracy compared with SOTA Transformer-based models. We also apply our model to self-supervised pre-training tasks and attain excellent fine-tuning performance, which outperforms supervised training on large datasets. Transferring masked pre-trained representations from one dataset to others also produces SOTA forecasting accuracy. Code is available at: https://github.com/yuqinie98/PatchTST.
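The two components described in the abstract, patching and channel independence, plus the random patch masking used for self-supervised pre-training, can be sketched in a few lines. This is a minimal NumPy sketch under stated assumptions, not the official PyTorch code: the helper names `num_patches`, `make_patches` and `mask_patches` are hypothetical; patch length 16, stride 8 and end padding with `stride` repeats of the last value reflect our reading of the paper's PatchTST/64 setup; the 0.4 mask ratio and zero-valued masking are illustrative assumptions.

```python
import numpy as np


def num_patches(seq_len: int, patch_len: int, stride: int) -> int:
    """Patch count after padding `stride` copies of the last value (assumed scheme)."""
    return (seq_len - patch_len) // stride + 2


def make_patches(x: np.ndarray, patch_len: int = 16, stride: int = 8) -> np.ndarray:
    """Turn (batch, seq_len, n_channels) into (batch * n_channels, n_patches, patch_len)."""
    batch, seq_len, n_channels = x.shape
    # Channel independence: fold channels into the batch axis so every
    # univariate series is processed by the same (shared) model weights.
    series = x.transpose(0, 2, 1).reshape(batch * n_channels, seq_len)
    # Pad the end with `stride` repeats of the last observed value.
    pad = np.repeat(series[:, -1:], stride, axis=1)
    series = np.concatenate([series, pad], axis=1)
    # All length-`patch_len` windows, then keep one window every `stride` steps.
    windows = np.lib.stride_tricks.sliding_window_view(series, patch_len, axis=1)
    return windows[:, ::stride, :]


def mask_patches(patches: np.ndarray, mask_ratio: float = 0.4, rng=None):
    """Zero out a random subset of patches per series (hypothetical pre-training input)."""
    rng = np.random.default_rng() if rng is None else rng
    n_series, n_patch, _ = patches.shape
    n_masked = int(round(mask_ratio * n_patch))
    mask = np.zeros((n_series, n_patch), dtype=bool)
    for i in range(n_series):
        # Pick distinct patch indices to hide for this series.
        mask[i, rng.choice(n_patch, size=n_masked, replace=False)] = True
    masked = np.where(mask[..., None], 0.0, patches)
    return masked, mask


x = np.random.default_rng(0).normal(size=(2, 512, 7))  # 2 samples, L=512, 7 channels
patches = make_patches(x)
print(patches.shape)  # (14, 64, 16)
# Entries in one attention map: 512 time-step tokens vs 64 patch tokens.
print(512 ** 2, num_patches(512, 16, 8) ** 2)  # 262144 4096
```

With a look-back window of 512, patch length 16 and stride 8, each channel becomes 64 patch tokens, which matches the "64 words" of the title and the PatchTST/64 label in the results table; the attention map shrinks from 512 × 512 to 64 × 64 entries. NumPy is used here only to keep the sketch dependency-light; the listed repositories implement the model in PyTorch.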
Benchmark Results
| Dataset (prediction horizon) | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Electricity (96) | PatchTST/64 | MSE | 0.13 | — | Unverified |
| Electricity (192) | PatchTST/64 | MSE | 0.15 | — | Unverified |
| Electricity (336) | PatchTST/64 | MSE | 0.16 | — | Unverified |
| Electricity (720) | PatchTST/64 | MSE | 0.2 | — | Unverified |
| ETTh1 (96) Multivariate | PatchTST/64 | MSE | 0.37 | — | Unverified |
| ETTh1 (96) Univariate | PatchTST/64 | MSE | 0.06 | — | Unverified |
| ETTh1 (192) Multivariate | PatchTST/64 | MSE | 0.41 | — | Unverified |
| ETTh1 (192) Univariate | PatchTST/64 | MSE | 0.07 | — | Unverified |
| ETTh1 (336) Multivariate | PatchTST/64 | MSE | 0.42 | — | Unverified |
| ETTh1 (336) Univariate | PatchTST/64 | MSE | 0.08 | — | Unverified |
| ETTh1 (720) Multivariate | PatchTST/64 | MSE | 0.45 | — | Unverified |
| ETTh1 (720) Univariate | PatchTST/64 | MSE | 0.09 | — | Unverified |
| ETTh2 (96) Multivariate | PatchTST/64 | MSE | 0.27 | — | Unverified |
| ETTh2 (96) Univariate | PatchTST/64 | MSE | 0.13 | — | Unverified |
| ETTh2 (192) Multivariate | PatchTST/64 | MSE | 0.34 | — | Unverified |
| ETTh2 (192) Univariate | PatchTST/64 | MSE | 0.17 | — | Unverified |
| ETTh2 (336) Multivariate | PatchTST/64 | MSE | 0.33 | — | Unverified |
| ETTh2 (336) Univariate | PatchTST/64 | MSE | 0.17 | — | Unverified |
| ETTh2 (720) Multivariate | PatchTST/64 | MSE | 0.38 | — | Unverified |
| ETTh2 (720) Univariate | PatchTST/64 | MSE | 0.22 | — | Unverified |
| Weather (96) | PatchTST/64 | MSE | 0.15 | — | Unverified |
| Weather (192) | PatchTST/64 | MSE | 0.19 | — | Unverified |
| Weather (336) | PatchTST/64 | MSE | 0.25 | — | Unverified |
| Weather (720) | PatchTST/64 | MSE | 0.31 | — | Unverified |
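Every row above reports mean squared error (MSE) for one forecast horizon. The sketch below shows how such a score is typically computed, assuming (as is common in this family of long-term forecasting benchmarks, though not stated on this page) that forecasts and targets are standardized per channel with training-split statistics before scoring. The helper names `standardize` and `mse` are hypothetical.

```python
import numpy as np


def standardize(train: np.ndarray, data: np.ndarray) -> np.ndarray:
    """Scale each channel with mean/std estimated on the training split only (assumption)."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    return (data - mean) / std


def mse(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error averaged over every sample, horizon step and channel."""
    return float(np.mean((pred - target) ** 2))


pred = np.array([[1.0, 2.0], [3.0, 4.0]])
target = np.array([[1.0, 4.0], [3.0, 2.0]])
print(mse(pred, target))  # 2.0
```

Because the errors are averaged over the whole horizon, scores for longer horizons (for example 720 versus 96 steps) are not directly comparable to per-step errors.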