SOTAVerified

Adaptive Sparsity Level during Training for Efficient Time Series Forecasting with Transformers

2023-05-28 · Code Available

Zahra Atashgahi, Mykola Pechenizkiy, Raymond Veldhuis, Decebal Constantin Mocanu


Abstract

Efficient time series forecasting has become critical for real-world applications, particularly with deep neural networks (DNNs). Efficiency in DNNs can be achieved through sparse connectivity and reducing the model size. However, finding the sparsity level automatically during training remains challenging due to the heterogeneity in the loss-sparsity tradeoffs across the datasets. In this paper, we propose Pruning with Adaptive Sparsity Level (PALS), which automatically seeks a decent balance between loss and sparsity, without the need for a predefined sparsity level. PALS draws inspiration from sparse training and during-training pruning methods. It introduces a novel "expand" mechanism into sparse neural network training, allowing the model to dynamically shrink, expand, or remain stable in order to find a proper sparsity level. In this paper, we focus on achieving efficiency in transformers, which are known for their excellent time series forecasting performance but high computational cost. Nevertheless, PALS can be applied directly to any DNN; to demonstrate this, we also evaluate it on the DLinear model. Experimental results on six benchmark datasets and five state-of-the-art (SOTA) transformer variants show that PALS substantially reduces model size while maintaining performance comparable to the dense model. More interestingly, PALS even outperforms the dense model in 12 and 14 out of 30 cases in terms of MSE and MAE loss, respectively, while reducing the parameter count by 65% and FLOPs by 63% on average. Our code and supplementary material are available on GitHub: https://github.com/zahraatashgahi/PALS.
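The shrink/expand/stable mechanism described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see their GitHub repository for that); the function name, thresholds, and update rule below are hypothetical, assuming the sparsity level is adjusted each round based on how the validation loss changed.

```python
def adjust_sparsity(val_loss, prev_loss, sparsity, step=0.05, tol=0.01):
    """Hypothetical sketch of an adaptive-sparsity decision rule.

    If the loss is stable or improving, prune more ("shrink" the network);
    if the loss degrades sharply, regrow connections ("expand");
    otherwise leave the sparsity level unchanged ("stable").
    Thresholds (tol, 5*tol) and step size are illustrative assumptions.
    """
    if val_loss <= prev_loss * (1 + tol):
        # loss acceptable -> shrink: increase sparsity, capped at 99%
        return min(0.99, sparsity + step)
    if val_loss > prev_loss * (1 + 5 * tol):
        # loss degraded too much -> expand: decrease sparsity
        return max(0.0, sparsity - step)
    # in-between -> remain stable
    return sparsity
```

In a training loop, such a rule would be queried periodically (e.g. once per epoch) and the resulting sparsity level enforced by magnitude pruning or connection regrowth, letting each dataset settle at its own loss-sparsity tradeoff.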
