SALT : Sharing Attention between Linear layer and Transformer for tabular dataset

2021-09-29

Juseong Kim, Jinsun Park, Giltae Song

Code

github.com/juseong03/salt
Official, in paper, PyTorch

Abstract

Handling tabular data with deep learning models remains a challenging problem despite their remarkable success in vision and language processing applications. As a result, many practitioners still rely on classical models such as gradient boosting decision trees (GBDTs) rather than deep networks, because the classical models perform better on tabular data. In this paper, we propose a novel hybrid deep network architecture for tabular data, dubbed SALT (Sharing Attention between Linear layer and Transformer). SALT consists of two blocks, a Transformer block and a linear-layer block, that take advantage of shared attention matrices. The shared attention matrices enable the Transformers and linear layers to cooperate closely, which leads to improved performance and robustness. Our algorithm outperforms tree-based ensemble models and previous deep learning methods on multiple benchmark datasets. We further demonstrate the robustness of SALT in semi-supervised learning and in pre-training scenarios with small datasets.
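
The abstract does not say how the attention matrices are shared, so the following is only a rough illustrative sketch of one possible reading, not the authors' implementation (see the official repository for that). It assumes a single attention matrix, computed once over the feature embeddings of one table row, is reused by both a Transformer-style branch and a linear branch. NumPy is used for brevity, and every name here (`shared_attention_block`, `w_q`, `w_k`, `w_v`, `w_lin`) is hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def shared_attention_block(x, w_q, w_k, w_v, w_lin):
    """Hypothetical block: x holds one row's feature embeddings, shape (n_features, d)."""
    d = x.shape[-1]
    q, k, v = x @ w_q, x @ w_k, x @ w_v

    # Attention matrix over features, computed once: (n_features, n_features).
    attn = softmax(q @ k.T / np.sqrt(d))

    # Transformer-style branch: standard scaled dot-product attention output.
    transformer_out = attn @ v

    # Linear branch (assumption): a plain linear projection of x whose
    # feature mixing reuses the same attention matrix.
    linear_out = attn @ (x @ w_lin)

    # Assumption: the two branches are combined by simple addition.
    return transformer_out + linear_out, attn


rng = np.random.default_rng(0)
n_features, d = 5, 8
x = rng.normal(size=(n_features, d))
w_q, w_k, w_v, w_lin = (rng.normal(size=(d, d)) * 0.1 for _ in range(4))

out, attn = shared_attention_block(x, w_q, w_k, w_v, w_lin)
print(out.shape, attn.shape)
```

In this sketch the attention matrix is computed only once and both branches depend on it, which is one way the two components could be made to "cooperate"; the actual SALT design may differ in how the matrix is produced, shared, and combined.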

Tasks

Deep Learning
