Pretrained Generalized Autoregressive Model with Adaptive Probabilistic Label Cluster for Extreme Multi-label Text Classification

2020-01-01 · ICML 2020

Hui Ye, Zhiyu Chen, Da-Han Wang, Brian Davison


Abstract

Extreme multi-label text classification (XMTC) is the task of tagging a given text with the most relevant labels from an extremely large label set. We propose a novel deep learning method called APLC-XLNet. Our approach fine-tunes the recently released generalized autoregressive pretraining model (XLNet) to learn a dense representation of the input text. We propose the Adaptive Probabilistic Label Cluster (APLC) to approximate the cross entropy loss by exploiting the unbalanced label distribution to form clusters that explicitly reduce computation time. Our experiments, carried out on five benchmark datasets, show that our approach significantly outperforms existing state-of-the-art methods. The code of our method will be released publicly on GitHub.
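
The clustering idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it only shows how an unbalanced label distribution could be split into a head cluster of frequent labels and tail clusters of rare ones. The function name `partition_labels`, the `cutoffs` parameter, and the toy data are hypothetical, and the adaptive-softmax-style layout (the head scores its own labels plus one entry per tail cluster) is an assumption that the abstract does not spell out.

```python
from collections import Counter


def partition_labels(label_counts, cutoffs):
    """Split labels into clusters by descending training frequency.

    `cutoffs` are hypothetical boundaries (indices into the ranked label
    list); the first cluster returned is the head, the rest are tails.
    """
    # Rank labels from most to least frequent; break ties by label name
    # so the result is deterministic.
    ranked = [
        label
        for label, _ in sorted(label_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    ]
    bounds = [0] + list(cutoffs) + [len(ranked)]
    return [ranked[lo:hi] for lo, hi in zip(bounds, bounds[1:])]


# Toy multi-label training set (illustrative only).
train_labels = [["a", "b"], ["a"], ["a", "c"], ["b", "d"], ["a", "e"]]
counts = Counter(label for labels in train_labels for label in labels)

clusters = partition_labels(counts, cutoffs=[2])
head, tails = clusters[0], clusters[1:]

# Assumed layout: the head layer scores its own labels plus one slot per
# tail cluster, so rare labels are only scored when their cluster is needed.
head_outputs = len(head) + len(tails)

print(head, tails, head_outputs)
```

With a long-tailed label distribution, most of the probability mass falls on the small head cluster, which is where the saving in computation would come from under this assumed layout.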

Tasks

Multi-Label Text Classification, Text Classification
