
TailMix: Overcoming the Label Sparsity for Extreme Multi-label Classification

2021-09-29

Sangwoo Han, Chan Lim, Jongwuk Lee

Abstract

Extreme multi-label classification (XMC) aims to find the most relevant labels from a huge label set at industrial scale. The XMC problem inherently poses two challenges: data scalability and label sparsity. This work introduces a new augmentation method, TailMix, to address the label sparsity issue, i.e., long-tail labels in XMC have few positive instances. Instead of applying Mixup in a sample-wise manner as existing methods do, TailMix operates in a label-wise manner on the context vectors generated by the label attention layer. In this process, TailMix selectively chooses two context vectors and augments the most plausible positive instances to improve accuracy on long-tail labels. Despite the simplicity of TailMix, extensive experimental results on three benchmark datasets show that it consistently improves over both the baseline models without TailMix and other Mixup-based methods. Notably, TailMix is effective at improving performance on long-tail labels as measured by PSP@k and PSN@k, common metrics that reflect the propensity of labels.
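The label-wise mixing step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mix_context_vectors`, the Beta-distributed mixing weight (borrowed from standard Mixup), the `alpha` value, and the choice to keep the tail label's own vector dominant are all assumptions made for illustration. The abstract does not specify how the two context vectors are selected, so the sketch simply takes them as inputs.

```python
import random


def mix_context_vectors(c_tail, c_partner, alpha=0.4, rng=None):
    """Mix two per-label context vectors with a Mixup-style convex combination.

    c_tail, c_partner: equal-length lists of floats (hypothetical context
    vectors, e.g. outputs of a label attention layer for two labels).
    alpha: Beta(alpha, alpha) parameter for the mixing weight (assumed value).
    Returns (mixed_vector, lam).
    """
    if len(c_tail) != len(c_partner):
        raise ValueError("context vectors must have the same length")
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    # Assumption: keep the tail label's vector dominant so the mixed
    # vector can still be treated as a positive instance for that label.
    lam = max(lam, 1.0 - lam)
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(c_tail, c_partner)]
    return mixed, lam


# Usage with toy 3-dimensional context vectors.
rng = random.Random(0)
c_tail = [1.0, 0.0, 2.0]
c_partner = [0.0, 1.0, 0.0]
mixed, lam = mix_context_vectors(c_tail, c_partner, alpha=0.4, rng=rng)
print(lam, mixed)
```

In contrast to sample-wise Mixup, which interpolates whole input representations and their full label vectors, this kind of mixing touches only the representation tied to a single label, which is why it can target long-tail labels specifically.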
