
Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss

2019-06-18 · NeurIPS 2019 · Code Available

Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Arechiga, Tengyu Ma

Code Available — Be the first to reproduce this paper.

Abstract

Deep learning algorithms can fare poorly when the training dataset suffers from heavy class-imbalance but the testing criterion requires good generalization on less frequent classes. We design two novel methods to improve performance in such scenarios. First, we propose a theoretically-principled label-distribution-aware margin (LDAM) loss motivated by minimizing a margin-based generalization bound. This loss replaces the standard cross-entropy objective during training and can be applied with prior strategies for training with class-imbalance such as re-weighting or re-sampling. Second, we propose a simple, yet effective, training schedule that defers re-weighting until after the initial stage, allowing the model to learn an initial representation while avoiding some of the complications associated with re-weighting or re-sampling. We test our methods on several benchmark vision tasks including the real-world imbalanced dataset iNaturalist 2018. Our experiments show that either of these methods alone can already improve over existing techniques and their combination achieves even better performance gains.
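The abstract describes two concrete mechanisms: the LDAM loss, which enforces per-class margins Δ_j ∝ 1/n_j^{1/4} (larger margins for rarer classes), and deferred re-weighting (DRW), which trains with uniform weights first and switches to class-balanced weights later. A minimal NumPy sketch of both follows; the margin formula and the class-balanced weight (1−β)/(1−β^{n_j}) come from the paper, but the function names, the logit scale `s`, and the default hyperparameters here are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

def ldam_margins(class_counts, max_margin=0.5):
    # Per-class margin Delta_j = C / n_j^{1/4}, with C chosen so the
    # rarest class (largest margin) gets max_margin.
    m = 1.0 / np.power(np.asarray(class_counts, dtype=float), 0.25)
    return m * (max_margin / m.max())

def ldam_loss(logits, labels, margins, s=30.0):
    # Subtract the class-dependent margin from the true-class logit,
    # then take scaled softmax cross-entropy (s is an assumed scale).
    z = np.array(logits, dtype=float)
    idx = np.arange(len(labels))
    z[idx, labels] -= margins[labels]
    z *= s
    z -= z.max(axis=1, keepdims=True)  # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[idx, labels].mean()

def drw_weights(class_counts, epoch, defer_until=160, beta=0.9999):
    # Deferred re-weighting: uniform weights during the initial stage,
    # then class-balanced weights (1 - beta) / (1 - beta^n_j),
    # normalized to average 1 across classes.
    n = np.asarray(class_counts, dtype=float)
    if epoch < defer_until:
        return np.ones_like(n)
    w = (1.0 - beta) / (1.0 - np.power(beta, n))
    return w * (len(n) / w.sum())
```

Combining the two (LDAM loss throughout, DRW weights kicking in late in training) is the LDAM-DRW configuration that appears in the benchmark table below.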

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| CIFAR-10-LT (ρ=10) | Empirical Risk Minimization (ERM, CE) | Error Rate | 13.61 | — | Unverified |
| CIFAR-10-LT (ρ=10) | Class-balanced Resampling | Error Rate | 13.21 | — | Unverified |
| CIFAR-10-LT (ρ=10) | LDAM-DRW | Error Rate | 11.84 | — | Unverified |
| CIFAR-10-LT (ρ=100) | LDAM-DRW | Error Rate | 22.97 | — | Unverified |
| CIFAR-100-LT (ρ=10) | LDAM-DRW | Error Rate | 41.29 | — | Unverified |
| CIFAR-100-LT (ρ=100) | LDAM-DRW | Error Rate | 57.96 | — | Unverified |
| COCO-MLT | LDAM (ResNet-50) | Average mAP | 40.53 | — | Unverified |
| VOC-MLT | LDAM (ResNet-50) | Average mAP | 70.73 | — | Unverified |

Reproductions