
Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss

2019-06-18 · NeurIPS 2019 · Code Available

Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Arechiga, Tengyu Ma

Code Available — Be the first to reproduce this paper.

Abstract

Deep learning algorithms can fare poorly when the training dataset suffers from heavy class-imbalance but the testing criterion requires good generalization on less frequent classes. We design two novel methods to improve performance in such scenarios. First, we propose a theoretically-principled label-distribution-aware margin (LDAM) loss motivated by minimizing a margin-based generalization bound. This loss replaces the standard cross-entropy objective during training and can be applied with prior strategies for training with class-imbalance such as re-weighting or re-sampling. Second, we propose a simple, yet effective, training schedule that defers re-weighting until after the initial stage, allowing the model to learn an initial representation while avoiding some of the complications associated with re-weighting or re-sampling. We test our methods on several benchmark vision tasks including the real-world imbalanced dataset iNaturalist 2018. Our experiments show that either of these methods alone can already improve over existing techniques and their combination achieves even better performance gains.
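The abstract describes two concrete mechanisms: the LDAM loss, which enforces per-class margins Δ_j ∝ 1/n_j^{1/4} (larger margins for rarer classes), and deferred re-weighting (DRW), which trains with uniform weights first and switches to class-balanced weights later. A minimal NumPy sketch of both follows; the margin formula and the class-balanced weight (1−β)/(1−β^{n_j}) come from the paper, but the function names, the logit scale `s`, and the default hyperparameters here are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

def ldam_margins(class_counts, max_margin=0.5):
    # Per-class margin Delta_j = C / n_j^{1/4}, with C chosen so the
    # rarest class (largest margin) gets max_margin.
    m = 1.0 / np.power(np.asarray(class_counts, dtype=float), 0.25)
    return m * (max_margin / m.max())

def ldam_loss(logits, labels, margins, s=30.0):
    # Subtract the class-dependent margin from the true-class logit,
    # then take scaled softmax cross-entropy (s is an assumed scale).
    z = np.array(logits, dtype=float)
    idx = np.arange(len(labels))
    z[idx, labels] -= margins[labels]
    z *= s
    z -= z.max(axis=1, keepdims=True)  # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[idx, labels].mean()

def drw_weights(class_counts, epoch, defer_until=160, beta=0.9999):
    # Deferred re-weighting: uniform weights during the initial stage,
    # then class-balanced weights (1 - beta) / (1 - beta^n_j),
    # normalized to average 1 across classes.
    n = np.asarray(class_counts, dtype=float)
    if epoch < defer_until:
        return np.ones_like(n)
    w = (1.0 - beta) / (1.0 - np.power(beta, n))
    return w * (len(n) / w.sum())
```

Combining the two (LDAM loss throughout, DRW weights kicking in late in training) is the LDAM-DRW configuration that appears in the benchmark table below.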

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| CIFAR-10-LT (ρ=10) | Empirical Risk Minimization (ERM, CE) | Error Rate | 13.61 | — | Unverified |
| CIFAR-10-LT (ρ=10) | Class-balanced Resampling | Error Rate | 13.21 | — | Unverified |
| CIFAR-10-LT (ρ=10) | LDAM-DRW | Error Rate | 11.84 | — | Unverified |
| CIFAR-10-LT (ρ=100) | LDAM-DRW | Error Rate | 22.97 | — | Unverified |
| CIFAR-100-LT (ρ=10) | LDAM-DRW | Error Rate | 41.29 | — | Unverified |
| CIFAR-100-LT (ρ=100) | LDAM-DRW | Error Rate | 57.96 | — | Unverified |
| COCO-MLT | LDAM (ResNet-50) | Average mAP | 40.53 | — | Unverified |
| VOC-MLT | LDAM (ResNet-50) | Average mAP | 70.73 | — | Unverified |

Reproductions