Efficient Pre-training of Masked Language Model via Concept-based Curriculum Masking
Mingyu Lee, Jun-Hyung Park, Junho Kim, Kang-Min Kim, SangKeun Lee
Code
- github.com/koreamglee/concept-based-curriculum-masking (official, PyTorch)
Abstract
Masked language modeling (MLM) has been widely used for pre-training effective bidirectional representations but incurs substantial training costs. In this paper, we propose a novel concept-based curriculum masking (CCM) method to efficiently pre-train a language model. CCM has two key differences from existing curriculum learning approaches that allow it to reflect the nature of MLM. First, we introduce a carefully designed linguistic difficulty criterion that evaluates the MLM difficulty of each token. Second, we construct a curriculum that gradually masks words related to the previously masked words, retrieved from a knowledge graph. Experimental results show that CCM significantly improves pre-training efficiency. Specifically, the model trained with CCM shows performance comparable to the original BERT on the General Language Understanding Evaluation benchmark at half the training cost.
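To make the curriculum idea concrete, below is a minimal Python sketch of knowledge-graph-based curriculum masking. The toy graph, seed concepts, masking rate, and helper names (`build_curriculum`, `curriculum_mask`) are illustrative assumptions, not the paper's actual implementation or hyperparameters; see the official repository above for the real method.

```python
import random

# Illustrative sketch only: the knowledge graph, seed concepts, and
# masking rate are assumptions, not taken from the paper.

def build_curriculum(knowledge_graph, seed_concepts, num_stages):
    """Grow the set of maskable words stage by stage: each stage adds
    the graph neighbors of words already in the curriculum."""
    stages, current = [], set(seed_concepts)
    for _ in range(num_stages):
        stages.append(set(current))
        # Expand to words related to the previously masked words.
        current |= {nbr for word in current
                    for nbr in knowledge_graph.get(word, ())}
    return stages

def curriculum_mask(tokens, maskable, mask_rate=0.15, mask_token="[MASK]"):
    """Mask only tokens that belong to the current curriculum stage."""
    return [mask_token if tok in maskable and random.random() < mask_rate
            else tok
            for tok in tokens]

# Toy usage: a tiny concept graph and a two-stage curriculum.
kg = {"dog": ["animal", "bark"], "animal": ["creature"], "bark": ["sound"]}
stages = build_curriculum(kg, seed_concepts=["dog"], num_stages=3)
print(curriculum_mask("the dog can bark".split(), stages[1], mask_rate=1.0))
# -> ['the', '[MASK]', 'can', '[MASK]']
```

In this sketch, later stages draw masks from a progressively larger vocabulary reachable from the seed concepts, mirroring the abstract's description of gradually masking words related to previously masked ones.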