
RAC-BERT: Character Radical Enhanced BERT for Ancient Chinese

2023-10-08 · journal

Lifan Han, Xin Wang, Meng Wang, Zhao Li, Heyi Zhang, Zirui Chen, Xiaowang Zhang


Abstract

In recent years, Chinese pre-trained language models have achieved significant improvements in fields such as natural language understanding (NLU) and text generation. However, most existing pre-trained language models focus on modern Chinese and ignore the rich semantic information embedded in Chinese characters, especially radical information. To this end, we present RAC-BERT, a language-specific BERT model for ancient Chinese. Specifically, we propose two new radical-based pre-training tasks: (1) replacing masked tokens with random words that share the same radical, which mitigates the gap between the pre-training and fine-tuning stages; and (2) predicting the radical of the masked token rather than the original word, which reduces computational effort. Extensive experiments were conducted on two ancient Chinese NLP datasets. The results show that our model significantly outperforms state-of-the-art models on most tasks, and ablation experiments demonstrate the effectiveness of our approach. The pre-trained model is publicly available at https://github.com/CubeHan/RAC-BERT.
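As a rough illustration of the two radical-based pre-training tasks described in the abstract, the sketch below shows one way the masking objective could work. It is a minimal sketch under stated assumptions, not the authors' released code: the RADICAL_OF table, the CHARS_BY_RADICAL index, and the radical_mask function are hypothetical placeholders, and a real implementation would build the character-to-radical mapping from a dictionary such as the Unihan database.

```python
import random

# Hypothetical radical lookup: maps each character to its radical.
# The entries below are illustrative only; a real table would cover
# the full vocabulary.
RADICAL_OF = {"江": "水", "河": "水", "湖": "水",
              "松": "木", "林": "木", "枝": "木"}

# Inverted index: radical -> all characters sharing it.
CHARS_BY_RADICAL = {}
for char, radical in RADICAL_OF.items():
    CHARS_BY_RADICAL.setdefault(radical, []).append(char)

def radical_mask(tokens, mask_prob=0.15, seed=None):
    """Sketch of RAC-BERT-style masking: instead of inserting a [MASK]
    token, replace a selected character with a random character that
    shares its radical, and record the radical as the prediction target.

    Returns (corrupted_tokens, targets), where targets[i] is the radical
    to predict at position i, or None if the position was not selected.
    """
    rng = random.Random(seed)
    corrupted, targets = list(tokens), [None] * len(tokens)
    for i, char in enumerate(tokens):
        radical = RADICAL_OF.get(char)
        if radical is None or rng.random() >= mask_prob:
            continue
        # Task (1): same-radical replacement keeps the corrupted input
        # closer to real text than a [MASK] token, narrowing the gap
        # between pre-training and fine-tuning.
        corrupted[i] = rng.choice(CHARS_BY_RADICAL[radical])
        # Task (2): the prediction target is the radical, a much smaller
        # label space than the full vocabulary, which reduces computation.
        targets[i] = radical
    return corrupted, targets

if __name__ == "__main__":
    # mask_prob=1.0 forces every eligible position to be corrupted.
    print(radical_mask(list("江松河"), mask_prob=1.0, seed=0))
```

Predicting over the radical set rather than the vocabulary is what shrinks the output layer: the classification head only needs as many logits as there are radicals (a few hundred) instead of tens of thousands of characters.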
