SsciBERT: A Pre-trained Language Model for Social Science Texts
Si Shen, Jiangfeng Liu, Litao Lin, Ying Huang, Lin Zhang, Chang Liu, Yutong Feng, Dongbo Wang
Code: github.com/s-t-full-text-knowledge-mining/ssci-bert (official, PyTorch)
Abstract
The academic literature of the social sciences records human civilization and studies human social problems. As this literature grows in scale, researchers urgently need ways to quickly locate existing research on relevant issues. Previous studies, such as SciBERT, have shown that pre-training on domain-specific texts improves performance on natural language processing tasks. However, no pre-trained language model for the social sciences has been available so far. In light of this, the present research proposes a pre-trained model based on abstracts published in Social Science Citation Index (SSCI) journals. The models, which are available on GitHub (https://github.com/S-T-Full-Text-Knowledge-Mining/SSCI-BERT), show excellent performance on discipline classification, abstract structure-function recognition, and named entity recognition tasks with social sciences literature.
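Since the released checkpoints follow the standard BERT format, they can presumably be loaded with the Hugging Face Transformers library. The sketch below is a minimal, hedged example: the model identifier `MODEL_ID` is an assumption based on the GitHub organization name, not confirmed by the abstract — check the repository README for the exact published model IDs. The import is deferred into the function so the sketch stays importable even where `transformers` is not installed.

```python
# Hypothetical hub identifier -- verify against the repository README.
MODEL_ID = "S-T-Full-Text-Knowledge-Mining/SSCI-BERT"


def load_ssci_bert(model_id: str = MODEL_ID):
    """Load the SSCI-BERT tokenizer and encoder for downstream tasks
    such as discipline classification or named entity recognition.

    The import is done lazily so this sketch does not require the
    `transformers` package at module load time.
    """
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    return tokenizer, model
```

A typical workflow would then fine-tune the returned encoder on a labeled social-science corpus, e.g. with a sequence classification head for discipline classification.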