Abusive language in Spanish children and young teenager's conversations: data preparation and short text classification with contextual word embeddings

2020-05-01 · LREC 2020

Marta R. Costa-jussà, Esther González, Asuncion Moreno, Eudald Cumalat


Abstract

Abusive texts are attracting growing interest from the scientific and social communities, and how to detect them automatically is a question of increasing interest in the natural language processing community. The main contribution of this paper is to evaluate the quality of the recently developed "Spanish Database for cyberbullying prevention" for training classifiers that detect abusive short texts. We compare classical machine learning techniques with a more advanced model, contextual word embeddings, for the particular case of classifying abusive short texts in Spanish. As contextual word embeddings, we use Bidirectional Encoder Representations from Transformers (BERT), proposed at the end of 2018. We show that BERT outperforms the classical techniques in most cases. Beyond its experimental impact, this project aims to plant the seeds of an innovative technological tool with high potential social impact, as part of the initiatives in artificial intelligence for social good.
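The abstract does not say which classical techniques were compared against BERT. Purely as a hedged illustration of what a classical short-text baseline can look like, here is a minimal sketch of a multinomial Naive Bayes classifier over bag-of-words counts with add-one (Laplace) smoothing. The class name `NaiveBayesClassifier`, the `tokenize` helper, and the toy Spanish sentences are all hypothetical. They do not come from the paper or from the Spanish Database for cyberbullying prevention.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of word characters; for str patterns, \w also
    # matches accented letters such as "ñ" and "á"
    return re.findall(r"\w+", text.lower())


class NaiveBayesClassifier:
    """Hypothetical multinomial Naive Bayes over bag-of-words counts."""

    def fit(self, texts: list[str], labels: list[int]) -> "NaiveBayesClassifier":
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        # Per-class token counts
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary seen in training, across all classes
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, text: str, c: int) -> float:
        # log P(c) + sum of log P(token | c), with add-one smoothing
        score = math.log(self.class_counts[c] / sum(self.class_counts.values()))
        denom = self.total_words[c] + len(self.vocab)
        for tok in tokenize(text):
            if tok in self.vocab:  # tokens never seen in training are ignored
                score += math.log((self.word_counts[c][tok] + 1) / denom)
        return score

    def predict(self, text: str) -> int:
        # Return the class with the highest (unnormalized) log posterior
        return max(self.classes, key=lambda c: self.log_posterior(text, c))


# Hypothetical toy examples (1 = abusive, 0 = not abusive), not from the paper's data
train_texts = ["eres un idiota", "te odio idiota", "hola que tal", "nos vemos mañana"]
train_labels = [1, 1, 0, 0]
clf = NaiveBayesClassifier().fit(train_texts, train_labels)
print(clf.predict("eres idiota"))  # → 1
print(clf.predict("hola mañana"))  # → 0
```

Add-one smoothing keeps a token that is unseen in one class from driving that class's probability to zero. This matters for short texts, which contain only a handful of tokens. A BERT-based classifier, as used in the paper, would instead replace these sparse bag-of-words counts with contextual embeddings of the whole message.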

Tasks

Abusive Language, Text Classification, Word Embeddings
