SRPOL DIALOGUE SYSTEMS at SemEval-2021 Task 5: Automatic Generation of Training Data for Toxic Spans Detection

2021-08-01 · SemEval

Michał Satława, Katarzyna Zamłyńska, Jarosław Piersa, Joanna Kolis, Klaudia Firląg, Katarzyna Beksa, Zuzanna Bordzicka, Christian Goltz, Paweł Bujnowski, Piotr Andruszkiewicz

Abstract

This paper presents a system used for SemEval-2021 Task 5: Toxic Spans Detection. Our system is an ensemble of BERT-based models for binary word classification, trained on a dataset extended by toxic comments modified and generated by two language models. For the toxic word classification, the prediction threshold value was optimized separately for every comment, in order to maximize the expected F1 value.
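The per-comment threshold selection mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the expected F1 is computed, so everything below is an assumption made for illustration: the function name `best_threshold` is hypothetical, token probabilities are treated as independent, and the expected F1 of predicting the top k tokens is approximated by the plug-in ratio 2·E[TP] / (k + E[number of toxic tokens]). An empty prediction is scored by the probability that no token is toxic, assuming F1 counts as 1 when both the gold and predicted sets are empty.

```python
import math


def best_threshold(probs):
    """Pick a per-comment threshold that maximizes an approximate expected F1.

    probs: per-token probabilities of being toxic for ONE comment.
    Returns (threshold, score); tokens with prob >= threshold are marked toxic.
    Illustrative sketch only, not the authors' implementation.
    """
    if not probs:
        return math.inf, 1.0

    ranked = sorted(probs, reverse=True)
    expected_toxic = sum(ranked)  # expected number of truly toxic tokens

    # Option k = 0: predict nothing. Assumed to score 1 only if no token is
    # toxic, which happens with probability prod(1 - p) under independence.
    best_k = 0
    best_score = math.prod(1.0 - p for p in ranked)

    expected_tp = 0.0
    for k, p in enumerate(ranked, start=1):
        expected_tp += p  # expected true positives among the top-k tokens
        score = 2.0 * expected_tp / (k + expected_toxic)  # plug-in F1 estimate
        if score > best_score:
            best_k, best_score = k, score

    # Threshold equals the lowest selected probability; inf selects no token.
    threshold = ranked[best_k - 1] if best_k else math.inf
    return threshold, best_score


if __name__ == "__main__":
    token_probs = [0.9, 0.8, 0.1]
    threshold, score = best_threshold(token_probs)
    print(threshold, [p >= threshold for p in token_probs])
```

Because the threshold is recomputed from each comment's own probabilities, a comment with several confident tokens gets a different cut-off than one where every token is unlikely to be toxic; in the latter case the sketch may select no tokens at all. Tokens tied at the threshold value are all selected.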

Tasks

Classification, Toxic Spans Detection
