SkoltechNLP at SemEval-2021 Task 5: Leveraging Sentence-level Pre-training for Toxic Span Detection
David Dale, Igor Markov, Varvara Logacheva, Olga Kozlova, Nikita Semenov, Alexander Panchenko
Abstract
This work describes the participation of the SkoltechNLP team (Sk) in the Toxic Spans Detection task at SemEval-2021. The goal of the task is to identify the most toxic fragments of a given sentence, which is framed as a binary sequence tagging problem. We show that fine-tuning a RoBERTa model for this problem is a strong baseline. This baseline can be further improved by pre-training the RoBERTa model on a large dataset labeled for toxicity at the sentence level. Our solution scored among the top 20% of participating models, only 2 points below the best result, which suggests the viability of our approach.
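The two-stage idea sketched in the abstract can be illustrated with a toy PyTorch model: first train an encoder with a sentence-level toxicity head, then reuse that encoder for per-token span tagging. This is a minimal sketch under assumptions: a tiny Transformer stands in for RoBERTa, and all class names, dimensions, and hyperparameters below are illustrative, not the authors' actual configuration.

```python
import torch
import torch.nn as nn

VOCAB, DIM, MAX_LEN = 1000, 64, 32  # toy sizes, not RoBERTa's

class Encoder(nn.Module):
    """Tiny Transformer encoder standing in for RoBERTa."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, DIM)
        self.pos = nn.Embedding(MAX_LEN, DIM)
        layer = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
        self.enc = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, ids):
        pos = torch.arange(ids.size(1), device=ids.device)
        return self.enc(self.emb(ids) + self.pos(pos))  # (batch, seq, DIM)

class SentenceClassifier(nn.Module):
    """Stage 1: binary toxic / non-toxic classification of whole sentences."""
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(DIM, 2)

    def forward(self, ids):
        # Mean-pool token states into one sentence vector, then classify.
        return self.head(self.encoder(ids).mean(dim=1))  # (batch, 2)

class SpanTagger(nn.Module):
    """Stage 2: per-token toxic/non-toxic tagging with the shared encoder."""
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(DIM, 2)

    def forward(self, ids):
        return self.head(self.encoder(ids))  # (batch, seq, 2)

encoder = Encoder()
ids = torch.randint(0, VOCAB, (3, 16))

# In practice, stage 1 would train SentenceClassifier(encoder) on
# sentence-level toxicity labels; the warmed-up encoder is then handed
# to SpanTagger for the token-level task.
sent_logits = SentenceClassifier(encoder)(ids)
tag_logits = SpanTagger(encoder)(ids)
print(sent_logits.shape, tag_logits.shape)
```

Because both heads share one `encoder` instance, whatever the sentence-level pre-training teaches the encoder carries over directly into the span-tagging stage.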