A Comprehensive Dataset for German Offensive Language and Conversation Analysis

2022-07-01 · NAACL (WOAH) 2022

Christoph Demus, Jonas Pitz, Mina Schütz, Nadine Probol, Melanie Siegel, Dirk Labudde

Code

github.com/hdasprachtechnologie/detox
Official implementation, linked in the paper (★ 19)

Abstract

In this work, we present a new publicly available offensive language dataset of 10,278 German social media comments collected in the first half of 2021 and annotated by a total of six annotators. With twelve different annotation categories, it is far more comprehensive than other datasets and goes beyond hate speech detection alone. In particular, the labels also cover toxicity, criminal relevance, and types of discrimination. Furthermore, about half of the comments come from coherent parts of conversations. This makes it possible to take the comments' context into account and to carry out conversation analyses that study the contagion of offensive language within conversations.

Tasks

Hate Speech Detection