
HateBR: Large expert annotated corpus of Brazilian Instagram comments for abusive language detection

2021-11-16 · ACL ARR November 2021

Anonymous

Abstract

Given the severity of abusive comments on social media in Brazil and the lack of research on Portuguese, this paper presents the first large-scale annotated corpus of Brazilian Instagram comments for hate speech and offensive language detection on the web and social media. The HateBR corpus was collected from comments on the Instagram accounts of Brazilian political personalities and manually annotated by specialists. It comprises 7,000 documents annotated in three layers: a binary classification (offensive versus non-offensive comments), offense-level classes (highly, moderately, and slightly offensive messages), and nine hate speech targets (xenophobia, racism, homophobia, sexism, religious intolerance, partyism, apology for the dictatorship, antisemitism, and fatphobia). Each comment was annotated by three different annotators, and the annotation achieved high inter-annotator agreement.
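As an illustration of the three annotation layers and the three-annotator setup described in the abstract, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the names `AnnotatedComment`, `OffenseLevel`, `HATE_TARGETS`, `majority_label`, and `fleiss_kappa` are hypothetical, the abstract does not say how the three annotations were merged (majority voting is assumed here), and it does not name the agreement statistic (Fleiss' kappa is used only as one common choice for three raters).

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Sequence


class OffenseLevel(Enum):
    """Layer 2: offense level (NONE is an assumed value for non-offensive comments)."""
    NONE = 0
    SLIGHT = 1
    MODERATE = 2
    HIGH = 3


# Layer 3: the nine hate speech targets listed in the abstract
# (identifier spellings are hypothetical).
HATE_TARGETS = (
    "xenophobia", "racism", "homophobia", "sexism", "religious_intolerance",
    "partyism", "dictatorship_apology", "antisemitism", "fatphobia",
)


@dataclass
class AnnotatedComment:
    """One corpus document with its three annotation layers (hypothetical schema)."""
    text: str
    offensive: bool                                    # layer 1: binary label
    level: OffenseLevel                                # layer 2: offense level
    targets: List[str] = field(default_factory=list)   # layer 3: assumed multi-label


def majority_label(votes: Sequence):
    """Return the most frequent vote (assumed merge rule, not from the paper)."""
    return Counter(votes).most_common(1)[0][0]


def fleiss_kappa(ratings: Sequence[Sequence[str]]) -> float:
    """Fleiss' kappa for items that were each rated by the same number of raters."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for item in ratings for label in item})
    # counts[i][j] = number of raters who assigned category j to item i
    counts = [[Counter(item)[c] for c in categories] for item in ratings]
    # Proportion of all assignments that went to each category
    p_j = [sum(row[j] for row in counts) / (n_items * n_raters)
           for j in range(len(categories))]
    # Per-item agreement among rater pairs
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_items          # observed agreement
    p_e = sum(p * p for p in p_j)       # chance agreement
    if p_e == 1.0:                      # every rating is the same category
        return 1.0
    return (p_bar - p_e) / (1 - p_e)
```

For example, `majority_label([True, True, False])` resolves three binary votes to a single label, and `fleiss_kappa` takes one inner list of three labels per comment.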
