A Large-scale Dataset for Hate Speech Detection on Vietnamese Social Media Texts

2021-03-22

Son T. Luu, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

Code

github.com/sonlam1102/vihsd-vietnamese-hate-speech-detection-dataset (official)
github.com/sonlam1102/vihsd

Abstract

In recent years, Vietnam has seen rapid growth in the number of social network users on platforms such as Facebook, YouTube, Instagram, and TikTok. On social media, hate speech has become a critical problem for users. To address this problem, we introduce ViHSD, a human-annotated dataset for automatically detecting hate speech on social networks. The dataset contains over 30,000 comments, each annotated with exactly one of three labels: CLEAN, OFFENSIVE, or HATE. We also describe the data creation process used to annotate the dataset and to evaluate its quality. Finally, we evaluate the dataset with deep learning models and transformer models.
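
Because every comment carries exactly one of three labels, ViHSD is a single-label, three-class classification problem. The sketch below shows one way a model pipeline might turn the label names into integer class ids and check the class distribution before training. It is illustrative only: the helper names `encode_labels` and `label_distribution` are hypothetical, and the id ordering (CLEAN=0, OFFENSIVE=1, HATE=2) is an assumption, not something taken from the official dataset files.

```python
from collections import Counter

# Hypothetical encoding of the three ViHSD labels; the ordering is an
# assumption for illustration, not read from the official release.
LABELS = ["CLEAN", "OFFENSIVE", "HATE"]
LABEL_TO_ID = {name: i for i, name in enumerate(LABELS)}


def encode_labels(labels):
    """Map label names to integer class ids, rejecting anything unexpected."""
    unknown = set(labels) - set(LABEL_TO_ID)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    return [LABEL_TO_ID[name] for name in labels]


def label_distribution(labels):
    """Return the fraction of comments that carry each of the three labels."""
    total = len(labels)
    if total == 0:
        raise ValueError("cannot compute a distribution over zero comments")
    counts = Counter(labels)
    # Labels absent from the input get a fraction of 0.0 (Counter returns 0).
    return {name: counts[name] / total for name in LABELS}


if __name__ == "__main__":
    sample = ["CLEAN", "CLEAN", "HATE", "OFFENSIVE"]
    print(encode_labels(sample))       # [0, 0, 2, 1]
    print(label_distribution(sample))  # {'CLEAN': 0.5, 'OFFENSIVE': 0.25, 'HATE': 0.25}
```

Checking the distribution first is a cheap way to see whether the classes are imbalanced, which affects the choice of evaluation metric and loss weighting for whichever deep learning or transformer model is trained on top.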

Tasks

Hate Speech Detection, Vietnamese Hate Speech Detection, Vietnamese Social Media Text Processing