
ThreatGram 101 - Extreme Telegram Replies Data with Threat Levels

2024-09-30 · Information Management and Big Data. SIMBig 2024. Communications in Computer and Information Science. Springer, Cham. 2024 · Code Available

Kamalakkannan Ravi, Jiann-Shiun Yuan


Abstract

With the growth of social media, threats in comments targeting public officials, entities, or organizations have become increasingly common. Previous research on threat detection has typically focused on broad categories such as normal speech, hate speech, and offensive speech, lacking a more focused approach to identify calls for harm explicitly. To address this gap, we present a comprehensive dataset of user replies from Telegram channels associated with political extremism or the cyberbullying of public officials in the United States. Using keywords, we identified Telegram channels with extreme ideological leanings, high rates of grievances, and threatening language. We employed expert annotation to label a subset of replies from this dataset, creating a labeled set of 15,076 replies categorized as no threat, judicial threat, and non-judicial threat. This paper releases two datasets: 2 million unlabeled replies and 15,076 labeled replies from 17 Telegram channels. This dataset aims to enhance proactive monitoring and mitigation strategies for negative content, threats, and abusive language in social media comments. It provides a valuable resource for threat detection, political extremism analysis, countering violent extremism, and the study of cyberbullying dynamics on social media platforms, addressing current limitations in data diversity and enabling more effective responses to online threats.
