Hate Speech Detection

Hate speech detection is the task of determining whether a communication, such as text or audio, expresses hatred or encourages violence towards a person or a group of people. Such hatred is usually based on prejudice against 'protected characteristics' such as ethnicity, gender, sexual orientation, religion, or age, among others. Example benchmarks include ETHOS and HateXplain. Models are typically evaluated with metrics such as the F-score (also called the F-measure).
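As a rough illustration of the F-score mentioned above, the following sketch computes the F-beta score (F1 when beta is 1) for a binary hateful / not-hateful labelling. The function name `f_beta_score` and the label lists are hypothetical and made up for this example; they are not drawn from ETHOS, HateXplain, or any listed paper.

```python
def f_beta_score(y_true, y_pred, beta=1.0, positive=1):
    """Compute the F-beta score for one positive class (F1 when beta == 1)."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: precision or recall is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    # Weighted harmonic mean of precision and recall.
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical labels: 1 = hateful, 0 = not hateful (illustrative only).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

f1 = f_beta_score(y_true, y_pred)
print(f"F1 = {f1:.2f}")
```

Because hate speech datasets are often imbalanced, the F-score on the hateful class tends to be more informative than plain accuracy, which a model could inflate by predicting the majority class.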

Papers

Showing 1–10 of 507 papers

Title	Date	Tasks	Status
Fine-Grained Chinese Hate Speech Understanding: Span-Level Resources, Coded Term Lexicon, and Enhanced Detection Frameworks	Jul 15, 2025	Hate Speech Detection	Unverified
Rethinking Hate Speech Detection on Social Media: Can LLMs Replace Traditional Models?	Jun 15, 2025	Hate Speech Detection, Transliteration	Unverified
Towards Fairness Assessment of Dutch Hate Speech Detection	Jun 14, 2025	counterfactual, Fairness	Unverified
ToxSyn-PT: A Large-Scale Synthetic Dataset for Hate Speech Detection in Portuguese	Jun 11, 2025	Hate Speech Detection, Multi-Label Classification	Unverified
Hateful Person or Hateful Model? Investigating the Role of Personas in Hate Speech Detection by Large Language Models	Jun 10, 2025	Fairness, Hate Speech Detection	Unverified
Multilingual Hate Speech Detection in Social Media Using Translation-Based Approaches with Large Language Models	Jun 9, 2025	Hate Speech Detection	Unverified
Cracking the Code: Enhancing Implicit Hate Speech Detection through Coding Classification	Jun 5, 2025	Hate Speech Detection	Unverified
On Fairness of Task Arithmetic: The Role of Task Vectors	May 30, 2025	Fairness, Hate Speech Detection	Unverified
AmpleHate: Amplifying the Attention for Versatile Implicit Hate Detection	May 26, 2025	Contrastive Learning, Hate Speech Detection	Code Available
Social Good or Scientific Curiosity? Uncovering the Research Framing Behind NLP Artefacts	May 24, 2025	Fact Checking, Hate Speech Detection	Unverified

Datasets: Ethos Binary; HateXplain; Ethos MultiLabel; Waseem et al., 2018; AbusEval; Automatic Misogynistic Identification; HateMM; HatEval; OffensEval 2019; ToLD-Br; bajer_danish_misogyny; DKhate

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	BERT-MRP	AUROC	0.86	—	Unverified
2	BERT-RP	AUROC	0.85	—	Unverified
3	BERT-HateXplain [Attn]	AUROC	0.85	—	Unverified
4	BERT-HateXplain [LIME]	AUROC	0.85	—	Unverified
5	BERT [Attn]	AUROC	0.84	—	Unverified
6	BiRNN-HateXplain [Attn]	AUROC	0.81	—	Unverified
7	BiRNN-Attn [Attn]	AUROC	0.80	—	Unverified
8	CNN-GRU [LIME]	AUROC	0.79	—	Unverified
9	BiRNN [LIME]	AUROC	0.77	—	Unverified
10	XG-HSI-BERT	Accuracy	0.75	—	Unverified
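
Most entries in the table above report AUROC. As a minimal sketch of what that metric measures, the code below computes it as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, counting ties as half. The function name `auroc` and the labels and scores are hypothetical and purely illustrative; they do not reproduce any result in the table.

```python
def auroc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = hateful) and model scores, for illustration only.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]

auroc_value = auroc(labels, scores)
print(f"AUROC = {auroc_value:.3f}")
```

The pairwise loop is quadratic in the number of examples, which is fine for a small illustration; a rank-based computation would be the usual choice for large evaluation sets.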