Hate Speech Detection

Hate speech detection is the task of determining whether communication, such as text or audio, expresses hatred or encourages violence towards a person or a group of people. The hatred is usually rooted in prejudice against 'protected characteristics' such as ethnicity, gender, sexual orientation, religion, or age. Example benchmarks include ETHOS and HateXplain. Models are commonly evaluated with metrics such as the F-score (also called the F-measure).
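As a rough illustration of the F-score mentioned above, the sketch below computes precision, recall, and F1 for a binary hate/non-hate labelling. The function name `f1_score`, the label encoding (1 = hateful, 0 = not hateful), and the toy labels are assumptions made for this example; they are not taken from any of the benchmarks listed here.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return the F1 score (harmonic mean of precision and recall)
    for the class given by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold labels and model predictions (1 = hateful, 0 = not hateful).
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
score = f1_score(y_true, y_pred)
print(f"F1 = {score:.3f}")
```

Macro F1, which several of the leaderboards report, is the unweighted mean of this per-class score over all classes.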

Papers

Showing 351–375 of 507 papers

Title	Date	Tasks	Status
Multimodal and Explainable Internet Meme Classification	Dec 11, 2022	Classification, Explainable Models	Unverified
When the Majority is Wrong: Modeling Annotator Disagreement for Subjective Tasks	May 11, 2023	Hate Speech Detection	Unverified
Multi-modal Hate Speech Detection using Machine Learning	Jun 15, 2023	Hate Speech Detection	Unverified
Multitask Learning for Arabic Offensive Language and Hate-Speech Detection	May 1, 2020	Hate Speech Detection, Transfer Learning	Unverified
A Dataset of Hindi-English Code-Mixed Social Media Text for Hate Speech Detection	Jun 1, 2018	General Classification, Hate Speech Detection	Unverified
My Boli: Code-mixed Marathi-English Corpora, Pretrained Language Models and Evaluation Benchmarks	Jun 24, 2023	Benchmarking, Hate Speech Detection	Unverified
A Comprehensive Study on NLP Data Augmentation for Hate Speech Detection: Legacy Methods, BERT, and LLMs	Mar 30, 2024	Data Augmentation, Hate Speech Detection	Unverified
Navigating Dialectal Bias and Ethical Complexities in Levantine Arabic Hate Speech Detection	Dec 14, 2024	Hate Speech Detection	Unverified
ToxSyn-PT: A Large-Scale Synthetic Dataset for Hate Speech Detection in Portuguese	Jun 11, 2025	Hate Speech Detection, Multi-Label Classification	Unverified
Transferring Knowledge via Neighborhood-Aware Optimal Transport for Low-Resource Hate Speech Detection	Oct 17, 2022	Hate Speech Detection	Unverified
Aggression Detection in Social Media: Using Deep Neural Networks, Data Augmentation, and Pseudo Labeling	Aug 1, 2018	Data Augmentation, Feature Engineering	Unverified
Whose Emotions and Moral Sentiments Do Language Models Reflect?	Feb 16, 2024	Hate Speech Detection	Unverified
NoisyHate: Mining Online Human-Written Perturbations for Realistic Robustness Benchmarking of Content Moderation Models	Mar 18, 2023	Adversarial Attack, Benchmarking	Unverified
Trustworthy Hate Speech Detection Through Visual Augmentation	Sep 20, 2024	Hate Speech Detection	Unverified
TuEval at SemEval-2019 Task 5: LSTM Approach to Hate Speech Detection in English and Spanish	Jun 1, 2019	Hate Speech Detection	Unverified
Offensive Language and Hate Speech Detection for Danish	Aug 13, 2019	Hate Speech Detection	Unverified
Offensive Language and Hate Speech Detection with Deep Learning and Transfer Learning	Aug 6, 2021	Data Augmentation, Hate Speech Detection	Unverified
On a Benefit of Mask Language Modeling: Robustness to Simplicity Bias	Oct 11, 2021	Hate Speech Detection, Language Modeling	Unverified
One to rule them all: Towards Joint Indic Language Hate Speech Detection	Sep 28, 2021	All, Hate Speech Detection	Unverified
On Fairness of Task Arithmetic: The Role of Task Vectors	May 30, 2025	Fairness, Hate Speech Detection	Unverified
On Importance of Code-Mixed Embeddings for Hate Speech Identification	Nov 27, 2024	Hate Speech Detection, Sentence	Unverified
On Limitations of LLM as Annotator for Low Resource Languages	Nov 26, 2024	Hate Speech Detection, News Classification	Unverified
Online Hate: Behavioural Dynamics and Relationship with Misinformation	May 28, 2021	Hate Speech Detection, Misinformation	Unverified
On the Challenges of Building Datasets for Hate Speech Detection	Sep 6, 2023	Hate Speech Detection	Unverified
A Legal Approach to Hate Speech -- Operationalizing the EU's Legal Framework against the Expression of Hatred as an NLP Task	Apr 7, 2020	Decision Making, Hate Speech Detection	Unverified


Datasets: Ethos Binary; HateXplain; Ethos MultiLabel; Waseem et al., 2018; AbusEval; Automatic Misogynistic Identification; HateMM; HatEval; OffensEval 2019; ToLD-Br; bajer_danish_misogyny; DKhate

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	BiLSTM + static BE	F1-score	0.8	—	Unverified
2	BERT	F1-score	0.79	—	Unverified
3	BiLSTM+Attention+FT	F1-score	0.77	—	Unverified
4	OPT-175B (few-shot)	F1-score	0.76	—	Unverified
5	CNN+Attention+FT+GV	F1-score	0.74	—	Unverified
6	OPT-175B (one-shot)	F1-score	0.71	—	Unverified
7	OPT-175B (zero-shot)	F1-score	0.67	—	Unverified
8	SVM	F1-score	0.66	—	Unverified
9	Random Forests	F1-score	0.64	—	Unverified
10	Davinci (zero-shot)	F1-score	0.63	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	BERT-MRP	AUROC	0.86	—	Unverified
2	BERT-RP	AUROC	0.85	—	Unverified
3	BERT-HateXplain [LIME]	AUROC	0.85	—	Unverified
4	BERT-HateXplain [Attn]	AUROC	0.85	—	Unverified
5	BERT [Attn]	AUROC	0.84	—	Unverified
6	BiRNN-HateXplain [Attn]	AUROC	0.81	—	Unverified
7	BiRNN-Attn [Attn]	AUROC	0.8	—	Unverified
8	CNN-GRU [LIME]	AUROC	0.79	—	Unverified
9	BiRNN [LIME]	AUROC	0.77	—	Unverified
10	XG-HSI-BERT	Accuracy	0.75	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	MLARAM	Hamming Loss	0.29	—	Unverified
2	MLkNN	Hamming Loss	0.16	—	Unverified
3	Binary Relevance	Hamming Loss	0.14	—	Unverified
4	Neural Classifier Chains	Hamming Loss	0.13	—	Unverified
5	Neural Binary Relevance	Hamming Loss	0.11	—	Unverified
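
The table above reports Hamming Loss, which in multi-label classification is the fraction of individual label decisions that are wrong, so lower values are better. The sketch below shows one way to compute it, assuming each example is a fixed-length list of 0/1 indicators. The function name `hamming_loss` and the toy label vectors are illustrative assumptions, not data from any listed benchmark.

```python
def hamming_loss(y_true, y_pred):
    """Return the fraction of label slots where prediction and gold label differ."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must contain the same number of examples")

    wrong = 0
    total = 0
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(pred_row):
            raise ValueError("each example must have the same number of labels")
        total += len(true_row)
        wrong += sum(1 for t, p in zip(true_row, pred_row) if t != p)

    # An empty input has no label slots to get wrong.
    return wrong / total if total else 0.0


# Hypothetical example: two comments, three labels each.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 1, 1], [0, 1, 1]]
loss = hamming_loss(y_true, y_pred)
print(f"Hamming loss = {loss:.3f}")
```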

#	Model	Metric	Claimed	Verified	Status
1	Mozafari et al., 2019	AAA	50.94	—	Unverified
2	SVM	AAA	46.51	—	Unverified
3	Kennedy et al., 2020	AAA	45.5	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	HateBERT	Macro F1	0.74	—	Unverified
2	BERT	Macro F1	0.72	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	mBert	Accuracy	0.83	—	Unverified
2	Logistic Regression	Accuracy	0.7	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	HXP + CLAP + CLIP	TEST F1 (macro)	0.85	—	Unverified
2	BERT + ViT + MFCC	TEST F1 (macro)	0.79	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	HateBERT	Macro F1	0.49	—	Unverified
2	BERT	Macro F1	0.48	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	HateBERT	Macro F1	0.81	—	Unverified
2	BERT	Macro F1	0.8	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Multilingual BERT	F1-score	0.75	—	Unverified
2	AutoML	F1-score	0.74	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	AOM mBERT	F1	0.85	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Baseline	F1	0.7	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	RoBERTa-large-ST	Macro F1	80.7	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Baseline BERT (task A)	F1	0.77	—	Unverified