All You Need is "Leet": Evading Hate-speech Detection AI
2025-05-22
Sampanna Yashwant Kahu, Naman Ahuja
Code
- github.com/sampannakahu/all_you_need_is_leet (official, linked in paper)
Abstract
Social media and online forums are becoming increasingly popular. Unfortunately, these platforms are also being used to spread hate speech. In this paper, we design black-box techniques to protect users from hate speech on online platforms by generating perturbations that can fool state-of-the-art deep-learning-based hate-speech detection models, thereby decreasing their efficiency. We also ensure that the perturbations change the original meaning of the hate speech as little as possible. Our best perturbation attack evades hate-speech detection for 86.8% of hateful text.