JCT at SemEval-2022 Task 4-A: Patronism Detection in Posts Written in English using Preprocessing Methods and various Machine Learning Methods
2022-07-01 · SemEval (NAACL) 2022
Yaakov HaCohen-Kerner, Ilan Meyrowitsch, Matan Fchima
Abstract
In this paper, we describe our submissions to SemEval-2022 subtask 4-A, “Patronizing and Condescending Language Detection: Binary Classification”. We developed several models for this subtask, applying 11 supervised machine learning methods and 9 preprocessing methods. Our best submission was a model built with BertForSequenceClassification. Our experiments indicate that a preprocessing stage is essential for a successful model. The dataset for Subtask 1 is highly imbalanced. The F1-scores on the oversampled version of the imbalanced training dataset were higher than those on the original training dataset.
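The abstract does not say which oversampling technique was used, so the following is only a minimal sketch of one common option: random oversampling, where minority-class examples are duplicated until the classes are balanced, together with the F1-score for the positive class. The function names `random_oversample` and `f1_score`, the toy posts, and the label encoding (1 = patronizing, 0 = not) are all assumptions made for illustration and do not come from the paper.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen examples of each smaller class until every
    class has as many examples as the largest one.
    Hypothetical helper; not the authors' implementation."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for cls, n in counts.items():
        pool = [t for t, y in zip(texts, labels) if y == cls]
        for _ in range(target - n):
            out_texts.append(rng.choice(pool))
            out_labels.append(cls)
    return out_texts, out_labels


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy, made-up data: 8 negative posts and 2 positive posts (assumed encoding).
texts = [f"neutral post {i}" for i in range(8)] + ["poor them 0", "poor them 1"]
labels = [0] * 8 + [1] * 2

balanced_texts, balanced_labels = random_oversample(texts, labels)
print(Counter(balanced_labels))
```

In a setup like this, oversampling would normally be applied to the training split only, so that duplicated examples do not leak into the data used for evaluation.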