A Classification System Approach in Predicting Chinese Censorship

2025-02-06

Matt Prodani, Tianchu Ze, Yushen Hu

Abstract

This paper uses classifiers to predict whether a Weibo post would be censored on the Chinese internet. Through randomized sampling from Fu2021 and Chinese tokenization strategies, we constructed a cleaned Chinese phrase dataset with binary censorship labels. Applying various probability-based information retrieval methods to the data, we derived four logistic regression models for classification. We also experimented with pre-trained transformers on the same classification task. After evaluating both macro-F1 and ROC-AUC, we concluded that the fine-tuned BERT model outperforms the other strategies.
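
The abstract names macro-F1 and ROC-AUC as the evaluation metrics but does not show how they were computed. As a minimal sketch of what those two metrics measure for a binary censored / not-censored task, the following plain-Python functions may help. The function names, the toy labels, and the scores are all hypothetical illustrations and are not taken from the paper or its data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in labels or predictions."""
    scores = []
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def roc_auc(y_true, y_score):
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (assumption): 1 = censored, 0 = not censored, with made-up classifier scores.
y_true = [1, 0, 1, 0, 0, 1]
y_score = [0.9, 0.2, 0.4, 0.6, 0.1, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # 0.5 threshold is an arbitrary choice

print(f"macro-F1: {macro_f1(y_true, y_pred):.3f}")
print(f"ROC-AUC:  {roc_auc(y_true, y_score):.3f}")
```

Macro-F1 weights the censored and uncensored classes equally regardless of how many posts fall in each, while ROC-AUC is computed from the raw scores and so does not depend on the decision threshold.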

Tasks

Classification, Information Retrieval, Retrieval
