Ensemble Methods to Distinguish Mainland and Taiwan Chinese

2019-06-01 · WS 2019

Hai Hu, Wen Li, He Zhou, Zuoyu Tian, Yiwen Zhang, Liang Zou


Abstract

This paper describes the IUCL system at the VarDial 2019 evaluation campaign for the task of discriminating between the Mainland and Taiwan varieties of Mandarin Chinese. We first build several base classifiers: a Naive Bayes classifier with word n-grams as features, SVMs with both character and syntactic features, and neural networks with pre-trained character and word embeddings. We then adopt ensemble methods that combine the outputs of the base classifiers to make final predictions. Our ensemble models achieve the highest F1 score (0.893) in the simplified Chinese track and the second highest (0.901) in the traditional Chinese track. These results demonstrate the effectiveness and robustness of the ensemble methods.
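The abstract does not spell out the combination rule, so the following is only a minimal sketch of the general idea, assuming plain majority (hard) voting over base classifiers that use different feature sets. As a simplification, the paper's Naive Bayes, SVM and neural models are all stood in for by one multinomial Naive Bayes model trained on three different character n-gram sets; character n-grams replace the paper's word n-grams so that no Chinese word segmenter is needed. All names (`char_ngrams`, `NaiveBayes`, `majority_vote`, `ensemble_predict`), the labels `M`/`T`, and the six toy training sentences are hypothetical illustrations, not the authors' code or data.

```python
import math
from collections import Counter, defaultdict
from functools import partial


def char_ngrams(text, n_values=(1, 2)):
    """Character n-grams; Chinese text needs no word segmentation for these."""
    feats = []
    for n in n_values:
        feats.extend(text[i:i + n] for i in range(len(text) - n + 1))
    return feats


class NaiveBayes:
    """Hypothetical multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, featurize, alpha=1.0):
        self.featurize = featurize
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feat_counts[label].update(self.featurize(text))
        self.vocab = {f for c in self.feat_counts.values() for f in c}
        self.total = {y: sum(c.values()) for y, c in self.feat_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for y in sorted(self.class_counts):
            # Log prior plus smoothed log likelihood of each known feature.
            score = math.log(self.class_counts[y] / n_docs)
            for f in self.featurize(text):
                if f in self.vocab:  # features never seen in training are skipped
                    num = self.feat_counts[y][f] + self.alpha
                    score += math.log(num / (self.total[y] + self.alpha * v))
            if score > best_score:
                best_label, best_score = y, score
        return best_label


def majority_vote(predictions):
    """Hard voting: most frequent label; ties go to the tied label voted first."""
    counts = Counter(predictions)
    top = max(counts.values())
    for p in predictions:
        if counts[p] == top:
            return p


def ensemble_predict(models, text):
    """Collect each base classifier's label, then combine them by majority vote."""
    return majority_vote([m.predict(text) for m in models])


# Toy data (made up): M = Mainland wording, T = Taiwan wording, simplified script.
train_texts = ["这个软件很好用", "我在网上看视频", "这条信息很重要",
               "这个软体很好用", "我在网上看影片", "这条资讯很重要"]
train_labels = ["M", "M", "M", "T", "T", "T"]

# Diversity comes from giving each base model a different feature set.
models = [NaiveBayes(partial(char_ngrams, n_values=ns)).fit(train_texts, train_labels)
          for ns in [(1,), (2,), (1, 2, 3)]]

print(ensemble_predict(models, "新的软件"))
print(ensemble_predict(models, "看影片"))
```

Hard voting only needs a label from each base model, which is why it can combine heterogeneous classifiers (Naive Bayes, SVMs, neural networks) without calibrating their scores against each other; averaging predicted probabilities or training a meta-classifier on the base outputs (stacking) are common alternatives.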

Tasks

Word Embeddings
