Exploring Classifier Combinations for Language Variety Identification

2018-08-01, COLING 2018

Tim Kreutz, Walter Daelemans


Abstract

This paper describes CLiPS's submissions to the Discriminating between Dutch and Flemish in Subtitles (DFS) shared task at VarDial 2018. We explore different ways to combine classifiers trained on different feature groups. Our best system uses two Linear SVM classifiers: one trained on lexical features (word n-grams) and one trained on syntactic features (PoS n-grams). The final prediction of whether a document is Flemish Dutch or Netherlandic Dutch is made by whichever classifier outputs the higher probability for either of the two labels. This confidence vote approach outperforms a meta-classifier on both the development and the test data.
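A minimal sketch of the confidence vote described in the abstract, assuming each classifier (lexical and syntactic) already yields a probability for each of the two labels. How those probabilities are obtained from a Linear SVM (for example via calibration) is not stated here and is left out. The function names and the label strings are hypothetical illustrations, not the authors' code; ties are resolved in favour of the first classifier.

```python
from typing import List, Sequence, Tuple

# Hypothetical label names; the shared task's actual label codes may differ.
LABELS = ("Flemish", "Netherlandic")


def confidence_vote(
    prob_rows: Sequence[Sequence[float]],
    labels: Sequence[str] = LABELS,
) -> Tuple[str, int]:
    """Return (predicted label, index of the winning classifier).

    prob_rows[i][j] is classifier i's probability for labels[j].
    The classifier holding the single highest probability for any label
    decides, and its most probable label becomes the prediction.
    Strict '>' means ties go to the earliest classifier.
    """
    best_clf, best_label, best_p = -1, -1, float("-inf")
    for i, row in enumerate(prob_rows):
        if len(row) != len(labels):
            raise ValueError(f"classifier {i} returned {len(row)} probabilities")
        for j, p in enumerate(row):
            if p > best_p:
                best_clf, best_label, best_p = i, j, p
    if best_clf < 0:
        raise ValueError("no classifier outputs given")
    return labels[best_label], best_clf


def confidence_vote_batch(
    lexical_probs: Sequence[Sequence[float]],
    syntactic_probs: Sequence[Sequence[float]],
) -> List[str]:
    """Apply the confidence vote document by document for two classifiers."""
    return [
        confidence_vote([lex, syn])[0]
        for lex, syn in zip(lexical_probs, syntactic_probs)
    ]
```

For example, if the word n-gram model gives `[0.55, 0.45]` and the PoS n-gram model gives `[0.20, 0.80]`, the PoS model is more confident, so `confidence_vote([[0.55, 0.45], [0.20, 0.80]])` returns `("Netherlandic", 1)`. Unlike a meta-classifier, this rule needs no additional training step on top of the two base classifiers.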

Tasks

Language Identification, POS
