
DUTH at SemEval-2019 Task 8: Part-Of-Speech Features for Question Classification

2019-06-01 · SemEval 2019

Anastasios Bairaktaris, Symeon Symeonidis, Avi Arampatzis


Abstract

This report describes the methods the Democritus University of Thrace (DUTH) team used in SemEval-2019 Task 8: Fact Checking in Community Question Answering Forums. Our team addressed only Subtask A: Question Classification. Our approach combined shallow natural language processing (NLP) pre-processing techniques to reduce noise in the data, feature selection methods, and supervised machine learning algorithms such as NearestCentroid, Perceptron, and LinearSVC. Exploratory data analysis and visualizations helped us identify the most informative features. To improve classification accuracy, we developed a customized stopword list that retains some opinion- and fact-denoting common function words which standard stoplisting would have removed. Furthermore, we examined the usefulness of part-of-speech (POS) categories for the task: by removing nouns and adjectives, we found some evidence that verbs are a valuable POS category for the opinion question class.
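A minimal Python sketch of the kind of pipeline described above: a custom stoplist that keeps some function words, removal of nouns and adjectives, and a nearest-centroid classifier. It is not the authors' code. The stopword list and the retained "opinion-/fact-denoting" words are hypothetical choices, tokens are assumed to arrive already tagged with Penn Treebank POS tags from some upstream tagger, and a from-scratch nearest-centroid classifier (Euclidean distance to per-class mean count vectors) stands in for the library model named in the abstract. The toy questions and labels are invented.

```python
import math
from collections import Counter

# Hypothetical generic stoplist: a small subset of a typical English list.
STANDARD_STOPWORDS = {
    "a", "an", "the", "is", "are", "of", "to", "in", "on", "for", "and",
    "or", "it", "this", "that", "do", "does", "i", "you", "should",
    "can", "what", "where", "when", "how",
}

# Hypothetical opinion-/fact-denoting function words kept as features.
KEEP_WORDS = {"i", "you", "should", "can", "what", "where", "when", "how"}

# Custom stoplist: the standard list minus the retained words.
CUSTOM_STOPWORDS = STANDARD_STOPWORDS - KEEP_WORDS

# Penn Treebank prefixes: NN/NNS/NNP/NNPS (nouns), JJ/JJR/JJS (adjectives).
DROPPED_POS_PREFIXES = ("NN", "JJ")


def preprocess(tagged_tokens):
    """Lowercase tokens, drop custom stopwords, then drop nouns and adjectives.

    `tagged_tokens` is a list of (word, penn_tag) pairs, assumed to come
    from an upstream POS tagger.
    """
    kept = []
    for word, tag in tagged_tokens:
        w = word.lower()
        if w in CUSTOM_STOPWORDS or tag.startswith(DROPPED_POS_PREFIXES):
            continue
        kept.append(w)
    return kept


def fit_centroids(docs, labels):
    """Return the mean bag-of-words count vector for each class."""
    sizes = Counter(labels)
    totals = {}
    for tokens, label in zip(docs, labels):
        totals.setdefault(label, Counter()).update(tokens)
    return {
        label: {w: c / sizes[label] for w, c in total.items()}
        for label, total in totals.items()
    }


def predict(tokens, centroids):
    """Assign the class whose centroid is nearest in Euclidean distance."""
    x = Counter(tokens)

    def distance(centroid):
        vocab = set(x) | set(centroid)
        return math.sqrt(
            sum((x.get(w, 0) - centroid.get(w, 0.0)) ** 2 for w in vocab)
        )

    return min(centroids, key=lambda label: distance(centroids[label]))


# Invented, pre-tagged toy questions.
TRAIN = [
    ([("What", "WP"), ("is", "VBZ"), ("the", "DT"), ("visa", "NN"),
      ("fee", "NN")], "Factual"),
    ([("Where", "WRB"), ("can", "MD"), ("I", "PRP"), ("renew", "VB"),
      ("permits", "NNS")], "Factual"),
    ([("Should", "MD"), ("I", "PRP"), ("buy", "VB"), ("a", "DT"),
      ("used", "JJ"), ("car", "NN")], "Opinion"),
    ([("Do", "VBP"), ("you", "PRP"), ("think", "VB"), ("I", "PRP"),
      ("should", "MD"), ("move", "VB")], "Opinion"),
]

docs = [preprocess(tagged) for tagged, _ in TRAIN]
labels = [label for _, label in TRAIN]
centroids = fit_centroids(docs, labels)

query = [("Should", "MD"), ("I", "PRP"), ("sell", "VB"), ("the", "DT"),
         ("house", "NN")]
print(preprocess(query), predict(preprocess(query), centroids))
# → ['should', 'i', 'sell'] Opinion
```

In this toy setup, removing nouns and adjectives leaves mostly verbs, modals, and retained function words such as "should" and "i", which is the kind of signal the abstract reports as useful for the opinion class. Swapping in a real tagger, a full stoplist, or a library classifier would not change the overall structure.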
