An Arabic Twitter Corpus for Subjectivity and Sentiment Analysis

2014-05-01 · LREC 2014

Eshrag Refaee, Verena Rieser

Abstract

We present a newly collected data set of 8,868 gold-standard annotated Arabic feeds. The corpus is manually labelled for subjectivity and sentiment analysis (SSA), with an inter-annotator agreement of κ = 0.816. In addition, the corpus is annotated with a variety of motivated feature sets that have previously been shown to improve performance. The paper highlights issues posed by Twitter as a genre, such as the mixture of language varieties and topic shifts. Our next step is to extend the current corpus using online semi-supervised learning. A first sub-corpus will be released via the ELRA repository as part of this submission.
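The agreement figure κ = 0.816 is a chance-corrected statistic. The abstract does not say which kappa variant was used, so the sketch below assumes Cohen's kappa for two annotators, κ = (p_o − p_e) / (1 − p_e). Here p_o is the observed fraction of items on which the annotators agree, and p_e is the agreement expected by chance from each annotator's label frequencies. The function name and the toy labels are hypothetical illustrations, not data from the corpus.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items (assumed variant)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout; kappa is undefined,
        # so by convention this sketch reports perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for six tweets from two annotators.
annotator_1 = ["positive", "positive", "negative", "neutral", "positive", "negative"]
annotator_2 = ["positive", "negative", "negative", "neutral", "positive", "negative"]
print(round(cohens_kappa(annotator_1, annotator_2), 4))  # 0.7391
```

In this toy case p_o = 5/6 and p_e = 13/36, giving κ = 17/23. Raw agreement of about 83% therefore shrinks to about 0.74 once chance agreement is discounted. This correction is why corpus papers report κ rather than plain percent agreement.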