An Arabic Twitter Corpus for Subjectivity and Sentiment Analysis
2014-05-01 · LREC 2014
Eshrag Refaee, Verena Rieser
Abstract
We present a newly collected data set of 8,868 gold-standard annotated Arabic feeds. The corpus is manually labelled for subjectivity and sentiment analysis (SSA) (κ = 0.816). In addition, the corpus is annotated with a variety of motivated feature sets that have previously shown a positive impact on performance. The paper highlights issues posed by Twitter as a genre, such as the mixture of language varieties and topic shifts. Our next step is to extend the current corpus using online semi-supervised learning. A first sub-corpus will be released via the ELRA repository as part of this submission.
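The agreement figure reported above is a kappa statistic. The abstract does not say which variant was used, so the following is only a minimal sketch that assumes Cohen's kappa for two annotators. The function name `cohen_kappa` and the toy labels are hypothetical and are not taken from the corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: two annotators and nominal labels. The paper may have
    used a different kappa variant.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two
    # annotators' marginal proportions, summed over labels.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy labels for illustration only (not drawn from the corpus).
annotator_a = ["pos", "neg", "neutral", "pos", "neg", "neutral"]
annotator_b = ["pos", "neg", "neutral", "pos", "neutral", "neutral"]
kappa = cohen_kappa(annotator_a, annotator_b)
print(round(kappa, 3))
```

Kappa corrects raw agreement for the agreement expected by chance, so a value of 1 means perfect agreement and a value of 0 means agreement no better than chance.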