TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification

2020-10-23 · Findings of the Association for Computational Linguistics

Francesco Barbieri, Jose Camacho-Collados, Leonardo Neves, Luis Espinosa-Anke


Code

github.com/cardiffnlp/tweeteval
Official · In paper · ★ 395
github.com/jinhxu/how-much-hate-with-china
★ 5

Abstract

The experimental landscape in natural language processing for social media is too fragmented. Each year, new shared tasks and datasets are proposed, ranging from classics like sentiment analysis to irony detection or emoji prediction. Therefore, it is unclear what the current state of the art is, as there is neither a standardized evaluation protocol nor a strong set of baselines trained on such domain-specific data. In this paper, we propose a new evaluation framework (TweetEval) consisting of seven heterogeneous Twitter-specific classification tasks. We also provide a strong set of baselines as a starting point, and compare different language modeling pre-training strategies. Our initial experiments show the effectiveness of starting off with existing pre-trained generic language models and continuing to train them on Twitter corpora.

Tasks

Classification, General Classification, Language Modeling, Language Modelling, Sentiment Analysis

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
TweetEval	RoBERTa-Base	Emoji	30.9	—	Unverified
TweetEval	RoBERTa-Twitter	Emoji	29.3	—	Unverified
TweetEval	SVM	Emoji	29.3	—	Unverified
TweetEval	FastText	Emoji	25.8	—	Unverified
TweetEval	LSTM	Emoji	24.7	—	Unverified
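
The claimed scores in the table above can be compared programmatically. The following is a minimal sketch, not part of the TweetEval code base: the function name `rank_models` and the constant `RESULTS_TSV` are hypothetical, and the rows are copied by hand from the table (the Verified and Status columns are omitted because no score has been verified).

```python
import csv
import io

# Rows copied from the Benchmark Results table above (tab-separated).
# Only the claimed scores are included; none of them has been verified.
RESULTS_TSV = """Dataset\tModel\tMetric\tClaimed
TweetEval\tRoBERTa-Base\tEmoji\t30.9
TweetEval\tRoBERTa-Twitter\tEmoji\t29.3
TweetEval\tSVM\tEmoji\t29.3
TweetEval\tFastText\tEmoji\t25.8
TweetEval\tLSTM\tEmoji\t24.7
"""


def rank_models(tsv_text, metric="Emoji"):
    """Return (model, claimed score) pairs for one metric, best score first.

    Hypothetical helper for illustration. Ties keep their table order
    because Python's sort is stable.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = [
        (row["Model"], float(row["Claimed"]))
        for row in reader
        if row["Metric"] == metric
    ]
    return sorted(rows, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for position, (model, score) in enumerate(rank_models(RESULTS_TSV), start=1):
        print(f"{position}. {model}: {score}")
```

Parsing the rows with `csv.DictReader` rather than splitting strings by hand keeps the sketch usable if further metric rows are appended to the same table.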
