BERTweet: A pre-trained language model for English Tweets

2020-05-20 · EMNLP 2020 · Code Available

Dat Quoc Nguyen, Thanh Vu, Anh Tuan Nguyen

Code

github.com/VinAIResearch/BERTweet (official, in paper; PyTorch; ★ 606)
github.com/cardiffnlp/tweeteval (framework: none; ★ 395)
github.com/2024-MindSpore-1/Code2/tree/main/model-1/bertweet (MindSpore; ★ 0)

Abstract

We present BERTweet, the first public large-scale pre-trained language model for English Tweets. BERTweet has the same architecture as BERT-base (Devlin et al., 2019) and is trained using the RoBERTa pre-training procedure (Liu et al., 2019). Experiments show that BERTweet outperforms the strong baselines RoBERTa-base and XLM-R-base (Conneau et al., 2020), performing better than the previous state-of-the-art models on three Tweet NLP tasks: part-of-speech tagging, named-entity recognition, and text classification. We release BERTweet under the MIT License to facilitate future research and applications on Tweet data. BERTweet is available at https://github.com/VinAIResearch/BERTweet.
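
The RoBERTa pre-training procedure mentioned in the abstract is built on a masked language modeling objective in which the mask pattern is re-sampled each time a sequence is seen (dynamic masking). The sketch below illustrates that idea under the standard BERT-style assumptions: 15% of tokens are selected, and of those 80% become a mask symbol, 10% a random token, and 10% stay unchanged. The function name `mask_tokens`, the `<mask>` string, the whitespace tokenization, and the toy vocabulary are illustrative assumptions, not taken from the BERTweet code.

```python
import random

MASK_TOKEN = "<mask>"  # assumed mask symbol, for illustration only


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Return (corrupted_tokens, labels) for one masked-LM training example.

    labels[i] holds the original token at selected positions, None elsewhere.
    """
    rng = rng or random.Random()
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must predict the original token here
            roll = rng.random()
            if roll < 0.8:
                corrupted.append(MASK_TOKEN)         # 80%: replace with the mask symbol
            elif roll < 0.9:
                corrupted.append(rng.choice(vocab))  # 10%: replace with a random token
            else:
                corrupted.append(tok)                # 10%: keep the token unchanged
        else:
            labels.append(None)  # position is not part of the loss
            corrupted.append(tok)
    return corrupted, labels


# Toy example: whitespace tokens stand in for real subword units.
tweet = "just landed in hanoi and the weather is great".split()
vocab = sorted(set(tweet))

# Dynamic masking: a fresh mask pattern is drawn on each pass over the data.
for epoch in range(2):
    corrupted, labels = mask_tokens(tweet, vocab, rng=random.Random(epoch))
    print(epoch, corrupted)
```

Passing a seeded `random.Random` keeps the example reproducible; a real pipeline would operate on subword IDs rather than strings.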

Tasks

Language Modeling, Named Entity Recognition (NER), Part-of-Speech Tagging, Sentiment Analysis, Text Classification, XLM-R

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
WNUT 2016	BERTweet	F1	52.1	—	Unverified
WNUT 2017	BERTweet	F1	56.5	—	Unverified
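
For readers unfamiliar with the F1 metric in the table above, the following is a minimal sketch of how an F1 score over predicted entities can be computed, assuming exact-match scoring of (sentence, start, end, type) spans. The evaluation script behind the claimed numbers is not stated on this page, so the function `entity_f1` and the toy spans are hypothetical illustrations rather than the benchmark's official scorer.

```python
def entity_f1(gold, pred):
    """F1 over sets of (sentence_id, start, end, type) spans, exact match only."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)  # spans predicted with the correct boundaries and type
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: the second prediction has the right boundaries but the wrong type.
gold = [(0, 0, 2, "person"), (0, 5, 6, "location"), (1, 3, 4, "product")]
pred = [(0, 0, 2, "person"), (0, 5, 6, "person"), (1, 3, 4, "product")]

print(round(100 * entity_f1(gold, pred), 1))  # prints 66.7
```

Here two of three predictions match, so precision and recall are both 2/3 and F1 is 2/3, reported on a 0 to 100 scale like the table.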
