RoBERTa: A Robustly Optimized BERT Pretraining Approach

2019-07-26

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov

Code

github.com/xiaoqian19940510/text-classification-
pytorch★ 611
github.com/xiaoqian19940510/text-classification-surveys
pytorch★ 611
github.com/oneflow-inc/libai
none★ 406
github.com/awslabs/mlm-scoring
mxnet★ 348
github.com/hkuds/easyrec
pytorch★ 140
github.com/sdadas/polish-roberta
pytorch★ 91
github.com/octanove/shiba
pytorch★ 89
github.com/few-shot-NER-benchmark/BaselineCode
pytorch★ 57
github.com/Karthik-Bhaskar/Context-Based-Question-Answering
tf★ 44
github.com/GeorgeLuImmortal/Hierarchical-BERT-Model-with-Limited-Labelled-Data
pytorch★ 42

Abstract

Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes, and, as we will show, hyperparameter choices have significant impact on the final results. We present a replication study of BERT pretraining (Devlin et al., 2019) that carefully measures the impact of many key hyperparameters and training data size. We find that BERT was significantly undertrained, and can match or exceed the performance of every model published after it. Our best model achieves state-of-the-art results on GLUE, RACE and SQuAD. These results highlight the importance of previously overlooked design choices, and raise questions about the source of recently reported improvements. We release our models and code.

Tasks

Common Sense Reasoning, Document Image Classification, Language Modeling, Language Modelling, Lexical Simplification, Linguistic Acceptability, Multi-task Language Understanding, Natural Language Inference, Only Connect Walls Dataset Task 1 (Grouping), Question Answering, Reading Comprehension, Riddle Sense, Semantic Textual Similarity, Sentence Completion, Sentiment Analysis, Stock Market Prediction, Text Classification, Type prediction
