RobeCzech: Czech RoBERTa, a monolingual contextualized language representation model
2021-05-24
Milan Straka, Jakub Náplava, Jana Straková, David Samuel
Abstract
We present RobeCzech, a monolingual RoBERTa language representation model trained on Czech data. RoBERTa is a robustly optimized Transformer-based pretraining approach. We show that RobeCzech considerably outperforms equally-sized multilingual and Czech-trained contextualized language representation models, surpasses current state of the art in all five evaluated NLP tasks and reaches state-of-the-art results in four of them. The RobeCzech model is released publicly at https://hdl.handle.net/11234/1-3691 and https://huggingface.co/ufal/robeczech-base.
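As a minimal sketch (not part of the paper), the checkpoint named in the abstract can be loaded through the Hugging Face transformers library; the model identifier comes from the abstract's URL, while the use of the fill-mask pipeline and the example sentence are illustrative assumptions.

```python
# Minimal sketch: load the publicly released RobeCzech checkpoint via
# Hugging Face transformers. The model ID is taken from the abstract's
# URL; the example sentence is illustrative, not from the paper.
from transformers import pipeline

# "fill-mask" bundles the tokenizer and the masked-LM head together.
unmasker = pipeline("fill-mask", model="ufal/robeczech-base")

# Use the tokenizer's own mask token instead of hardcoding it.
mask = unmasker.tokenizer.mask_token

# "Praha je hlavní město ..." = "Prague is the capital city of ..."
for prediction in unmasker(f"Praha je hlavní město {mask}."):
    print(prediction["token_str"], round(prediction["score"], 3))
```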
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| PTG (Czech, MRP 2020) | PERIN + RobeCzech | F1 | 92.36 | — | Unverified |