GECToR -- Grammatical Error Correction: Tag, Not Rewrite
Kostiantyn Omelianchuk, Vitaliy Atrasevych, Artem Chernodub, Oleksandr Skurzhanskyi
Code Available
- github.com/grammarly/gector (official, in paper; PyTorch, ★ 961)
- github.com/psawa/gecko-app (PyTorch, ★ 32)
- github.com/gotutiyan/gector (PyTorch, ★ 24)
Abstract
In this paper, we present a simple and efficient GEC sequence tagger using a Transformer encoder. Our system is pre-trained on synthetic data and then fine-tuned in two stages: first on errorful corpora, and second on a combination of errorful and error-free parallel corpora. We design custom token-level transformations to map input tokens to target corrections. Our best single-model/ensemble GEC tagger achieves an F_0.5 of 65.3/66.5 on CoNLL-2014 (test) and F_0.5 of 72.4/73.6 on BEA-2019 (test). Inference is up to 10 times faster than with a Transformer-based seq2seq GEC system. The code and trained models are publicly available.
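The core idea of "tag, not rewrite" is that each source token receives an edit tag, and the corrected sentence is produced by applying those tags rather than generating text from scratch. The tag names ($KEEP, $DELETE, $APPEND_t, $REPLACE_t) follow the paper's basic transformations; the `apply_tags` function below is a hypothetical minimal sketch of the decoding step, omitting the paper's g-transformations (e.g. case and verb-form changes) and iterative refinement.

```python
def apply_tags(tokens, tags):
    """Apply GECToR-style token-level edit tags to a token sequence.

    Sketch only: handles the four basic transformations from the paper
    ($KEEP, $DELETE, $APPEND_t, $REPLACE_t); g-transformations are omitted.
    """
    out = []
    for token, tag in zip(tokens, tags):
        if tag == "$KEEP":
            out.append(token)                       # copy token unchanged
        elif tag == "$DELETE":
            continue                                # drop the token
        elif tag.startswith("$APPEND_"):
            out.append(token)                       # keep token, then
            out.append(tag[len("$APPEND_"):])       # insert new token after it
        elif tag.startswith("$REPLACE_"):
            out.append(tag[len("$REPLACE_"):])      # substitute the token
        else:
            out.append(token)                       # unknown tag: keep as-is
    return out

# Example: fix agreement ("go" -> "goes") and append a final period.
corrected = apply_tags(
    ["She", "go", "to", "school"],
    ["$KEEP", "$REPLACE_goes", "$KEEP", "$APPEND_."],
)
print(" ".join(corrected))  # She goes to school .
```

Because every tag is predicted independently per token, this decoding step is a single parallel pass over the sentence, which is the source of the speedup over autoregressive seq2seq decoding.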
Tasks
- Grammatical Error Correction
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| BEA-2019 (test) | Sequence tagging + token-level transformations + two-stage fine-tuning (+RoBERTa, XLNet) | F0.5 | 73.7 | — | Unverified |
| BEA-2019 (test) | Sequence tagging + token-level transformations + two-stage fine-tuning (+XLNet) | F0.5 | 72.4 | — | Unverified |
| CoNLL-2014 Shared Task | Sequence tagging + token-level transformations + two-stage fine-tuning (+BERT, RoBERTa, XLNet) | F0.5 | 66.5 | — | Unverified |
| CoNLL-2014 Shared Task | Sequence tagging + token-level transformations + two-stage fine-tuning (+XLNet) | F0.5 | 65.3 | — | Unverified |
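All scores above use F_0.5, the weighted F-measure standard in GEC evaluation: with beta = 0.5 it weights precision twice as heavily as recall, since proposing a wrong correction is considered worse than missing one. A minimal sketch of the formula (the `f_beta` helper name is ours, not from the paper):

```python
def f_beta(precision, recall, beta=0.5):
    """Weighted F-measure: beta < 1 emphasizes precision over recall.

    F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when there are no matches
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# A high-precision, lower-recall system still scores well under F_0.5.
print(round(f_beta(0.80, 0.40), 3))
```

With equal precision and recall the measure reduces to that common value; the asymmetry only matters when the two diverge.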