GECToR -- Grammatical Error Correction: Tag, Not Rewrite
Kostiantyn Omelianchuk, Vitaliy Atrasevych, Artem Chernodub, Oleksandr Skurzhanskyi
Code Available
- github.com/grammarly/gector (official, in paper; PyTorch, ★ 961)
- github.com/psawa/gecko-app (PyTorch, ★ 32)
- github.com/gotutiyan/gector (PyTorch, ★ 24)
Abstract
In this paper, we present a simple and efficient GEC sequence tagger using a Transformer encoder. Our system is pre-trained on synthetic data and then fine-tuned in two stages: first on errorful corpora, and second on a combination of errorful and error-free parallel corpora. We design custom token-level transformations to map input tokens to target corrections. Our best single-model/ensemble GEC tagger achieves an F_0.5 of 65.3/66.5 on CoNLL-2014 (test) and F_0.5 of 72.4/73.6 on BEA-2019 (test). Inference is up to 10 times faster than with a Transformer-based seq2seq GEC system. The code and trained models are publicly available.
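The core idea of "tag, not rewrite" is that each source token receives an edit tag, and the corrected sentence is produced by applying those tags rather than generating text from scratch. The tag names ($KEEP, $DELETE, $APPEND_t, $REPLACE_t) follow the paper's basic transformations; the `apply_tags` function below is a hypothetical minimal sketch of the decoding step, omitting the paper's g-transformations (e.g. case and verb-form changes) and iterative refinement.

```python
def apply_tags(tokens, tags):
    """Apply GECToR-style token-level edit tags to a token sequence.

    Sketch only: handles the four basic transformations from the paper
    ($KEEP, $DELETE, $APPEND_t, $REPLACE_t); g-transformations are omitted.
    """
    out = []
    for token, tag in zip(tokens, tags):
        if tag == "$KEEP":
            out.append(token)                       # copy token unchanged
        elif tag == "$DELETE":
            continue                                # drop the token
        elif tag.startswith("$APPEND_"):
            out.append(token)                       # keep token, then
            out.append(tag[len("$APPEND_"):])       # insert new token after it
        elif tag.startswith("$REPLACE_"):
            out.append(tag[len("$REPLACE_"):])      # substitute the token
        else:
            out.append(token)                       # unknown tag: keep as-is
    return out

# Example: fix agreement ("go" -> "goes") and append a final period.
corrected = apply_tags(
    ["She", "go", "to", "school"],
    ["$KEEP", "$REPLACE_goes", "$KEEP", "$APPEND_."],
)
print(" ".join(corrected))  # She goes to school .
```

Because every tag is predicted independently per token, this decoding step is a single parallel pass over the sentence, which is the source of the speedup over autoregressive seq2seq decoding.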
Tasks
- Grammatical Error Correction
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| BEA-2019 (test) | Sequence tagging + token-level transformations + two-stage fine-tuning (+RoBERTa, XLNet) | F0.5 | 73.7 | — | Unverified |
| BEA-2019 (test) | Sequence tagging + token-level transformations + two-stage fine-tuning (+XLNet) | F0.5 | 72.4 | — | Unverified |
| CoNLL-2014 Shared Task | Sequence tagging + token-level transformations + two-stage fine-tuning (+BERT, RoBERTa, XLNet) | F0.5 | 66.5 | — | Unverified |
| CoNLL-2014 Shared Task | Sequence tagging + token-level transformations + two-stage fine-tuning (+XLNet) | F0.5 | 65.3 | — | Unverified |
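All scores above use F_0.5, the weighted F-measure standard in GEC evaluation: with beta = 0.5 it weights precision twice as heavily as recall, since proposing a wrong correction is considered worse than missing one. A minimal sketch of the formula (the `f_beta` helper name is ours, not from the paper):

```python
def f_beta(precision, recall, beta=0.5):
    """Weighted F-measure: beta < 1 emphasizes precision over recall.

    F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when there are no matches
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# A high-precision, lower-recall system still scores well under F_0.5.
print(round(f_beta(0.80, 0.40), 3))
```

With equal precision and recall the measure reduces to that common value; the asymmetry only matters when the two diverge.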