FCGEC: Fine-Grained Corpus for Chinese Grammatical Error Correction

2022-10-22

Lvxiaowei Xu, Jianwang Wu, Jiawei Peng, Jiayu Fu, Ming Cai


Code

github.com/xlxwalex/FCGEC
Official · In paper · pytorch · ★ 120
github.com/xlxwalex/hycxg
none · ★ 20

Abstract

Grammatical Error Correction (GEC) has recently been broadly applied in automatic correction and proofreading systems. However, Chinese GEC is still immature because high-quality data from native speakers is limited in both category coverage and scale. In this paper, we present FCGEC, a fine-grained corpus for detecting, identifying, and correcting grammatical errors. FCGEC is a human-annotated corpus with multiple references, consisting of 41,340 sentences collected mainly from multiple-choice questions in public school Chinese examinations. Furthermore, we propose a Switch-Tagger-Generator (STG) baseline model to correct grammatical errors in low-resource settings. Experimental results show that STG outperforms other GEC benchmark models on FCGEC. However, a significant gap remains between benchmark models and humans, which future models are encouraged to bridge.

Tasks

Grammatical Error Correction, Grammatical Error Detection

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
FCGEC	STG-Joint	exact match	34.1	—	Unverified
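
The "exact match" metric in the table can be illustrated with a minimal sketch. It assumes, since the corpus provides multiple references per sentence, that a prediction counts as correct when it is identical to at least one reference; the official FCGEC scorer may normalize text or handle edge cases differently. The function name `exact_match` and the toy strings are hypothetical and not taken from the paper or its repository.

```python
from typing import List


def exact_match(predictions: List[str], references: List[List[str]]) -> float:
    """Return the percentage of predictions equal to at least one reference.

    Assumption: whitespace is stripped before comparison; the official
    scorer may apply different normalization.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = 0
    for pred, refs in zip(predictions, references):
        # A sentence is a hit if the prediction matches any of its references.
        if pred.strip() in {ref.strip() for ref in refs}:
            hits += 1
    return 100.0 * hits / len(predictions)


# Toy data (hypothetical): the first prediction matches its second reference,
# the second prediction matches none of its references.
toy_predictions = ["sentence a", "sentence b"]
toy_references = [["sentence x", "sentence a"], ["sentence c"]]
toy_score = exact_match(toy_predictions, toy_references)
```

Under this definition a partially corrected sentence earns no credit, so the metric is strict compared with edit-level precision and recall.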
