
The Tembusu Treebank: An English Learner Treebank

2022-06-01 · LREC 2022

Luís Morgado da Costa, Francis Bond, Roger V. P. Winder

Abstract

This paper reports on the creation and development of the Tembusu Learner Treebank — an open treebank created from the NTU Corpus of Learner English, unique for incorporating mal-rules in the annotation of ungrammatical sentences. It describes the motivation and development of the treebank, as well as its exploitation to build a new parse-ranking model for the English Resource Grammar, designed to help improve the parse selection of ungrammatical sentences and diagnose these sentences through mal-rules. The corpus contains 25,000 sentences, of which 4,900 are treebanked. The paper concludes with an evaluation experiment that shows the usefulness of this new treebank in the tasks of grammatical error detection and diagnosis.
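The following is a minimal Python sketch of how mal-rules support detection and diagnosis. It is not the authors' code. It assumes a simplified derivation-tree representation (the hypothetical `Node` class) and a hypothetical set of mal-rule names (`MAL_RULES`). Neither reflects the actual English Resource Grammar rule inventory or the treebank's data format. A sentence is flagged as ungrammatical when its top-ranked parse uses at least one mal-rule, and the mal-rules found in that parse serve as the diagnosis.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical simplified derivation-tree node: a rule or lexical entry name plus daughters."""
    rule: str
    daughters: list = field(default_factory=list)


# Hypothetical mal-rule names, for illustration only (not real ERG rule names).
MAL_RULES = {"mal_subj_verb_agr", "mal_missing_det"}


def rules_used(node):
    """Yield every rule name in the tree, depth-first, starting at the root."""
    yield node.rule
    for daughter in node.daughters:
        yield from rules_used(daughter)


def diagnose(ranked_parses):
    """Inspect only the top-ranked parse.

    Returns None if the sentence received no parse, an empty list if the top
    parse uses no mal-rule (no error detected), or the distinct mal-rules it
    uses, in first-seen order (the diagnosis).
    """
    if not ranked_parses:
        return None
    top = ranked_parses[0]
    found = [rule for rule in rules_used(top) if rule in MAL_RULES]
    return list(dict.fromkeys(found))  # drop duplicates, keep order
```

For example, a top-ranked parse built as `Node("subj_head", [Node("mal_missing_det", [Node("dog_n1")]), Node("bark_v1")])` would be diagnosed as `["mal_missing_det"]`. Because the decision depends only on the top-ranked parse, the quality of the parse-ranking model determines whether a mal-rule analysis is selected for an ungrammatical sentence. This is why a ranker trained on learner data matters for the task.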
