
Reliability and Robustness of Transformers for Automated Short-Answer Grading

2021-08-17 · ACL ARR August 2021

Anonymous


Abstract

Short-Answer Grading (SAG) is an application of NLP in education in which student answers to open questions are graded. This task places high demands both on the reliability (accuracy and fairness) of label predictions and on model robustness against strategic, "adversarial" input. Neural approaches are powerful tools for many problems in NLP, and transfer learning with Transformer-based models specifically promises to support data-poor tasks such as this. We analyse the performance of a Transformer-based SOTA model, zooming in on class- and item-type-specific behavior in order to gauge reliability; we use adversarial testing to analyse the model's robustness to strategic answers. We find a strong dependence on the specifics of training and test data, and recommend that model performance be verified for each individual use case.
