Should You Fine-Tune BERT for Automated Essay Scoring?

2020-07-01 · WS 2020

Elijah Mayfield, Alan W. Black

Abstract

Most natural language processing research now recommends large Transformer-based models with fine-tuning for supervised classification tasks; older strategies like bag-of-words features and linear models have fallen out of favor. Here we investigate whether, in automated essay scoring (AES) research, deep neural models are an appropriate technological choice. We find that fine-tuning BERT produces similar performance to classical models at significant additional cost. We argue that while state-of-the-art strategies do match existing best results, they come with opportunity costs in computational resources. We conclude with a review of promising areas for research on student essays where the unique characteristics of Transformers may provide benefits over classical methods to justify the costs.

Tasks

Automated Essay Scoring
