
Adaptive Testing and Debugging of NLP Models

2021-11-16 · ACL ARR November 2021

Anonymous


Abstract

Current approaches to testing and debugging NLP models rely on highly variable human creativity and extensive labor, or only work for a very restrictive class of bugs. We present AdaTest, a process for adaptive testing and debugging of NLP models inspired by the test-debug cycle in traditional software engineering. AdaTest encourages a partnership between the user and a large language model (LM): the LM proposes tests that are validated and organized by the user, who in turn gives feedback and steers the LM toward better tests. Once enough bugs are discovered, they are fixed (e.g., through finetuning), and the user resumes testing. In experiments with expert and non-expert users and with commercial and research models across 8 different tasks, AdaTest makes users 5-10x more effective at finding bugs than current approaches, and helps users fix bugs effectively without introducing new ones.
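The abstract describes a loop: an LM proposes tests, the user validates and labels them and steers the LM, discovered bugs are fixed, and testing resumes. The Python below is a minimal, self-contained sketch of that cycle under loud assumptions: the "LM" is a hypothetical template-based proposer (`toy_lm_propose`), the user is simulated by a labeling function (`simulated_user`), the model under test is a deliberately buggy toy sentiment classifier, and the fix step (`memorize_fix`) memorizes corrected labels as a crude stand-in for finetuning. None of these names come from the paper or from any AdaTest library; this illustrates the test-debug cycle, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Model = Callable[[str], str]


@dataclass(frozen=True)
class LabeledTest:
    text: str
    expected: str  # label the user says the model should produce


def toy_lm_propose(seeds: List[LabeledTest], n: int) -> List[str]:
    """Hypothetical stand-in for the LM: rewrites seed inputs with fixed templates.
    A real system would prompt a large language model with the seeds instead."""
    templates = ["I think {}", "honestly, {}", "{}, to be fair"]
    proposals: List[str] = []
    for seed in seeds:
        for template in templates:
            candidate = template.format(seed.text)
            if candidate not in proposals:  # drop duplicate proposals
                proposals.append(candidate)
    return proposals[:n]


def run_round(model: Model, seeds: List[LabeledTest],
              label: Callable[[str], Optional[str]],
              n: int) -> Tuple[List[LabeledTest], List[LabeledTest]]:
    """LM proposes, user validates and labels, model is run; returns (validated, failing)."""
    validated: List[LabeledTest] = []
    failing: List[LabeledTest] = []
    for text in toy_lm_propose(seeds, n):
        expected = label(text)  # the user rejects a bad proposal by returning None
        if expected is None:
            continue
        test = LabeledTest(text, expected)
        validated.append(test)
        if model(text) != expected:
            failing.append(test)
    return validated, failing


def run_test_debug_cycle(model: Model, seeds: List[LabeledTest],
                         label: Callable[[str], Optional[str]],
                         fix: Callable[[Model, List[LabeledTest]], Model],
                         rounds: int = 2, n: int = 6):
    suite = list(seeds)  # every validated test, kept for regression checks
    bugs = [t for t in seeds if model(t.text) != t.expected]
    for _ in range(rounds):
        validated, failing = run_round(model, seeds, label, n)
        suite.extend(validated)
        bugs.extend(failing)
        # Steering: failing tests (or, if none, validated ones) seed the next round.
        seeds = failing or validated or seeds
    fixed = fix(model, bugs)  # debug step (the abstract names finetuning)
    # Regression check: the fixed model should pass every validated test.
    still_failing = [t for t in suite if fixed(t.text) != t.expected]
    return fixed, bugs, still_failing


def toy_model(text: str) -> str:
    # Deliberately buggy sentiment model: ignores negation.
    return "positive" if "good" in text else "negative"


def simulated_user(text: str) -> Optional[str]:
    # Stands in for the human: rejects off-topic proposals, labels the rest.
    if "good" not in text:
        return None
    return "negative" if "not good" in text else "positive"


def memorize_fix(model: Model, failures: List[LabeledTest]) -> Model:
    # Crude stand-in for finetuning: override the model on the failing inputs only.
    patch = {t.text: t.expected for t in failures}
    return lambda text: patch.get(text, model(text))


if __name__ == "__main__":
    seeds = [LabeledTest("the food was good", "positive"),
             LabeledTest("the food was not good", "negative")]
    fixed, bugs, still_failing = run_test_debug_cycle(
        toy_model, seeds, simulated_user, memorize_fix)
    print(len(bugs), len(still_failing))  # prints "10 0"
```

Because the memorizing fix only patches the exact failing inputs, an unseen negated sentence such as "honestly, it was not good" is still labeled positive by the fixed model. That is why the cycle resumes testing after each fix, and the regression check over the whole validated suite mirrors the abstract's goal of fixing bugs without adding new ones.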
