
Word Error Rate Estimation for Speech Recognition: e-WER

2018-07-01 · ACL 2018

Ahmed Ali, Steve Renals

Code available.

Abstract

Measuring the performance of automatic speech recognition (ASR) systems requires manually transcribed data in order to compute the word error rate (WER), and producing such transcriptions is often time-consuming and expensive. In this paper, we propose a novel approach to estimating WER, called e-WER, which does not require a gold-standard transcription of the test set. Our e-WER framework uses a comprehensive set of features: ASR recognised text, character recognition results to complement recognition output, and internal decoder features. We report results for two feature sets, black-box and glass-box, on 24 unseen Arabic broadcast programs. Our system achieves a WER root mean squared error (RMSE) of 16.9% across 1,400 sentences. The overall estimated WER (e-WER) was 25.3% for the three-hour test set, while the actual WER was 28.5%.
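For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how reference-based WER and the RMSE between estimated and actual per-sentence WER are conventionally computed. It is not the authors' e-WER estimator, which predicts WER without a reference. The function names (`word_errors`, `wer`, `pooled_wer`, `rmse`) and the example sentences are hypothetical, and pooling errors over total reference words is only one common way to obtain an overall figure.

```python
import math


def word_errors(reference, hypothesis):
    """Return (word-level edit distance, number of reference words)."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between the processed prefix of ref
    # and hyp[:j]; it starts as the cost of inserting j words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1], len(ref)


def wer(reference, hypothesis):
    """Sentence-level WER: word errors divided by reference length."""
    errors, n_words = word_errors(reference, hypothesis)
    if n_words == 0:
        raise ValueError("reference must contain at least one word")
    return errors / n_words


def pooled_wer(pairs):
    """Overall WER over (reference, hypothesis) pairs, pooling all words."""
    counts = [word_errors(ref, hyp) for ref, hyp in pairs]
    total_errors = sum(errors for errors, _ in counts)
    total_words = sum(n_words for _, n_words in counts)
    if total_words == 0:
        raise ValueError("references must contain at least one word")
    return total_errors / total_words


def rmse(estimated, actual):
    """Root mean squared error between two equal-length lists of WER values."""
    if len(estimated) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(e - a) ** 2 for e, a in zip(estimated, actual)]
    return math.sqrt(sum(squared) / len(squared))
```

A call such as `wer("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion against six reference words. `rmse` would then be applied to a list of per-sentence estimates paired with the corresponding reference-based values.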
