
Anomaly detection with a variational autoencoder for Arabic mispronunciation detection

2024-06-25 · International Journal of Speech Technology 2024

Meriem Lounis, Bilal Dendani, Halima Bahi


Abstract

Computer-assisted language learning (CALL) systems are attracting growing interest and have established a presence in automated foreign language learning. They enhance traditional learning methods by providing access to various accents and spoken language styles through websites, mobile applications, and social media. In these systems, mispronunciation detection is a key component and is mainly addressed as a classification problem. Advances in deep learning (DL) have improved these systems by training deep neural networks (DNNs) to classify a pronunciation as correct or incorrect. However, the effectiveness of DL models is hindered by several shortcomings, such as the scarcity of labeled data. To address this issue, the paper adopts an anomaly detection approach to mispronunciation detection. It uses a variational autoencoder (VAE), relying on a density-based method, to model the “normal data.” The VAE is a generative model trained in a self-supervised way to learn the distribution of correct pronunciations, which stand for “normal data,” and is expected to detect mispronunciations, which stand for “abnormal data,” during the test stage. The proposed approach was evaluated in the context of Arabic pronunciation learning on the ASMDD Arabic dataset. The results are promising, with an accuracy of about 98%. The proposed VAE outperformed the standard autoencoder as well as the state-of-the-art convolutional neural networks used for Arabic mispronunciation detection.
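
The anomaly-detection workflow described in the abstract (fit a density model on correct pronunciations only, then flag low-likelihood inputs at test time) can be illustrated with a minimal sketch. This is not the paper's method: a diagonal Gaussian stands in for the VAE, the feature vectors are synthetic rather than drawn from ASMDD, and the function names, the 4-dimensional features, and the 0.95 quantile threshold are all assumptions made for illustration. With a VAE, the score would presumably come from the model itself (for example a reconstruction or likelihood term) rather than from a closed-form Gaussian.

```python
import math
import random
import statistics


def fit_diag_gaussian(rows):
    """Estimate per-dimension mean and variance from 'normal' feature vectors."""
    dims = list(zip(*rows))
    means = [statistics.fmean(d) for d in dims]
    # A small floor keeps every variance strictly positive.
    variances = [max(statistics.pvariance(d), 1e-6) for d in dims]
    return means, variances


def anomaly_score(x, means, variances):
    """Negative log-likelihood of x under the fitted diagonal Gaussian."""
    return sum(
        0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, means, variances)
    )


def fit_threshold(scores, quantile=0.95):
    """Pick the score below which roughly `quantile` of the normal data falls."""
    ordered = sorted(scores)
    idx = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[idx]


def is_mispronounced(x, means, variances, threshold):
    """Flag x as abnormal when its score exceeds the threshold."""
    return anomaly_score(x, means, variances) > threshold


# Synthetic stand-in for features of correct pronunciations (assumption).
rng = random.Random(0)
correct = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(500)]

# "Training" uses normal data only; no mispronounced examples are needed.
means, variances = fit_diag_gaussian(correct)
threshold = fit_threshold([anomaly_score(x, means, variances) for x in correct])

typical = [0.0, 0.0, 0.0, 0.0]    # close to the normal data
outlier = [6.0, -6.0, 6.0, -6.0]  # far from the normal data

print(is_mispronounced(typical, means, variances, threshold))
print(is_mispronounced(outlier, means, variances, threshold))
```

The point of the sketch is the training regime: only examples of the "normal" class are needed to fit the model and choose the threshold, which is why the approach suits settings where labeled mispronunciations are scarce.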
