Improving End-to-End Bangla Speech Recognition with Semi-supervised Training

2020-11-01 · Findings of the Association for Computational Linguistics

Nafis Sadeq, Nafis Tahmid Chowdhury, Farhan Tanvir Utshaw, Shafayat Ahmed, Muhammad Abdullah Adnan

Abstract

Automatic speech recognition systems usually require a large annotated speech corpus for training, and manually annotating such a corpus is difficult and costly. Unsupervised and semi-supervised learning methods can therefore be a useful complement to supervised learning. In this work, we focus on a semi-supervised training approach for Bangla speech recognition that can exploit large amounts of unpaired audio and text data. We encode speech and text data in an intermediate domain and propose a novel loss function, based on the global encoding distance between the encoded data, to guide the semi-supervised training. Our proposed method reduces the Word Error Rate (WER) of the system from 37% to 31.9%, an absolute reduction of 5.1 percentage points (about 13.8% relative).

Tasks

Automatic Speech Recognition (ASR), Speech Recognition
