Acoustic Modeling for Overlapping Speech Recognition: JHU CHiME-5 Challenge System

2024-05-17

Vimal Manohar, Szu-Jui Chen, Zhiqi Wang, Yusuke Fujita, Shinji Watanabe, Sanjeev Khudanpur

Code

github.com/fgnt/nara_wpe

Abstract

This paper summarizes our acoustic modeling efforts in the Johns Hopkins University speech recognition system for the CHiME-5 challenge, which targets recognition of highly overlapped dinner-party speech recorded by multiple microphone arrays. We explore data augmentation approaches, neural network architectures, front-end speech dereverberation, beamforming, and robust i-vector extraction, comparing our in-house implementations with publicly available tools. Our final system achieves a word error rate of 69.4% on the development set, an 11.7% absolute improvement over the previous baseline of 81.1%. We release this improved baseline, with its refined techniques and tools, as an advanced CHiME-5 recipe.
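The reported gain is an absolute difference in word error rate, which is easy to confuse with a relative reduction. The following is a minimal sketch of the arithmetic using only the two figures stated in the abstract; the function name `wer_improvement` is a hypothetical helper for illustration and is not part of the released recipe.

```python
def wer_improvement(baseline_wer: float, new_wer: float) -> tuple[float, float]:
    """Return (absolute, relative) WER reduction.

    Both inputs are word error rates in percent. The absolute reduction is
    in percentage points; the relative reduction is a percent of the baseline.
    """
    absolute = baseline_wer - new_wer
    relative = 100.0 * absolute / baseline_wer
    return absolute, relative


# Figures taken from the abstract: 81.1% baseline, 69.4% final system.
absolute, relative = wer_improvement(81.1, 69.4)
print(f"absolute: {absolute:.1f} points, relative: {relative:.1f}%")
```

Under these figures, the 11.7-point absolute improvement corresponds to a relative WER reduction of roughly 14.4%.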

Tasks

Data Augmentation, Speech Dereverberation, Speech Recognition
