CAT: A CTC-CRF based ASR Toolkit Bridging the Hybrid and the End-to-end Approaches towards Data Efficiency and Low Latency

2020-05-27

Keyu An, Hongyu Xiang, Zhijian Ou


Code

github.com/thu-spmi/cat
Official · In paper · PyTorch · ★ 366

Abstract

In this paper, we present a new open source toolkit for speech recognition, named CAT (CTC-CRF based ASR Toolkit). CAT inherits the data-efficiency of the hybrid approach and the simplicity of the E2E approach, providing a full-fledged implementation of CTC-CRFs and complete training and testing scripts for a number of English and Chinese benchmarks. Experiments show CAT obtains state-of-the-art results, which are comparable to the fine-tuned hybrid models in Kaldi but with a much simpler training pipeline. Compared to existing non-modularized E2E models, CAT performs better on limited-scale datasets, demonstrating its data efficiency. Furthermore, we propose a new method called contextualized soft forgetting, which enables CAT to do streaming ASR without accuracy degradation. We hope CAT, especially the CTC-CRF based framework and software, will be of broad interest to the community, and can be further explored and improved.

Tasks

Speech Recognition

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
AISHELL-1	CTC-CRF 4gram-LM	Word Error Rate (WER)	6.34	—	Unverified
Hub5'00 FISHER-SWBD	CTC-CRF	Word Error Rate (WER)	12	—	Unverified
Hub5'00 SwitchBoard	CTC-CRF	Word Error Rate (WER), SwitchBoard subset	9.7	—	Unverified
WSJ dev93	CTC-CRF VGG-BLSTM	Word Error Rate (WER)	5.7	—	Unverified
WSJ eval92	CTC-CRF VGG-BLSTM	Word Error Rate (WER)	3.2	—	Unverified
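
The table above reports word error rate (WER). For readers unfamiliar with the metric, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words and expressed as a percentage. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration. This is not the scoring code used by CAT or Kaldi, which may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent, using whitespace tokenization (an assumption)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("a b c d", "a x c d"))
```

Because insertions count as errors, WER can exceed 100 when the hypothesis is much longer than the reference.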
