The Microsoft 2016 Conversational Speech Recognition System

2016-09-12

W. Xiong, J. Droppo, X. Huang, F. Seide, M. Seltzer, A. Stolcke, D. Yu, G. Zweig


Abstract

We describe Microsoft's conversational speech recognition system, in which we combine recent developments in neural-network-based acoustic and language modeling to advance the state of the art on the Switchboard recognition task. Inspired by machine learning ensemble techniques, the system uses a range of convolutional and recurrent neural networks. I-vector modeling and lattice-free MMI training provide significant gains for all acoustic model architectures. Language model rescoring with multiple forward and backward running RNNLMs, and word posterior-based system combination provide a 20% boost. The best single system uses a ResNet architecture acoustic model with RNNLM rescoring, and achieves a word error rate of 6.9% on the NIST 2000 Switchboard task. The combined system has an error rate of 6.2%, representing an improvement over previously reported results on this benchmark task.
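The abstract credits part of the gain to rescoring with forward and backward running RNNLMs on top of an N-gram LM. The sketch below shows the general idea of N-best rescoring under stated assumptions. It is not the paper's exact recipe. Several details are illustrative choices: averaging the forward and backward RNNLM log-probabilities, interpolating with the N-gram score in the log domain (a simplification of probability-domain interpolation), and the names `Hypothesis` and `rescore`, the weights `lm_weight`, `rnn_interp` and `word_penalty`, and all scores. All of these are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    words: list          # hypothesized word sequence
    acoustic: float      # acoustic model log-likelihood
    ngram_lm: float      # N-gram LM log-probability
    fwd_rnnlm: float     # forward (left-to-right) RNNLM log-probability
    bwd_rnnlm: float     # backward (right-to-left) RNNLM log-probability


def rescore(nbest, lm_weight=10.0, rnn_interp=0.5, word_penalty=0.0):
    """Return the N-best list sorted best-first by a combined log score.

    Assumption: forward/backward RNNLM scores are averaged and then
    interpolated with the N-gram score in the log domain.
    """
    def score(h):
        rnn = 0.5 * (h.fwd_rnnlm + h.bwd_rnnlm)
        lm = rnn_interp * rnn + (1.0 - rnn_interp) * h.ngram_lm
        return h.acoustic + lm_weight * lm + word_penalty * len(h.words)

    return sorted(nbest, key=score, reverse=True)


# Hypothetical 2-best list: the acoustic model alone slightly prefers the error.
nbest = [
    Hypothesis(["i", "thing", "so"], acoustic=-99.0, ngram_lm=-12.0,
               fwd_rnnlm=-14.0, bwd_rnnlm=-13.0),
    Hypothesis(["i", "think", "so"], acoustic=-100.0, ngram_lm=-10.0,
               fwd_rnnlm=-9.0, bwd_rnnlm=-9.0),
]
ranked = rescore(nbest)
print(" ".join(ranked[0].words))  # prints "i think so"
```

With these made-up numbers the first hypothesis has the better acoustic score, but the language model terms move "i think so" to the top (combined scores of -195.0 versus -226.5). The same weighted-sum structure extends to additional language models by adding further terms.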

Tasks

Language Modeling, Speech Recognition

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
swb_hub_500 WER fullSWBCH	VGG/Resnet/LACE/BiLSTM acoustic model trained on SWB+Fisher+CH, N-gram + RNNLM language model trained on Switchboard+Fisher+Gigaword+Broadcast	Word error rate (%)	11.9	—	Unverified
Switchboard + Hub500	Microsoft 2016	Word error rate (%)	6.2	—	Unverified
Switchboard + Hub500	VGG/Resnet/LACE/BiLSTM acoustic model trained on SWB+Fisher+CH, N-gram + RNNLM language model trained on Switchboard+Fisher+Gigaword+Broadcast	Word error rate (%)	6.3	—	Unverified
Switchboard + Hub500	RNNLM	Word error rate (%)	6.9	—	Unverified
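The figures above are word error rates: the number of word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that computation using a standard Levenshtein alignment over words. Official NIST scoring uses the sclite tool with transcript normalization rules, which this sketch omits. The function name `word_error_rate` and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / len(reference words)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # d[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i          # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j          # j insertions

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)

    return d[len(ref)][len(hyp)] / len(ref)


# One substitution ("think" -> "thing") and one deletion ("right"): 2 / 5.
wer = word_error_rate("i think that is right", "i thing that is")
print(f"{wer:.0%}")  # prints "40%"

# Relative reduction from the best single system (6.9%) to the combined system (6.2%).
relative_reduction = (6.9 - 6.2) / 6.9
print(f"{relative_reduction:.1%}")  # prints "10.1%"
```

The last two lines apply the same arithmetic to the abstract's own numbers. Moving from 6.9% to 6.2% WER is an absolute drop of 0.7 points, or roughly a 10% relative reduction.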
