Neural Network Language Modeling with Letter-based Features and Importance Sampling

2018-04-15, ICASSP 2018

Hainan Xu, Ke Li, Yiming Wang, Jian Wang, Shiyin Kang, Xie Chen, Daniel Povey, Sanjeev Khudanpur

Abstract

In this paper we describe an extension of the Kaldi software toolkit to support neural-based language modeling, intended for use in automatic speech recognition (ASR) and related tasks. We combine the use of subword features (letter n-grams) and one-hot encoding of frequent words so that the models can handle large vocabularies containing infrequent words. We propose a new objective function that allows for training of unnormalized probabilities. An importance sampling based method is supported to speed up training when the vocabulary is large. Experimental results on five corpora show that Kaldi-RNNLM rivals other recurrent neural network language model toolkits both on performance and training speed.
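The combination of letter n-gram features with one-hot encoding of frequent words can be illustrated with a minimal sketch. This is not the Kaldi-RNNLM implementation: the function names, the `^`/`$` boundary markers, the `word=` feature prefix, and the maximum n-gram order are all assumptions made for illustration. The idea it shows is that an infrequent word still receives a non-empty feature vector from its letters, while a frequent word additionally gets its own dedicated feature.

```python
from collections import Counter


def letter_ngrams(word, max_n=3):
    """Return the letter n-grams (orders 1..max_n) of a word.

    The word is padded with '^' and '$' (hypothetical boundary markers)
    so that word-initial and word-final n-grams are distinguishable.
    """
    padded = "^" + word + "$"
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(padded) - n + 1):
            gram = padded[i:i + n]
            # A lone boundary marker carries no information; skip it.
            if gram in ("^", "$"):
                continue
            grams.append(gram)
    return grams


def word_features(word, frequent_words, max_n=3):
    """Build a sparse feature map (feature name -> count) for a word.

    Every word gets letter n-gram counts. Words in `frequent_words`
    also get a one-hot style 'word=<w>' feature (hypothetical naming).
    """
    feats = Counter(letter_ngrams(word, max_n))
    if word in frequent_words:
        feats["word=" + word] = 1
    return feats


if __name__ == "__main__":
    frequent = {"the", "of"}
    print(dict(word_features("the", frequent, max_n=2)))
    print(dict(word_features("zyx", frequent, max_n=2)))
```

In a model, a sparse feature map like this would typically be multiplied by a learned feature-embedding matrix to produce the word embedding, so that rare words share parameters with other words containing the same letter sequences.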

Tasks

Automatic Speech Recognition (ASR), Language Modeling, Speech Recognition
