Compress Polyphone Pronunciation Prediction Model with Shared Labels

2020-10-01 · CCL 2020

Pengfei Chen, Lina Wang, Hui Di, Kazushige Ouchi, Lvhong Wang


Abstract

It is well known that deep learning models have a huge number of parameters and are computationally expensive, which is especially problematic on embedded and mobile devices. Polyphone pronunciation selection is a basic function of Chinese Text-to-Speech (TTS) applications. A recurrent neural network (RNN) is a good sequence-labeling solution for polyphone pronunciation selection, but its large parameter count and computational cost make compression necessary. In contrast to existing approaches that quantize to low-precision data formats or add a projection layer, we propose a novel method based on shared labels, which focuses on compressing the fully-connected layer before the Softmax in models with a huge number of labels, such as those used for TTS polyphone selection. The basic idea is to compress a large number of target labels into a few label clusters that share the parameters of the fully-connected layer. Furthermore, we combine this method with others to further compress the polyphone pronunciation selection model. Experimental results show that for Bi-LSTM (Bidirectional Long Short-Term Memory) based polyphone selection, the shared-labels model reduces the original model size by about 52% and accelerates prediction by 44%, with almost no loss in performance. Notably, the proposed method can also be applied to other tasks to compress models and accelerate computation.
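The abstract does not say how target labels are grouped into clusters. The following is a minimal sketch, assuming one plausible scheme. Each polyphonic character's candidate pronunciations are numbered 0, 1, 2, ..., and the fully-connected layer before the Softmax predicts only that slot index. Its width is therefore the largest number of candidates any single character has (K) rather than the total number of distinct pronunciations (N). The inventory `CANDIDATES` and every function name below are hypothetical and purely illustrative; they are not the authors' data or code. The hidden size of 512 is also an assumption.

```python
from typing import Dict, List, Sequence

# Hypothetical toy inventory (illustrative only, not the paper's data):
# polyphonic character -> its candidate pronunciations (pinyin + tone).
CANDIDATES: Dict[str, List[str]] = {
    "行": ["xing2", "hang2"],
    "长": ["chang2", "zhang3"],
    "乐": ["le4", "yue4"],
    "和": ["he2", "he4", "huo2"],
}


def full_label_set(cands: Dict[str, List[str]]) -> List[str]:
    """Baseline: every distinct pronunciation is its own output class (N classes)."""
    return sorted({p for prons in cands.values() for p in prons})


def num_shared_labels(cands: Dict[str, List[str]]) -> int:
    """Assumed shared-label scheme: class k means 'k-th candidate of this character'."""
    return max(len(prons) for prons in cands.values())


def encode_shared(char: str, pron: str, cands: Dict[str, List[str]]) -> int:
    """Map a (character, pronunciation) training target to its shared slot index."""
    return cands[char].index(pron)


def linear(h: Sequence[float], W: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Fully-connected layer before Softmax: one logit per shared label (len(W) == K)."""
    return [sum(w * x for w, x in zip(row, h)) + bias for row, bias in zip(W, b)]


def decode_shared(char: str, logits: Sequence[float], cands: Dict[str, List[str]]) -> str:
    """Pick the best slot among this character's valid candidates, then map back."""
    n_valid = len(cands[char])  # slots >= n_valid are meaningless for this character
    best = max(range(n_valid), key=lambda k: logits[k])
    return cands[char][best]


def fc_params(hidden: int, n_out: int) -> int:
    """Weights plus biases of a hidden -> n_out fully-connected layer."""
    return hidden * n_out + n_out


if __name__ == "__main__":
    N = len(full_label_set(CANDIDATES))  # 9 distinct pronunciations
    K = num_shared_labels(CANDIDATES)    # 3 shared slots
    H = 512                              # hypothetical Bi-LSTM output size
    print(N, K, fc_params(H, N), fc_params(H, K))  # 9 3 4617 1539

    # Toy shared FC layer with a 2-dim hidden state and K = 3 output rows.
    logits = linear([1.0, -0.5], [[0.2, 0.1], [0.9, -0.3], [5.0, 0.0]], [0.0, 0.0, 0.0])
    print(decode_shared("行", logits, CANDIDATES))  # hang2 (slot 2 is masked out)
    print(decode_shared("和", logits, CANDIDATES))  # huo2
```

A fully-connected layer with hidden size H and N outputs has (H + 1) × N parameters. Replacing N with K therefore shrinks that layer by exactly a factor of K/N, which is 3/9 in this toy inventory. Decoding needs the identity of the character being disambiguated. That identity is available in polyphone selection because the input text is given. Masking the slots beyond `len(cands[char])` keeps each prediction within that character's valid pronunciations.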

Tasks

Prediction, Quantization, Text-to-Speech
