Deep Speaker Feature Learning for Text-independent Speaker Verification

2017-05-10

Lantian Li, Yixiang Chen, Ying Shi, Zhiyuan Tang, Dong Wang

Abstract

Deep neural networks (DNNs) have recently been used to learn speaker features. However, the learned features have not been of sufficient quality, so a complex back-end model, either neural or probabilistic, is still needed to address the residual uncertainty in speaker verification, just as with raw features. This paper presents a convolutional time-delay deep neural network (CT-DNN) structure for speaker feature learning. Our experimental results on the Fisher database demonstrate that the CT-DNN can produce high-quality speaker features: even with a single feature (0.3 seconds including the context), the EER can be as low as 7.68%. This effectively confirms that the speaker trait is largely a deterministic short-time property rather than a long-time distributional pattern, and can therefore be extracted from just dozens of frames.
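
The abstract reports its result as an equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. As a rough illustration of how such a figure can be computed from trial scores, here is a minimal sketch. The function name `equal_error_rate`, the threshold sweep, and the toy scores are assumptions made for this example and are not taken from the paper, which does not describe its scoring code here.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping thresholds over the observed scores.

    target_scores: scores of trials where both sides are the same speaker.
    nontarget_scores: scores of trials where the speakers differ.
    Higher scores are assumed to mean "more likely the same speaker".
    Returns (eer, threshold) at the point where the two error rates are closest.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("both score lists must be non-empty")

    best = None  # (gap between error rates, eer estimate, threshold)
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection: a same-speaker trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: a different-speaker trial scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


# Toy scores, invented purely for illustration.
eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.6], [0.1, 0.2, 0.3, 0.65])
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With a finite set of trials the two error curves rarely cross exactly, so this sketch returns the average of the two rates at the closest threshold; evaluation toolkits may interpolate differently and give slightly different values.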