Centroid-based deep metric learning for speaker recognition

2019-02-06

Jixuan Wang, Kuan-Chieh Wang, Marc Law, Frank Rudzicz, Michael Brudno

Abstract

Recent progress in speaker recognition has been driven by speaker embedding models, which use neural networks to map utterances to a space where distances reflect similarity between speakers. However, there is still a significant performance gap between recognizing speakers in the training set and recognizing unseen speakers. The latter case corresponds to the few-shot learning task, where a trained model is evaluated on unseen classes. Here, we optimize a speaker embedding model with prototypical network loss (PNL), a state-of-the-art approach for the few-shot image classification task. The resulting embedding model outperforms state-of-the-art triplet-loss-based models in both speaker verification and identification tasks, for both seen and unseen speakers.
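
To make the centroid idea concrete, the following is a minimal NumPy sketch of a prototypical-network-style loss for a single training episode. It is an illustration under stated assumptions, not the authors' implementation: the function name `prototypical_loss`, the split into support and query utterances, and the use of squared Euclidean distance (as in the original prototypical networks formulation) are all assumptions, since the abstract does not specify them. Each speaker's centroid is the mean of that speaker's support embeddings, and each query embedding is classified by a softmax over negative distances to the centroids.

```python
import numpy as np

def prototypical_loss(support, support_labels, query, query_labels):
    """Hypothetical helper: prototypical-style loss for one episode.

    support: (n_support, dim) embeddings used to build speaker centroids.
    query:   (n_query, dim) embeddings to classify against the centroids.
    Labels are integer speaker IDs; every query label must occur in support_labels.
    """
    support = np.asarray(support, dtype=float)
    query = np.asarray(query, dtype=float)
    support_labels = np.asarray(support_labels)
    query_labels = np.asarray(query_labels)

    # One centroid (prototype) per speaker: the mean of that speaker's support embeddings.
    classes = np.unique(support_labels)  # sorted unique speaker IDs
    prototypes = np.stack(
        [support[support_labels == c].mean(axis=0) for c in classes]
    )

    # Squared Euclidean distance from each query to each centroid (assumed metric).
    diffs = query[:, None, :] - prototypes[None, :, :]
    dists = (diffs ** 2).sum(axis=-1)  # shape (n_query, n_classes)

    # Numerically stable log-softmax over negative distances.
    logits = -dists
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Mean negative log-likelihood of the true speaker for each query.
    target_idx = np.searchsorted(classes, query_labels)
    loss = -log_probs[np.arange(len(query)), target_idx].mean()

    # Nearest-centroid prediction, which also covers identification of unseen speakers.
    predictions = classes[log_probs.argmax(axis=1)]
    return loss, predictions
```

In a real training loop the embeddings would come from the neural network and the loss would be backpropagated through it; here they are plain arrays so that the centroid computation and the classification step are easy to inspect.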

Tasks

Few-Shot Image Classification, Few-Shot Learning, General Classification, Image Classification, Metric Learning, Speaker Recognition, Speaker Verification, Triplet
