Fine-tuning wav2vec2 for speaker recognition

2021-09-30

Nik Vaessen, David A. van Leeuwen

Code

github.com/nikvaessen/w2v2-speaker (official, in paper; PyTorch; ★ 146)
github.com/MS-P3/code7/tree/main/wav2vec2 (MindSpore; ★ 0)
github.com/pwc-1/Paper-9/tree/main/1/wav2vec2_with_lm (MindSpore; ★ 0)
github.com/MindCode-4/code-5/tree/main/wav2vec2 (MindSpore; ★ 0)

Abstract

This paper explores applying the wav2vec2 framework to speaker recognition instead of speech recognition. We study the effectiveness of the pre-trained weights on the speaker recognition task, and how to pool the wav2vec2 output sequence into a fixed-length speaker embedding. To adapt the framework to speaker recognition, we propose a single-utterance classification variant with cross-entropy (CE) or additive angular margin (AAM) softmax loss, and an utterance-pair classification variant with binary cross-entropy (BCE) loss. Our best-performing variant, w2v2-aam, achieves a 1.88% equal error rate (EER) on the extended VoxCeleb1 test set, compared to 1.69% EER with an ECAPA-TDNN baseline. Code is available at https://github.com/nikvaessen/w2v2-speaker.
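The abstract mentions pooling a variable-length output sequence into a fixed-length embedding and training with an AAM softmax loss. The sketch below illustrates both ideas in plain Python under stated assumptions: mean pooling is only one possible pooling choice (the abstract does not say which one works best), the function names are hypothetical, and the `margin` and `scale` values are illustrative rather than the paper's settings. It is not the authors' implementation.

```python
import math


def mean_pool(frames):
    """Average a T x D list of frame vectors into a single D-dim vector."""
    num_frames = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / num_frames for d in range(dim)]


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def aam_softmax_loss(embedding, class_weights, target, margin=0.2, scale=30.0):
    """Additive angular margin softmax loss for a single embedding.

    margin and scale are assumed, illustrative values. This sketch ignores
    the edge case where angle + margin exceeds pi.
    """
    emb = l2_normalize(embedding)
    logits = []
    for idx, weight in enumerate(class_weights):
        # Cosine similarity between the embedding and the class weight vector.
        cos = sum(a * b for a, b in zip(emb, l2_normalize(weight)))
        cos = max(-1.0, min(1.0, cos))  # guard acos against rounding error
        if idx == target:
            # Penalize only the true class by widening its angle by the margin.
            cos = math.cos(math.acos(cos) + margin)
        logits.append(scale * cos)
    # Numerically stable cross-entropy over the scaled logits.
    max_logit = max(logits)
    log_sum = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum - logits[target]


# Toy usage: two frames pooled into one embedding, two speaker classes.
embedding = mean_pool([[1.0, 0.0], [0.8, 0.2]])
loss = aam_softmax_loss(embedding, [[1.0, 0.0], [0.0, 1.0]], target=0)
```

Setting `margin=0.0` reduces the loss to a plain softmax cross-entropy over scaled cosine similarities, which makes the effect of the margin easy to compare on the same inputs.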

Tasks

Classification, Speaker Recognition, Speech Recognition

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
VoxCeleb1	w2v2-aam	EER	1.88	—	Unverified
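The EER metric reported above is the operating point at which the false accept rate equals the false reject rate. As a rough illustration, the following sketch approximates it by sweeping thresholds over the observed trial scores; the function name and the toy trial list are hypothetical, and evaluation toolkits typically interpolate the error curves instead of picking the nearest threshold.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER from trial scores and labels (1 = same speaker).

    Sweeps every observed score as a threshold and returns the mean of the
    false accept and false reject rates where the two are closest.
    """
    num_target = sum(labels)
    num_nontarget = len(labels) - num_target
    best_gap = float("inf")
    eer = None
    for threshold in sorted(set(scores)):
        # Non-target trials scored at or above the threshold are accepted.
        false_accepts = sum(
            1 for s, y in zip(scores, labels) if y == 0 and s >= threshold
        )
        # Target trials scored below the threshold are rejected.
        false_rejects = sum(
            1 for s, y in zip(scores, labels) if y == 1 and s < threshold
        )
        far = false_accepts / num_nontarget
        frr = false_rejects / num_target
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2
    return eer


# Toy trial list: three same-speaker and three different-speaker pairs.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_eer = equal_error_rate(toy_scores, toy_labels)
```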
