
wav2vec: Unsupervised Pre-training for Speech Recognition

2019-04-11 · Code Available

Steffen Schneider, Alexei Baevski, Ronan Collobert, Michael Auli

Abstract

We explore unsupervised pre-training for speech recognition by learning representations of raw audio. wav2vec is trained on large amounts of unlabeled audio data and the resulting representations are then used to improve acoustic model training. We pre-train a simple multi-layer convolutional neural network optimized via a noise contrastive binary classification task. Our experiments on WSJ reduce the WER of a strong character-based log-mel filterbank baseline by up to 36% when only a few hours of transcribed data are available. Our approach achieves 2.43% WER on the nov92 test set, outperforming Deep Speech 2, the best reported character-based system in the literature, while using two orders of magnitude less labeled training data.
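
The abstract describes the approach at a high level: a convolutional encoder over raw audio, a convolutional context network on top of it, and a noise-contrastive binary classification objective that distinguishes true future latents from negative samples. As a rough illustration only, here is a minimal PyTorch sketch of that design. All layer sizes, the number of prediction steps, the negative-sampling scheme, and the names (Wav2VecSketch, contrastive_loss) are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sketch of a wav2vec-style model (NOT the paper's exact
# architecture or hyperparameters): an encoder maps raw audio to latent
# vectors z, a context network aggregates them into context vectors c,
# and a binary noise-contrastive loss scores true future latents against
# distractors sampled from other time steps.

class Wav2VecSketch(nn.Module):
    def __init__(self, dim=256, steps=3):
        super().__init__()
        # Encoder: strided 1-D convolutions over raw waveform samples.
        self.encoder = nn.Sequential(
            nn.Conv1d(1, dim, kernel_size=10, stride=5), nn.ReLU(),
            nn.Conv1d(dim, dim, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv1d(dim, dim, kernel_size=4, stride=2), nn.ReLU(),
        )
        # Context network: non-strided convolutions over the latent sequence.
        self.context = nn.Sequential(
            nn.Conv1d(dim, dim, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(dim, dim, kernel_size=3, padding=1), nn.ReLU(),
        )
        # One affine projection per future-prediction offset k.
        self.proj = nn.ModuleList(nn.Linear(dim, dim) for _ in range(steps))

    def forward(self, wav):                  # wav: (batch, samples)
        z = self.encoder(wav.unsqueeze(1))   # (batch, dim, T)
        c = self.context(z)                  # (batch, dim, T)
        return z.transpose(1, 2), c.transpose(1, 2)  # (batch, T, dim)

def contrastive_loss(z, c, proj, n_negatives=10):
    """Binary NCE loss: classify the true future latent z_{t+k} against
    negatives drawn uniformly from other time steps of the same clip."""
    B, T, D = z.shape
    loss = 0.0
    for k, step_proj in enumerate(proj, start=1):
        if T <= k:
            continue
        pred = step_proj(c[:, :T - k])           # (B, T-k, D)
        pos = z[:, k:]                           # true future latents
        pos_logit = (pred * pos).sum(-1)         # (B, T-k)
        # Sample negative latents from random time steps of the sequence.
        neg_idx = torch.randint(0, T, (B, T - k, n_negatives))
        neg = z[torch.arange(B)[:, None, None], neg_idx]  # (B, T-k, n, D)
        neg_logit = (pred.unsqueeze(2) * neg).sum(-1)     # (B, T-k, n)
        loss = loss + F.binary_cross_entropy_with_logits(
            pos_logit, torch.ones_like(pos_logit))
        loss = loss + F.binary_cross_entropy_with_logits(
            neg_logit, torch.zeros_like(neg_logit))
    return loss

model = Wav2VecSketch()
wav = torch.randn(2, 16000)        # two one-second clips at 16 kHz
z, c = model(wav)
print(contrastive_loss(z, c, model.proj).item())
```

After pre-training, it is the context representations (c here) rather than the raw latents that are used in place of log-mel filterbank features as input to the downstream acoustic model.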

Tasks

Speech Recognition

Benchmark Results

Dataset | Model   | Metric           | Claimed | Verified | Status
TIMIT   | wav2vec | Percentage error | 14.7    |          | Unverified

Reproductions