fairseq S2T: Fast Speech-to-Text Modeling with fairseq
Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino
- github.com/pytorch/fairseq (official, in paper; PyTorch, ★ 32,198)
- github.com/pytorch/fairseq/tree/master/examples/speech_to_text (official; PyTorch)
- github.com/huggingface/transformers (PyTorch, ★ 158,292)
- github.com/pwc-1/Paper-10/tree/main/speech_to_text (MindSpore)
- github.com/yangyucheng000/University/tree/main/model-3/speech_to_text (MindSpore)
Abstract
We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such as end-to-end speech recognition and speech-to-text translation. It follows fairseq's careful design for scalability and extensibility. We provide end-to-end workflows from data pre-processing and model training to offline (and online) inference. We implement state-of-the-art RNN-based, Transformer-based, and Conformer-based models, and open-source detailed training recipes. fairseq's machine translation models and language models can be seamlessly integrated into S2T workflows for multi-task learning or transfer learning. fairseq S2T documentation and examples are available at https://github.com/pytorch/fairseq/tree/master/examples/speech_to_text.
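The end-to-end workflow described above (pre-processing, training, inference) is driven from the command line. The sketch below is adapted from the repository's MuST-C speech-translation recipe and is illustrative only: the environment variables (`MUSTC_ROOT`, `ST_SAVE_DIR`, `CHECKPOINT`) are placeholders, and exact script paths and flag names may differ across fairseq versions.

```shell
# 1) Pre-process MuST-C En-De: extract features, build the unigram vocabulary,
#    and write the config_st.yaml / TSV manifests that fairseq S2T consumes.
python examples/speech_to_text/prep_mustc_data.py \
  --data-root ${MUSTC_ROOT} --task st \
  --vocab-type unigram --vocab-size 8000

# 2) Train a small S2T Transformer on the speech_to_text task.
fairseq-train ${MUSTC_ROOT}/en-de \
  --config-yaml config_st.yaml \
  --train-subset train_st --valid-subset dev_st \
  --save-dir ${ST_SAVE_DIR} --max-tokens 40000 \
  --task speech_to_text --arch s2t_transformer_s \
  --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
  --optimizer adam --lr 2e-3 \
  --lr-scheduler inverse_sqrt --warmup-updates 10000

# 3) Offline inference on the common test set, scored with sacreBLEU.
fairseq-generate ${MUSTC_ROOT}/en-de \
  --config-yaml config_st.yaml --gen-subset tst-COMMON_st \
  --task speech_to_text --path ${CHECKPOINT} \
  --max-tokens 50000 --beam 5 --scoring sacrebleu
```

Swapping `--arch` (e.g. to a larger `s2t_transformer_m`) or initializing the encoder from an ASR checkpoint are the kinds of recipe variations the examples directory documents.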
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| MuST-C EN->DE | Transformer + ASR Pretrain | Case-sensitive sacreBLEU | 22.7 | — | Unverified |