fairseq S2T: Fast Speech-to-Text Modeling with fairseq
Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino
- github.com/pytorch/fairseq (official, in paper; PyTorch, ★ 32,198)
- github.com/pytorch/fairseq/tree/master/examples/speech_to_text (official; PyTorch)
- github.com/huggingface/transformers (PyTorch, ★ 158,292)
- github.com/pwc-1/Paper-10/tree/main/speech_to_text (MindSpore)
- github.com/yangyucheng000/University/tree/main/model-3/speech_to_text (MindSpore)
Abstract
We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such as end-to-end speech recognition and speech-to-text translation. It follows fairseq's careful design for scalability and extensibility. We provide end-to-end workflows from data pre-processing and model training to offline (and online) inference. We implement state-of-the-art RNN-based, Transformer-based, and Conformer-based models, and open-source detailed training recipes. fairseq's machine translation models and language models can be seamlessly integrated into S2T workflows for multi-task learning or transfer learning. fairseq S2T documentation and examples are available at https://github.com/pytorch/fairseq/tree/master/examples/speech_to_text.
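The end-to-end workflow described above (pre-processing, training, inference) is driven from the command line. The sketch below is adapted from the repository's MuST-C speech-translation recipe and is illustrative only: the environment variables (`MUSTC_ROOT`, `ST_SAVE_DIR`, `CHECKPOINT`) are placeholders, and exact script paths and flag names may differ across fairseq versions.

```shell
# 1) Pre-process MuST-C En-De: extract features, build the unigram vocabulary,
#    and write the config_st.yaml / TSV manifests that fairseq S2T consumes.
python examples/speech_to_text/prep_mustc_data.py \
  --data-root ${MUSTC_ROOT} --task st \
  --vocab-type unigram --vocab-size 8000

# 2) Train a small S2T Transformer on the speech_to_text task.
fairseq-train ${MUSTC_ROOT}/en-de \
  --config-yaml config_st.yaml \
  --train-subset train_st --valid-subset dev_st \
  --save-dir ${ST_SAVE_DIR} --max-tokens 40000 \
  --task speech_to_text --arch s2t_transformer_s \
  --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
  --optimizer adam --lr 2e-3 \
  --lr-scheduler inverse_sqrt --warmup-updates 10000

# 3) Offline inference on the common test set, scored with sacreBLEU.
fairseq-generate ${MUSTC_ROOT}/en-de \
  --config-yaml config_st.yaml --gen-subset tst-COMMON_st \
  --task speech_to_text --path ${CHECKPOINT} \
  --max-tokens 50000 --beam 5 --scoring sacrebleu
```

Swapping `--arch` (e.g. to a larger `s2t_transformer_m`) or initializing the encoder from an ASR checkpoint are the kinds of recipe variations the examples directory documents.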
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| MuST-C EN->DE | Transformer + ASR Pretrain | Case-sensitive sacreBLEU | 22.7 | — | Unverified |