WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction

2024-09-24

Shuai Wang, Ke Zhang, Shaoxiong Lin, Junjie Li, Xuefei Wang, Meng Ge, Jianwei Yu, Yanmin Qian, Haizhou Li


Code

github.com/wenet-e2e/wesep
Official · In paper · PyTorch · ★ 250

Abstract

Target speaker extraction (TSE) focuses on isolating the speech of a specific target speaker from overlapped multi-talker speech, which is a typical setup in the cocktail party problem. In recent years, TSE has drawn increasing attention due to its potential for various applications such as user-customized interfaces and hearing aids, or as a crucial front-end processing technology for subsequent tasks such as speech recognition and speaker recognition. However, there are currently few open-source toolkits or available pre-trained models for off-the-shelf usage. In this work, we introduce WeSep, a toolkit designed for research and practical applications in TSE. WeSep features flexible target speaker modeling, scalable data management, effective on-the-fly data simulation, structured recipes, and deployment support. The toolkit is publicly available at https://github.com/wenet-e2e/WeSep.

Tasks

Management, Speech Recognition, Target Speaker Extraction
