Robust Speech Recognition with Schrödinger Bridge-Based Speech Enhancement

2025-05-07

Rauf Nasretdinov, Roman Korostik, Ante Jukić


Abstract

In this work, we investigate the application of generative speech enhancement to improve the robustness of ASR models in noisy and reverberant conditions. We employ a recently proposed speech enhancement model based on the Schrödinger bridge, which has been shown to perform well compared to diffusion-based approaches. We analyze the impact of model scaling and different sampling methods on ASR performance. Furthermore, we compare the considered model with predictive and diffusion-based baselines and analyze the speech recognition performance when using different pre-trained ASR models. The proposed approach significantly reduces the word error rate, by approximately 40% relative to the unprocessed speech signals and by approximately 8% relative to a similarly sized predictive approach.
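The reported gains are relative reductions in word error rate (WER). As a minimal sketch of how such a figure is typically computed, the Python below derives WER from a word-level edit distance and then the relative reduction between two systems. The function names and the WER values in the usage note are hypothetical illustrations, not the paper's evaluation code or results.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction by which improved_wer is lower than baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - improved_wer) / baseline_wer
```

For example, with hypothetical WERs of 0.20 on unprocessed speech and 0.12 on enhanced speech, `relative_reduction(0.20, 0.12)` gives roughly 0.4, which would be read as a 40% relative reduction.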

Tasks

Robust Speech Recognition, Speech Enhancement, Speech Recognition
