Speech Enhancement
Speech enhancement is a signal processing task that improves the quality of speech signals captured under noisy or degraded conditions. The goal is to make speech clearer, more intelligible, and more pleasant to listen to, which benefits applications such as voice recognition, teleconferencing, and hearing aids. A representative GitHub project with an online demo: ClearerVoice-Studio.
(Image credit: A Fully Convolutional Neural Network For Speech Enhancement)
Datasets: VoiceBank + DEMAND, Deep Noise Suppression (DNS) Challenge, CHiME-3, EARS-WHAM, EasyCom, DNS Challenge, VB-DemandEx, WHAMR!, WSJ0 + DEMAND + RNNoise, RealMAN, VoiceBank+DEMAND, DEMAND
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | ROSE-CD(PESQ) | PESQ (wb) | 3.99 | — | Unverified |
| 2 | PESQetarian | PESQ (wb) | 3.82 | — | Unverified |
| 3 | Mamba-SEUNet L (+PCS) | PESQ (wb) | 3.73 | — | Unverified |
| 4 | Schrödinger bridge (PESQ loss) | PESQ (wb) | 3.7 | — | Unverified |
| 5 | SEMamba (+PCS) | PESQ (wb) | 3.69 | — | Unverified |
| 6 | ZipEnhancer (S, \lambda_6 = 0) | PESQ (wb) | 3.63 | — | Unverified |
| 7 | PrimeK-Net | PESQ (wb) | 3.61 | — | Unverified |
| 8 | ZipEnhancer (S, \lambda_6 = 0.2) | PESQ (wb) | 3.61 | — | Unverified |
| 9 | MP-SENet | PESQ (wb) | 3.6 | — | Unverified |
| 10 | PCS_CS_WAVLM | PESQ (wb) | 3.54 | — | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | BSRNN-S + MGD | SI-SDR-WB | 21.4 | — | Unverified |
| 2 | DTLN | SI-SDR-WB | 16.34 | — | Unverified |
| 3 | Non-Real-Time MultiScale+ | SI-SDR-WB | 16.22 | — | Unverified |
| 4 | ZipEnhancer (M) | PESQ-WB | 3.81 | — | Unverified |
| 5 | TF-Locoformer (M) | PESQ-WB | 3.72 | — | Unverified |
| 6 | ZipEnhancer (S) | PESQ-WB | 3.69 | — | Unverified |
| 7 | MambAttention | PESQ-WB | 3.67 | — | Unverified |
| 8 | MP-SENet | PESQ-WB | 3.62 | — | Unverified |
| 9 | xLSTM-SENet | PESQ-WB | 3.59 | — | Unverified |
| 10 | BSRNN-S + MRSD | PESQ-WB | 3.53 | — | Unverified |
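The SI-SDR figures in the table above are reported in decibels, with higher values being better. As a rough illustration of how such a score can be computed, here is a minimal sketch of the commonly used scale-invariant SDR definition in plain Python. The function name `si_sdr`, the `eps` parameter, and the mean-removal step are assumptions made for this sketch; the listed papers may use different implementations, sampling rates, or preprocessing, so it is not expected to reproduce the leaderboard numbers.

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB between two equal-length 1-D signals.

    Illustrative sketch only; `eps` guards against division by zero.
    """
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")

    # Remove the mean so a constant (DC) offset does not affect the score.
    mean_e = sum(estimate) / len(estimate)
    mean_r = sum(reference) / len(reference)
    e = [x - mean_e for x in estimate]
    r = [x - mean_r for x in reference]

    # Project the estimate onto the reference to get the scaled target.
    dot = sum(a * b for a, b in zip(e, r))
    ref_energy = sum(b * b for b in r)
    alpha = dot / (ref_energy + eps)
    target = [alpha * b for b in r]

    # Everything in the estimate that is not the scaled target is residual.
    residual = [a - t for a, t in zip(e, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


if __name__ == "__main__":
    n = 1000
    # Synthetic stand-ins: a 5-cycle sine as "clean speech" and a weaker
    # 50-cycle sine as "noise" (the two are orthogonal over this window).
    clean = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
    noise = [0.1 * math.sin(2 * math.pi * 50 * i / n) for i in range(n)]
    noisy = [c + d for c, d in zip(clean, noise)]
    print(round(si_sdr(noisy, clean), 2))
```

In this synthetic example the target-to-residual energy ratio is 100, so the score comes out at about 20 dB. Rescaling the estimate by any nonzero constant leaves the score unchanged, which is the property that distinguishes SI-SDR from plain SDR.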
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Inter-Channel Conv-TasNet | SDR | 19.67 | — | Unverified |
| 2 | CA Dense U-Net (Complex) | SDR | 18.64 | — | Unverified |
| 3 | Dense U-Net (Complex) | SDR | 18.4 | — | Unverified |
| 4 | Dense U-Net (Real) | SDR | 16.86 | — | Unverified |
| 5 | U-Net (Real) | SDR | 15.97 | — | Unverified |
| 6 | Noisy/unprocessed | SDR | 6.5 | — | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Schrödinger Bridge (PESQ loss) | PESQ-WB | 3.09 | — | Unverified |
| 2 | SGMSE+ | PESQ-WB | 2.5 | — | Unverified |
| 3 | Demucs v4 | PESQ-WB | 2.37 | — | Unverified |
| 4 | Schrödinger Bridge | PESQ-WB | 2.33 | — | Unverified |
| 5 | Conv-TasNet | PESQ-WB | 2.31 | — | Unverified |
| 6 | CDiffuSE | PESQ-WB | 1.6 | — | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | ReVISE (ch2) | Audio Quality MOS | 4.19 | — | Unverified |
| 2 | ReVISE (bf) | Audio Quality MOS | 4.11 | — | Unverified |
| 3 | Demucs (ch2) | Audio Quality MOS | 2.95 | — | Unverified |
| 4 | Demucs (bf) | Audio Quality MOS | 2.39 | — | Unverified |
| 5 | MaxDI (Baseline) | PESQ | 1.17 | — | Unverified |
| 6 | DAJA (MVDR,HMA,1000) (Overlapped Speech) | SDR | -4.76 | — | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | ZipEnhancer (M) | PESQ-NB | 4.08 | — | Unverified |
| 2 | DCCRN-MC | PESQ-NB | 3.21 | — | Unverified |
| 3 | DCCRN-M | PESQ-NB | 3.15 | — | Unverified |
| 4 | DCCRN | PESQ-NB | 3.04 | — | Unverified |
| 5 | RNN-Modulation | PESQ-WB | 2.75 | — | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MambAttention | ESTOI | 0.8 | — | Unverified |
| 2 | SEMamba | ESTOI | 0.8 | — | Unverified |
| 3 | xLSTM-SENet | ESTOI | 0.8 | — | Unverified |
| 4 | MP-SENet | ESTOI | 0.79 | — | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SepFormer | PESQ | 2.84 | — | Unverified |
| 2 | DTLN | PESQ | 2.23 | — | Unverified |
| 3 | Unprocessed | PESQ | 1.83 | — | Unverified |
| 4 | Non-Real-Time MultiScale+ | PESQ | 1.52 | — | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | CleanMel-L-map | DNSMOS | 3.82 | — | Unverified |
| 2 | SpatialNet | DNSMOS BAK | 3.43 | — | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | ROSE-CD(PESQ) | PESQ | 3.99 | — | Unverified |
| 2 | ROSE-CD | PESQ | 3.49 | — | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Wave-U-Net | CBAK | 3.24 | — | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Audio-Visual concat-ref | PESQ | 2.7 | — | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SE-MelGAN | Audio Quality MOS | 3.1 | — | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | DeFT-AN | PESQ | 3.01 | — | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Audio-Visual concat-ref | PESQ | 3.03 | — | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SepFormer | PESQ | 3.07 | — | Unverified |