SOTAVerified

Self-Attention for Audio Super-Resolution

2021-08-26 · Code Available

Nathanaël Carraz Rakotonirina


Abstract

Convolutions operate only locally and thus fail to model global interactions. Self-attention, in contrast, can learn representations that capture long-range dependencies in sequences. We propose a network architecture for audio super-resolution that combines convolution and self-attention: Attention-based Feature-Wise Linear Modulation (AFiLM) uses a self-attention mechanism instead of recurrent neural networks to modulate the activations of the convolutional model. Extensive experiments show that our model outperforms existing approaches on standard benchmarks. Moreover, it allows for more parallelization, resulting in significantly faster training.
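To make the idea concrete, here is a minimal numpy sketch of attention-based feature-wise modulation in the spirit the abstract describes: the feature map is pooled into blocks, self-attention lets the block summaries interact globally, and the result is turned into per-block scale and shift (FiLM) parameters applied to the convolutional activations. The function names, block pooling choice, and the random stand-in for the learned projection are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # Single-head scaled dot-product self-attention over (batch, seq, dim);
    # query/key/value projections are omitted to keep the sketch minimal.
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, 1, 2) / np.sqrt(d)
    return softmax(scores) @ x

def afilm(x, block_size, seed=0):
    # Hypothetical AFiLM-style modulation.
    # x: (batch, time, channels); time must be divisible by block_size.
    rng = np.random.default_rng(seed)
    b, t, c = x.shape
    n_blocks = t // block_size
    blocks = x.reshape(b, n_blocks, block_size, c)
    pooled = blocks.max(axis=2)                       # one summary per block
    attended = self_attention(pooled)                 # blocks interact globally
    # Random stand-in for a learned projection to the 2*c FiLM parameters.
    w = rng.normal(scale=0.1, size=(c, 2 * c))
    gamma, beta = np.split(attended @ w, 2, axis=-1)  # (b, n_blocks, c) each
    # FiLM: scale and shift every activation in a block.
    out = blocks * gamma[:, :, None, :] + beta[:, :, None, :]
    return out.reshape(b, t, c)

x = np.random.default_rng(1).normal(size=(2, 32, 8))
y = afilm(x, block_size=8)
print(y.shape)  # (2, 32, 8)
```

Because the modulation is driven by attention over block summaries rather than a recurrent network, every block's parameters can be computed in parallel, which is the source of the training speed-up claimed above.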

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Piano | U-Net + AFiLM | Log-Spectral Distance | 1.5 | | Unverified |
| VCTK Multi-Speaker | U-Net + AFiLM | Log-Spectral Distance | 1.7 | | Unverified |
| Voice Bank corpus (VCTK) | U-Net + AFiLM | Log-Spectral Distance | 2.3 | | Unverified |

Reproductions