SOTAVerified

Self-Attention for Audio Super-Resolution

2021-08-26 · Code Available

Nathanaël Carraz Rakotonirina


Abstract

Convolutions operate only locally and thus fail to model global interactions. Self-attention, in contrast, can learn representations that capture long-range dependencies in sequences. We propose a network architecture for audio super-resolution that combines convolution and self-attention: Attention-based Feature-Wise Linear Modulation (AFiLM) uses a self-attention mechanism instead of recurrent neural networks to modulate the activations of the convolutional model. Extensive experiments show that our model outperforms existing approaches on standard benchmarks. Moreover, it allows for more parallelization, resulting in significantly faster training.
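To make the idea concrete, here is a minimal numpy sketch of attention-based feature-wise modulation in the spirit the abstract describes: the feature map is pooled into blocks, self-attention lets the block summaries interact globally, and the result is turned into per-block scale and shift (FiLM) parameters applied to the convolutional activations. The function names, block pooling choice, and the random stand-in for the learned projection are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # Single-head scaled dot-product self-attention over (batch, seq, dim);
    # query/key/value projections are omitted to keep the sketch minimal.
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, 1, 2) / np.sqrt(d)
    return softmax(scores) @ x

def afilm(x, block_size, seed=0):
    # Hypothetical AFiLM-style modulation.
    # x: (batch, time, channels); time must be divisible by block_size.
    rng = np.random.default_rng(seed)
    b, t, c = x.shape
    n_blocks = t // block_size
    blocks = x.reshape(b, n_blocks, block_size, c)
    pooled = blocks.max(axis=2)                       # one summary per block
    attended = self_attention(pooled)                 # blocks interact globally
    # Random stand-in for a learned projection to the 2*c FiLM parameters.
    w = rng.normal(scale=0.1, size=(c, 2 * c))
    gamma, beta = np.split(attended @ w, 2, axis=-1)  # (b, n_blocks, c) each
    # FiLM: scale and shift every activation in a block.
    out = blocks * gamma[:, :, None, :] + beta[:, :, None, :]
    return out.reshape(b, t, c)

x = np.random.default_rng(1).normal(size=(2, 32, 8))
y = afilm(x, block_size=8)
print(y.shape)  # (2, 32, 8)
```

Because the modulation is driven by attention over block summaries rather than a recurrent network, every block's parameters can be computed in parallel, which is the source of the training speed-up claimed above.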

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Piano | U-Net + AFiLM | Log-Spectral Distance | 1.5 | | Unverified |
| VCTK Multi-Speaker | U-Net + AFiLM | Log-Spectral Distance | 1.7 | | Unverified |
| Voice Bank corpus (VCTK) | U-Net + AFiLM | Log-Spectral Distance | 2.3 | | Unverified |

Reproductions