Conditioning Sequence-to-sequence Networks with Learned Activations

2021-09-29 · ICLR 2022

Alberto Gil Couto Pimentel Ramos, Abhinav Mehrotra, Nicholas Donald Lane, Sourav Bhattacharya

Abstract

Conditional neural networks play an important role in a number of sequence-to-sequence modeling tasks, including personalized sound enhancement (PSE), speaker-dependent automatic speech recognition (ASR), and generative modeling such as text-to-speech synthesis. In conditional neural networks, the output of a model is often influenced by a conditioning vector, in addition to the input. Common approaches to conditioning include input concatenation or modulation with the conditioning vector, which come at the cost of increased model size. In this work, we introduce a novel approach to neural network conditioning by learning intermediate layer activations based on the conditioning vector. We systematically explore and show that learned activations can produce conditional models with comparable or better quality, while having significantly smaller sizes, thus making them ideal candidates for resource-efficient on-device deployment. As exemplary target use-cases we consider (i) the task of PSE as a pre-processing technique for improving telephony or pre-trained ASR performance under babble or ambient noise, and (ii) personalized ASR in single-speaker scenarios. We find that conditioning via activation learning is an effective modeling strategy, suggesting a broad applicability of the proposed technique across a number of application domains.
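
As an illustration of the idea in the abstract, the following is a minimal Python sketch of one way a conditioning vector could shape a layer's activation: it mixes a small set of fixed activation functions using weights computed from the conditioning vector. The basis functions, the softmax mixing, and the names `BASIS`, `mixing_weights`, `learned_activation`, and `proj` are all assumptions made for this example; the abstract does not specify the authors' formulation.

```python
import math

# Hypothetical basis of fixed activations (an assumption for this sketch;
# the abstract does not say which functions, if any, are combined).
BASIS = (
    lambda x: max(0.0, x),                  # ReLU
    math.tanh,                              # tanh
    lambda x: 1.0 / (1.0 + math.exp(-x)),   # sigmoid
    lambda x: x,                            # identity
)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixing_weights(cond, proj):
    """Map a conditioning vector to one weight per basis activation.

    `proj` is a len(BASIS) x len(cond) matrix; in a real model it would be
    a trainable parameter. Here it is supplied by the caller.
    """
    logits = [sum(w * c for w, c in zip(row, cond)) for row in proj]
    return softmax(logits)


def learned_activation(pre_acts, cond, proj):
    """Apply the conditioned mixture of basis activations element-wise."""
    weights = mixing_weights(cond, proj)
    return [sum(a * f(x) for a, f in zip(weights, BASIS)) for x in pre_acts]


# Usage: an all-zero projection gives equal weight to every basis function.
cond = [1.0, 0.0]                       # e.g. a (toy) speaker embedding
proj_uniform = [[0.0, 0.0] for _ in BASIS]
print(learned_activation([0.0], cond, proj_uniform))  # prints [0.125]
```

In this sketch the conditioning path adds only `len(BASIS) * len(cond)` parameters per layer, independent of the layer width, which is the kind of size saving over input concatenation that the abstract alludes to.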

Tasks

Automatic Speech Recognition (ASR), Speech Recognition, Speech Synthesis, Text-To-Speech Synthesis
