Progressive Down-Sampling for Acoustic Encoding

2021-11-16 · ACL ARR November 2021

Anonymous

Abstract

In acoustic encoding, the fine-grained frame-level features are not suited for capturing global dependencies. But condensing them into a semantically complete representation by stacked down-sampling does not work well. We find that the condensation leads to the degraded correlation of the representations in adjacent positions, which poses the risk of information loss in the stacked method. In this work, we propose a new method, progressive down-sampling (PDS), for encoding the context sufficiently before each condensation. Also, we develop a representation fusion method to alleviate information loss by combining the multi-scale representations. Experimental results on the 960h LibriSpeech automatic speech recognition task show that, for a strong Conformer-based system, our method down-samples the input speech features to 1/32 of the initial length, while yielding an improvement of 0.47 WER with a speedup of 1.42. It also achieves the state-of-the-art BLEU score (25.8) on the MuST-C En-De speech translation benchmark with no additional training data.
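The length arithmetic behind the 1/32 figure can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not state how many stages PDS uses or what each stage looks like, so the five stride-2 stages, the average-pooling `downsample`, and the pluggable `encode` step are all assumptions made for illustration. The fusion of multi-scale representations is not sketched, since the abstract does not describe it; the function only returns every scale so that a fusion step could consume them.

```python
from typing import Callable, List

Frames = List[List[float]]  # one feature vector per time step


def downsample(frames: Frames, stride: int = 2) -> Frames:
    """Average each group of `stride` consecutive frames into one frame.

    Assumption: average pooling stands in for whatever learned
    down-sampling layer the real system uses.
    """
    pooled = []
    for start in range(0, len(frames), stride):
        window = frames[start:start + stride]
        dim = len(window[0])
        pooled.append([sum(f[d] for f in window) / len(window) for d in range(dim)])
    return pooled


def progressive_lengths(num_frames: int, strides: List[int]) -> List[int]:
    """Sequence length before the first stage and after every stage."""
    lengths = [num_frames]
    for stride in strides:
        lengths.append((lengths[-1] + stride - 1) // stride)  # ceiling division
    return lengths


def progressive_downsample(
    frames: Frames,
    strides: List[int],
    encode: Callable[[Frames], Frames],
) -> List[Frames]:
    """Encode context, then condense, once per stage; return every scale.

    `encode` is a hypothetical placeholder for the context-encoding
    layers applied before each condensation.
    """
    scales = []
    current = frames
    for stride in strides:
        current = downsample(encode(current), stride)
        scales.append(current)
    return scales


if __name__ == "__main__":
    strides = [2, 2, 2, 2, 2]  # assumed: five stride-2 stages give 1/32 overall
    print(progressive_lengths(3200, strides))
    toy = [[float(i)] for i in range(64)]
    scales = progressive_downsample(toy, strides, encode=lambda x: x)
    print([len(s) for s in scales])
```

Passing an identity function for `encode` isolates the down-sampling arithmetic; in a real encoder that slot would hold the layers that model context at the current resolution.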

Tasks

Automatic Speech Recognition (ASR), Speech Recognition, Translation
