A Deep Paradigm for Articulatory Speech Representation Learning via Neural Convolutive Sparse Matrix Factorization

2022-01-16ACL ARR January 2022Unverified0· sign in to hype

Anonymous

Unverified — Be the first to reproduce this paper.

Abstract

Most of the research on data-driven speech representation learning has focused on raw audios in an end-to-end manner, paying little attention to their internal phonological or gestural structure. This work, investigating the speech representations derived from articulatory kinematics signals, uses a neural implementation of convolutive sparse matrix factorization to decompose the articulatory data into interpretable gestures and gestural scores. By applying sparse constraints, the gestural scores leverage the discrete combinatorial properties of phonological gestures. Phoneme recognition experiments were additionally performed to show that gestural scores indeed code phonological information successfully. The proposed work thus makes a bridge between articulatory phonology and deep neural networks to leverage interpretable, intelligible, informative, and efficient speech representations.

Tasks

Phoneme Recognition Representation Learning Speech Representation Learning

A Deep Paradigm for Articulatory Speech Representation Learning via Neural Convolutive Sparse Matrix Factorization

Abstract

Tasks

Reproductions