End-to-end neural networks for subvocal speech recognition

2017-06-11 · CS 224S 2017

Pol Rosello, Pamela Toman, Nipun Agarwala

Abstract

Subvocalization is a phenomenon observed while subjects read or think, characterized by involuntary facial and laryngeal muscle movements. By measuring this muscle activity using surface electromyography (EMG), it may be possible to perform automatic speech recognition (ASR) and enable silent, hands-free human-computer interfaces. In our work, we describe the first approach toward end-to-end, session-independent subvocal speech recognition by leveraging character-level recurrent neural networks (RNNs) and the connectionist temporal classification (CTC) loss. We attempt to address challenges posed by a lack of data, including poor generalization, through data augmentation of electromyographic signals, a specialized multi-modal architecture, and regularization. We show results indicating reasonable qualitative performance on test set utterances, and describe promising avenues for future work in this direction.
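
As an illustration of how a character-level CTC model's per-frame outputs turn into text, the following is a minimal sketch of standard greedy CTC decoding: take the most likely symbol in each frame, merge consecutive repeats, then drop the blank symbol. It is not taken from the paper; the function names, the `_` blank symbol, and the toy alphabet are assumptions chosen for the example, and the authors' actual decoder may differ.

```python
from itertools import groupby

BLANK = "_"  # assumed blank symbol for this sketch, not from the paper


def ctc_collapse(frame_labels, blank=BLANK):
    """Merge runs of identical labels, then remove blanks."""
    merged = (label for label, _ in groupby(frame_labels))
    return "".join(label for label in merged if label != blank)


def greedy_decode(frame_probs, alphabet, blank=BLANK):
    """Pick the highest-probability symbol per frame, then collapse.

    frame_probs: list of per-frame probability lists, one entry per symbol.
    alphabet:    list of symbols aligned with each probability list.
    """
    best = [
        alphabet[max(range(len(probs)), key=probs.__getitem__)]
        for probs in frame_probs
    ]
    return ctc_collapse(best, blank)


if __name__ == "__main__":
    # The blank between the two "l" runs keeps the double letter.
    print(ctc_collapse("hh_e_ll_lo__"))

    toy_alphabet = ["_", "a", "b"]
    toy_probs = [
        [0.1, 0.8, 0.1],
        [0.2, 0.7, 0.1],
        [0.9, 0.05, 0.05],
        [0.1, 0.2, 0.7],
    ]
    print(greedy_decode(toy_probs, toy_alphabet))
```

The blank symbol is what lets CTC emit repeated characters and absorb frames with no clear output, which is why the network can be trained on EMG frame sequences without frame-level character alignments.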

Tasks

Automatic Speech Recognition (ASR), Data Augmentation, Electromyography (EMG), Speech Recognition
