
Multi-stream Attention-based BLSTM with Feature Segmentation for Speech Emotion Recognition

2020-10-25 · Interspeech 2020

Yuya Chiba, Takashi Nose, Akinori Ito


Abstract

This paper proposes a speech emotion recognition technique that considers the suprasegmental characteristics and temporal changes of individual speech parameters. In recent years, speech emotion recognition using Bidirectional LSTM (BLSTM) has been studied actively because the model can focus on a particular temporal region that contains strong emotional characteristics. One weakness of this model is that it cannot consider the statistics of speech features, which are known to be effective for speech emotion recognition. Moreover, it cannot train individual attention parameters for different descriptors because it handles the input sequence with a single BLSTM. In this paper, we introduce feature segmentation and multi-stream processing into the attention-based BLSTM to solve these problems. In addition, we employ data augmentation based on emotional speech synthesis in the training step. Classification experiments among four emotions (i.e., anger, joy, neutral, and sadness) using the Japanese Twitter-based Emotional Speech corpus (JTES) showed that the proposed method obtained a recognition accuracy of 73.4%, which is comparable to human evaluation (75.5%).
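The core idea in the abstract can be illustrated in a minimal NumPy sketch: the input features are segmented into per-descriptor streams, each stream is processed separately, and a stream-specific attention mechanism pools the frame-level outputs into an utterance-level vector. Note that this is an illustrative sketch only, not the authors' implementation: the stream names and dimensions are assumptions, and the per-stream BLSTM encoders are stood in for by random matrices for brevity.

```python
import numpy as np

def attention_pool(h, w, b, u):
    """Additive attention pooling over time for one feature stream.

    h: (T, D) frame-level encoder outputs (here a stand-in for BLSTM outputs)
    w, b, u: stream-specific attention parameters
    """
    e = np.tanh(h @ w + b) @ u             # (T,) attention energies
    a = np.exp(e - e.max())
    a /= a.sum()                           # softmax over time
    return a @ h                           # (D,) attention-weighted mean

rng = np.random.default_rng(0)
T = 50                                     # number of frames in the utterance
# Hypothetical descriptor streams; dimensions are illustrative, not from the paper.
streams = {"mfcc": 39, "f0": 3, "power": 3}

pooled = []
for name, D in streams.items():
    h = rng.standard_normal((T, D))        # stands in for BLSTM outputs of this stream
    w = rng.standard_normal((D, D))
    b = np.zeros(D)
    u = rng.standard_normal(D)
    pooled.append(attention_pool(h, w, b, u))

# Concatenated utterance-level representation fed to the emotion classifier.
utt_vec = np.concatenate(pooled)
print(utt_vec.shape)                       # (45,)
```

Because each stream has its own attention parameters, the model can weight, say, F0 frames differently from MFCC frames, which a single-BLSTM model with shared attention cannot do.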
