Speech Emotion Recognition
Speech Emotion Recognition (SER) is a task in speech processing and computational paralinguistics that aims to recognize and categorize the emotions expressed in spoken language. The goal is to infer a speaker's emotional state, such as happiness, anger, sadness, or frustration, from acoustic cues in their speech, including prosody, pitch, energy, and rhythm.
For multimodal emotion recognition, please submit your results to the Multimodal Emotion Recognition on IEMOCAP benchmark.
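To make the acoustic cues above concrete, here is a minimal feature-extraction sketch: it computes per-frame energy and zero-crossing rate, two simple correlates of loudness and pitch often used as baseline features in SER pipelines. The frame length, hop size, and synthetic test signal are illustrative assumptions, not part of any benchmark listed below.

```python
import numpy as np

def frame_features(signal, frame_len=400, hop=160):
    """Return an (n_frames, 2) array of [energy, zero-crossing rate]
    computed over sliding frames of the input waveform."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = float(np.mean(frame ** 2))          # short-time energy
        # fraction of sample pairs where the sign flips
        zcr = float(np.mean(np.abs(np.diff(np.sign(frame))) > 0))
        feats.append((energy, zcr))
    return np.array(feats)

# Synthetic 1-second "utterance": a 220 Hz tone standing in for voiced speech
sr = 16000
t = np.arange(sr) / sr
signal = 0.5 * np.sin(2 * np.pi * 220 * t)
feats = frame_features(signal)
print(feats.shape)  # one (energy, zcr) pair per 25 ms frame, 10 ms hop
```

In a real system these frame-level features (or learned embeddings from models such as Wav2Vec 2.0) would be pooled over the utterance and passed to a classifier trained on emotion labels.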
Papers
431 papers address this task; selected benchmark results are shown below.
Datasets: CREMA-D, IEMOCAP, RAVDESS, MSP-Podcast (Activation), MSP-Podcast (Dominance), MSP-Podcast (Valence), BERSt, RESD, Dusha Crowd, Dusha Podcast, EmoDB
Benchmark Results
| # | Model | Metric | Claimed (%) | Verified | Status |
|---|---|---|---|---|---|
| 1 | VQ-MAE-S-12 (Frame) + Query2Emo | Accuracy | 84.1 | — | Unverified |
| 2 | CNN-X (Shallow CNN) | Accuracy | 82.99 | — | Unverified |
| 3 | xlsr-Wav2Vec2.0 (fine-tuning) | Accuracy | 81.82 | — | Unverified |
| 4 | CNN-14 (fine-tuning) | Accuracy | 76.58 | — | Unverified |
| 5 | AlexNet (fine-tuning) | Accuracy | 61.67 | — | Unverified |