Speech Emotion Recognition
Speech Emotion Recognition is a task in speech processing and computational paralinguistics that aims to recognize and categorize the emotions expressed in spoken language. The goal is to infer a speaker's emotional state, such as happiness, anger, sadness, or frustration, from acoustic cues in their speech, including prosody, pitch, and rhythm.
For multimodal emotion recognition, please upload your results to Multimodal Emotion Recognition on IEMOCAP.
Datasets: CREMA-D, IEMOCAP, RAVDESS, MSP-Podcast (Activation), MSP-Podcast (Dominance), MSP-Podcast (Valence), BERSt, RESD, Dusha Crowd, Dusha Podcast, EMODB, EmoDB Dataset
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Vertically long patch ViT | Accuracy | 94.07 | — | Unverified |
| 2 | ConformerXL-P | Accuracy | 88.2 | — | Unverified |
| 3 | CoordViT | Accuracy | 82.96 | — | Unverified |
| 4 | SepTr + LeRaC | Accuracy | 70.95 | — | Unverified |
| 5 | SepTr | Accuracy | 70.47 | — | Unverified |
| 6 | ResNet-18 + SPEL | Accuracy | 68.12 | — | Unverified |
| 7 | ViT | Accuracy | 67.81 | — | Unverified |
| 8 | ResNet-18 + PyNADA | Accuracy | 65.15 | — | Unverified |
| 9 | GRU | Accuracy | 55.01 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SER with MTL | UA CV | 0.78 | — | Unverified |
| 2 | emoDARTS | UA CV | 0.77 | — | Unverified |
| 3 | LSTM+FC | WA | 0.76 | — | Unverified |
| 4 | TAP | WA CV | 0.74 | — | Unverified |
| 5 | SYSCOMB: BLSTMATT with CSA (session5) | UA | 0.74 | — | Unverified |
| 6 | Partially Fine-tuned HuBERT Large | WA CV | 0.73 | — | Unverified |
| 7 | CNN - DARTS | UA | 0.7 | — | Unverified |
| 8 | CNN+LSTM | UA | 0.65 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | VQ-MAE-S-12 (Frame) + Query2Emo | Accuracy | 84.1 | — | Unverified |
| 2 | CNN-X (Shallow CNN) | Accuracy | 82.99 | — | Unverified |
| 3 | xlsr-Wav2Vec2.0(FineTuning) | Accuracy | 81.82 | — | Unverified |
| 4 | CNN-14 (Fine-Tuning) | Accuracy | 76.58 | — | Unverified |
| 5 | AlexNet (FineTuning) | Accuracy | 61.67 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | wav2small-Teacher | CCC | 0.76 | — | Unverified |
| 2 | wavlm | CCC | 0.75 | — | Unverified |
| 3 | w2v2-L-robust-12 | CCC | 0.75 | — | Unverified |
| 4 | preCPC | CCC | 0.71 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | wav2small-Teacher | CCC | 0.68 | — | Unverified |
| 2 | wavlm | CCC | 0.67 | — | Unverified |
| 3 | w2v2-L-robust-12 | CCC | 0.66 | — | Unverified |
| 4 | preCPC | CCC | 0.64 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | wav2small-Teacher | CCC | 0.68 | — | Unverified |
| 2 | wavlm | CCC | 0.65 | — | Unverified |
| 3 | w2v2-L-robust-12 | CCC | 0.64 | — | Unverified |
| 4 | preCPC | CCC | 0.38 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | DAWN-hidden-SVM | Unweighted Accuracy (UA) | 32.1 | — | Unverified |
| 2 | Wav2Small-VAD-SVM | Unweighted Accuracy (UA) | 23.3 | — | Unverified |
| 3 | Speechbrain Wav2Vec2 | Unweighted Accuracy (UA) | 20.7 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | emotion2vec+base | Weighted Accuracy (WA) | 79.4 | — | Unverified |
| 2 | emotion2vec+large | Weighted Accuracy (WA) | 69.5 | — | Unverified |
| 3 | emotion2vec | Weighted Accuracy (WA) | 64.75 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Dusha baseline | Macro F1 | 0.77 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Dusha baseline | Macro F1 | 0.54 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | VGG-optiVMD | 1:1 Accuracy | 96.09 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | VQ-MAE-S-12 (Frame) + Query2Emo | Accuracy | 90.2 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PyResNet | Unweighted Accuracy (UA) | 0.43 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | emoDARTS | UA | 0.66 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | LSTM | CCC (Arousal) | 0.76 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | CNN (1D) | Unweighted Accuracy | 65.2 | — | Unverified |
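
The tables above report WA, UA, and CCC without defining them. The sketch below shows how these metrics are commonly computed in speech emotion recognition work. It assumes WA is overall accuracy, UA is the mean of per-class recalls, and CCC is Lin's concordance correlation coefficient. The function names are illustrative, and individual papers on this page may use slightly different definitions or cross-validation protocols (for example the "CV" variants).

```python
from collections import defaultdict
from statistics import fmean


def weighted_accuracy(y_true, y_pred):
    """WA (assumed definition): fraction of all utterances classified correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """UA (assumed definition): mean of per-class recalls, so each class counts equally."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return fmean(hits[c] / totals[c] for c in totals)


def ccc(x, y):
    """Lin's concordance correlation coefficient between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    vx = fmean((a - mx) ** 2 for a in x)  # population variance of x
    vy = fmean((b - my) ** 2 for b in y)  # population variance of y
    cov = fmean((a - mx) * (b - my) for a, b in zip(x, y))
    # The (mx - my) ** 2 term penalizes a constant offset between the sequences.
    return 2 * cov / (vx + vy + (mx - my) ** 2)


# Toy, made-up labels: three "angry" clips and one "happy" clip.
y_true = ["angry", "angry", "angry", "happy"]
y_pred = ["angry", "angry", "angry", "sad"]
wa = weighted_accuracy(y_true, y_pred)    # 3 of 4 utterances correct
ua = unweighted_accuracy(y_true, y_pred)  # recall is 1.0 for angry, 0.0 for happy
```

On imbalanced data the two accuracies diverge: in the toy labels WA is 0.75 while UA is 0.5, because the single "happy" clip is missed. These functions return fractions in [0, 1], whereas several tables above list accuracies as percentages. CCC differs from plain correlation in that a constant offset between predictions and labels lowers the score.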