SOTAVerified

Speech Emotion Recognition

Speech Emotion Recognition (SER) is a speech-processing and computational-paralinguistics task that aims to recognize and categorize the emotions expressed in spoken language. The goal is to infer a speaker's emotional state, such as happiness, anger, sadness, or frustration, from acoustic cues in their speech, including prosody, pitch, and rhythm.
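
To make the feature-based view above concrete, here is a minimal sketch, assuming librosa and scikit-learn are available, that summarizes pitch, energy, and spectral-envelope statistics per utterance and trains a simple classifier. The file names, label set, and hyperparameters are illustrative assumptions, not the method behind any entry on this page; most recent entries in the benchmark tables below instead fine-tune self-supervised encoders such as wav2vec 2.0, HuBERT, WavLM, or emotion2vec.

```python
# Minimal SER sketch: hand-crafted prosodic features + an SVM (illustrative only).
import numpy as np
import librosa
from sklearn.svm import SVC

def prosodic_features(path: str, sr: int = 16000) -> np.ndarray:
    y, sr = librosa.load(path, sr=sr)
    # Pitch contour via pYIN; keep only voiced frames.
    f0, _, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                            fmax=librosa.note_to_hz("C7"), sr=sr)
    f0 = f0[~np.isnan(f0)]
    # RMS energy as a loudness proxy; MFCCs summarize the spectral envelope.
    rms = librosa.feature.rms(y=y)[0]
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.concatenate([
        [f0.mean() if f0.size else 0.0, f0.std() if f0.size else 0.0],  # pitch level / variability
        [rms.mean(), rms.std()],                                        # energy statistics
        mfcc.mean(axis=1), mfcc.std(axis=1),                            # spectral-envelope statistics
    ])

# Hypothetical training data: paths to labelled utterances with categorical
# emotions such as "angry", "happy", "sad", "neutral".
# X = np.stack([prosodic_features(f) for f in train_files])
# clf = SVC(kernel="rbf", class_weight="balanced").fit(X, train_labels)
# print(clf.predict(prosodic_features("example.wav")[None, :]))
```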

For multimodal emotion recognition, please upload your results to the Multimodal Emotion Recognition on IEMOCAP benchmark.

Papers

Showing 301–350 of 431 papers

Title | Status | Hype
Metadata-Enhanced Speech Emotion Recognition: Augmented Residual Integration and Co-Attention in Two-Stage Fine-Tuning | — | 0
Meta-PerSER: Few-Shot Listener Personalized Speech Emotion Recognition via Meta-learning | — | 0
Meta Transfer Learning for Emotion Recognition | — | 0
MF-AED-AEC: Speech Emotion Recognition by Leveraging Multimodal Fusion, Asr Error Detection, and Asr Error Correction | — | 0
MFSN: Multi-perspective Fusion Search Network For Pre-training Knowledge in Speech Emotion Recognition | — | 0
MFHCA: Enhancing Speech Emotion Recognition Via Multi-Spatial Fusion and Hierarchical Cooperative Attention | — | 0
Mitigating Subgroup Disparities in Multi-Label Speech Emotion Recognition: A Pseudo-Labeling and Unsupervised Learning Approach | — | 0
Mixer is more than just a model | — | 0
Modulation spectral features for speech emotion recognition using deep neural networks | — | 0
Mouth Articulation-Based Anchoring for Improved Cross-Corpus Speech Emotion Recognition | — | 0
MSAC: Multiple Speech Attribute Control Method for Reliable Speech Emotion Recognition | — | 0
MSM-VC: High-fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-scale Style Modeling | — | 0
MSP-Podcast SER Challenge 2024: L'antenne du Ventoux Multimodal Self-Supervised Learning for Speech Emotion Recognition | — | 0
Multi-Classifier Interactive Learning for Ambiguous Speech Emotion Recognition | — | 0
Multilingual Speech Emotion Recognition With Multi-Gating Mechanism and Neural Architecture Search | — | 0
Multi-Microphone Speech Emotion Recognition using the Hierarchical Token-semantic Audio Transformer Architecture | — | 0
Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning | — | 0
Multimodal Speech Emotion Recognition using Cross Attention with Aligned Audio and Text | — | 0
Multiscale Contextual Learning for Speech Emotion Recognition in Emergency Call Center Conversations | — | 0
Multi-Scale Temporal Transformer For Speech Emotion Recognition | — | 0
Multistage linguistic conditioning of convolutional layers for speech emotion recognition | — | 0
Multi-stream Attention-based BLSTM with Feature Segmentation for Speech Emotion Recognition | — | 0
Multitask vocal burst modeling with ResNets and pre-trained paralinguistic Conformers | — | 0
Multi-View Multi-Task Modeling with Speech Foundation Models for Speech Forensic Tasks | — | 0
Multi-Window Data Augmentation Approach for Speech Emotion Recognition | — | 0
Neural Architecture Search for Speech Emotion Recognition | — | 0
Noise robust speech emotion recognition with signal-to-noise ratio adapting speech enhancement | — | 0
Non-Contrastive Self-supervised Learning for Utterance-Level Information Extraction from Speech | — | 0
Non-linear frequency warping using constant-Q transformation for speech emotion recognition | — | 0
Normalization Before Shaking Toward Learning Symmetrically Distributed Representation Without Margin in Speech Emotion Recognition | — | 0
Novel Dual-Channel Long Short-Term Memory Compressed Capsule Networks for Emotion Recognition | — | 0
Once More, With Feeling: Measuring Emotion of Acting Performances in Contemporary American Film | — | 0
On Enhancing Speech Emotion Recognition using Generative Adversarial Networks | — | 0
On the Effectiveness of ASR Representations in Real-world Noisy Speech Emotion Recognition | — | 0
On the Efficacy and Noise-Robustness of Jointly Learned Speech Emotion and Automatic Speech Recognition | — | 0
On the Impact of Word Error Rate on Acoustic-Linguistic Speech Emotion Recognition: An Update for the Deep Learning Era | — | 0
On the Robustness of Speech Emotion Recognition for Human-Robot Interaction with Deep Neural Networks | — | 0
On the use of Self-supervised Pre-trained Acoustic and Linguistic Features for Continuous Speech Emotion Recognition | — | 0
Optimizing Speech Emotion Recognition using Manta-Ray Based Feature Selection | — | 0
Parameter Efficient Finetuning for Speech Emotion Recognition and Domain Adaptation | — | 0
Representation Learning with Parameterised Quantum Circuits for Advancing Speech Emotion Recognition | — | 0
PARROT: Synergizing Mamba and Attention-based SSL Pre-Trained Models via Parallel Branch Hadamard Optimal Transport for Speech Emotion Recognition | — | 0
PCQ: Emotion Recognition in Speech via Progressive Channel Querying | — | 0
Persian Speech Emotion Recognition by Fine-Tuning Transformers | — | 0
Personalized Adaptation with Pre-trained Speech Encoders for Continuous Emotion Recognition | — | 0
Personalized Speech Emotion Recognition in Human-Robot Interaction using Vision Transformers | — | 0
Pitch-Synchronous Single Frequency Filtering Spectrogram for Speech Emotion Recognition | — | 0
Privacy against Real-Time Speech Emotion Detection via Acoustic Adversarial Evasion of Machine Learning | — | 0
Probing Speech Emotion Recognition Transformers for Linguistic Knowledge | — | 0
Enrolment-based personalisation for improving individual-level fairness in speech emotion recognition | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Vertically long patch ViT | Accuracy | 94.07 | — | Unverified
2 | ConformerXL-P | Accuracy | 88.2 | — | Unverified
3 | CoordViT | Accuracy | 82.96 | — | Unverified
4 | SepTr + LeRaC | Accuracy | 70.95 | — | Unverified
5 | SepTr | Accuracy | 70.47 | — | Unverified
6 | ResNet-18 + SPEL | Accuracy | 68.12 | — | Unverified
7 | ViT | Accuracy | 67.81 | — | Unverified
8 | ResNet-18 + PyNADA | Accuracy | 65.15 | — | Unverified
9 | GRU | Accuracy | 55.01 | — | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SER with MTL | UA CV | 0.78 | — | Unverified
2 | emoDARTS | UA CV | 0.77 | — | Unverified
3 | LSTM+FC | WA | 0.76 | — | Unverified
4 | TAP | WA CV | 0.74 | — | Unverified
5 | SYSCOMB: BLSTMATT with CSA (session5) | UA | 0.74 | — | Unverified
6 | Partially Fine-tuned HuBERT Large | WA CV | 0.73 | — | Unverified
7 | CNN - DARTS | UA | 0.7 | — | Unverified
8 | CNN+LSTM | UA | 0.65 | — | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VQ-MAE-S-12 (Frame) + Query2Emo | Accuracy | 84.1 | — | Unverified
2 | CNN-X (Shallow CNN) | Accuracy | 82.99 | — | Unverified
3 | xlsr-Wav2Vec2.0 (FineTuning) | Accuracy | 81.82 | — | Unverified
4 | CNN-14 (Fine-Tuning) | Accuracy | 76.58 | — | Unverified
5 | AlexNet (FineTuning) | Accuracy | 61.67 | — | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | wav2small-Teacher | CCC | 0.76 | — | Unverified
2 | wavlm | CCC | 0.75 | — | Unverified
3 | w2v2-L-robust-12 | CCC | 0.75 | — | Unverified
4 | preCPC | CCC | 0.71 | — | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | wav2small-Teacher | CCC | 0.68 | — | Unverified
2 | wavlm | CCC | 0.67 | — | Unverified
3 | w2v2-L-robust-12 | CCC | 0.66 | — | Unverified
4 | preCPC | CCC | 0.64 | — | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | wav2small-Teacher | CCC | 0.68 | — | Unverified
2 | wavlm | CCC | 0.65 | — | Unverified
3 | w2v2-L-robust-12 | CCC | 0.64 | — | Unverified
4 | preCPC | CCC | 0.38 | — | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | DAWN-hidden-SVM | Unweighted Accuracy (UA) | 32.1 | — | Unverified
2 | Wav2Small-VAD-SVM | Unweighted Accuracy (UA) | 23.3 | — | Unverified
3 | Speechbrain Wav2Vec2 | Unweighted Accuracy (UA) | 20.7 | — | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | emotion2vec+base | Weighted Accuracy (WA) | 79.4 | — | Unverified
2 | emotion2vec+large | Weighted Accuracy (WA) | 69.5 | — | Unverified
3 | emotion2vec | Weighted Accuracy (WA) | 64.75 | — | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Dusha baseline | Macro F1 | 0.77 | — | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Dusha baseline | Macro F1 | 0.54 | — | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VGG-optiVMD | 1:1 Accuracy | 96.09 | — | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VQ-MAE-S-12 (Frame) + Query2Emo | Accuracy | 90.2 | — | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PyResNet | Unweighted Accuracy (UA) | 0.43 | — | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | emoDARTS | UA | 0.66 | — | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LSTM | CCC (Arousal) | 0.76 | — | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CNN (1D) | Unweighted Accuracy | 65.2 | — | Unverified
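
The metrics in these tables are typically defined as follows in SER work: WA (weighted accuracy) is plain utterance-level accuracy, UA (unweighted accuracy) is the macro average of per-class recall, Macro F1 averages per-class F1 scores, and CCC is Lin's concordance correlation coefficient used for continuous arousal, valence, and dominance scores. The sketch below computes them under those assumed conventions; individual papers may define or average them slightly differently.

```python
# Common SER evaluation metrics, under the conventions noted above.
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score

def weighted_accuracy(y_true, y_pred):
    # WA: fraction of all utterances classified correctly (frequent classes dominate).
    return accuracy_score(y_true, y_pred)

def unweighted_accuracy(y_true, y_pred):
    # UA: per-class recall averaged over classes, so rare emotions count equally.
    return balanced_accuracy_score(y_true, y_pred)

def macro_f1(y_true, y_pred):
    # Macro F1: per-class F1 averaged over classes.
    return f1_score(y_true, y_pred, average="macro")

def ccc(pred, gold):
    # Concordance Correlation Coefficient for continuous labels:
    # 2*cov(pred, gold) / (var(pred) + var(gold) + (mean(pred) - mean(gold))^2)
    pred, gold = np.asarray(pred, dtype=float), np.asarray(gold, dtype=float)
    cov = np.mean((pred - pred.mean()) * (gold - gold.mean()))
    return 2 * cov / (pred.var() + gold.var() + (pred.mean() - gold.mean()) ** 2)
```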