SOTAVerified

Multimodal Emotion Recognition

This is a leaderboard for multimodal emotion recognition on the IEMOCAP dataset. The modality abbreviations are A: Acoustic, T: Text, V: Visual.

Please include the modalities in parentheses after the model name, e.g. "Model (A+T)".

All models must use the standard five emotion categories and are evaluated with the standard leave-one-session-out (LOSO) protocol. See the papers for references.
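The protocol above can be sketched in a few lines. This is a minimal illustration, not code from any listed paper: it assumes each utterance record carries its IEMOCAP session id (1–5), and the helper names (`loso_splits`, `weighted_f1`) are invented for this example. Weighted F1, the metric used by most entries below, averages per-class F1 scores weighted by class support.

```python
from collections import Counter

def loso_splits(samples, n_sessions=5):
    """Leave-one-session-out: fold k trains on all sessions except k
    and tests on session k. Yields (train, test) lists."""
    for held_out in range(1, n_sessions + 1):
        train = [s for s in samples if s["session"] != held_out]
        test = [s for s in samples if s["session"] == held_out]
        yield train, test

def weighted_f1(y_true, y_pred):
    """F1 computed per class, then averaged with weights equal to each
    class's support (number of true instances)."""
    support = Counter(y_true)
    total = 0.0
    for cls, n in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        total += n * f1
    return total / len(y_true)
```

In practice, reported numbers are typically the weighted F1 (or accuracy) aggregated over the five held-out sessions.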

Papers

Showing 1–25 of 180 papers

| Title | Status | Hype |
| --- | --- | --- |
| R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcement Learning | Code | 5 |
| SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition | Code | 4 |
| Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning | Code | 4 |
| MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition | Code | 3 |
| PHemoNet: A Multimodal Network for Physiological Signals | Code | 2 |
| MER 2023: Multi-label Learning, Modality Robustness, and Semi-Supervised Learning | Code | 2 |
| Hierarchical Hypercomplex Network for Multimodal Emotion Recognition | Code | 2 |
| MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild | Code | 2 |
| Recent Trends of Multimodal Affective Computing: A Survey from NLP Perspective | Code | 2 |
| GraphCFC: A Directed Graph Based Cross-Modal Feature Complementation Approach for Multimodal Conversational Emotion Recognition | Code | 1 |
| Exploiting modality-invariant feature for robust multimodal emotion recognition with missing modalities | Code | 1 |
| Enhancing Modal Fusion by Alignment and Label Matching for Multimodal Emotion Recognition | Code | 1 |
| GA2MIF: Graph and Attention Based Two-Stage Multi-Source Information Fusion for Conversational Emotion Detection | Code | 1 |
| GPT-4V with Emotion: A Zero-shot Benchmark for Generalized Emotion Recognition | Code | 1 |
| FV2ES: A Fully End2End Multimodal System for Fast Yet Effective Video Emotion Recognition Inference | Code | 1 |
| Group Gated Fusion on Attention-based Bidirectional Alignment for Multimodal Emotion Recognition | Code | 1 |
| Decoupled Multimodal Distilling for Emotion Recognition | Code | 1 |
| Cross Attentional Audio-Visual Fusion for Dimensional Emotion Recognition | Code | 1 |
| Cooperative Sentiment Agents for Multimodal Sentiment Analysis | Code | 1 |
| Curriculum Learning Meets Directed Acyclic Graph for Multimodal Emotion Recognition | Code | 1 |
| Emotion Recognition in Audio and Video Using Deep Neural Networks | Code | 1 |
| DialogueRNN: An Attentive RNN for Emotion Detection in Conversations | Code | 1 |
| A cross-modal fusion network based on self-attention and residual structure for multimodal emotion recognition | Code | 1 |
| CFN-ESA: A Cross-Modal Fusion Network with Emotion-Shift Awareness for Dialogue Emotion Recognition | Code | 1 |
| A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GraphSmile | Weighted F1 | 86.52 | | Unverified |
| 2 | Joyful | Weighted F1 | 85.7 | | Unverified |
| 3 | COGMEN | Weighted F1 | 84.5 | | Unverified |
| 4 | DANN | Accuracy | 82.7 | | Unverified |
| 5 | MMER | Accuracy | 81.7 | | Unverified |
| 6 | PATHOSnet v2 | Accuracy | 80.4 | | Unverified |
| 7 | Self-attention weight correction (A+T) | Accuracy | 76.8 | | Unverified |
| 8 | CHFusion | Accuracy | 76.5 | | Unverified |
| 9 | bc-LSTM | Weighted F1 | 74.1 | | Unverified |
| 10 | Audio + Text (Stage III) | F1 | 70.5 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GraphSmile | Weighted F1 | 66.71 | | Unverified |
| 2 | Audio + Text (Stage III) | Weighted F1 | 65.8 | | Unverified |
| 3 | Joyful | Weighted F1 | 61.77 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GraphSmile | Weighted F1 | 72.81 | | Unverified |
| 2 | Joyful | Weighted F1 | 70.5 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GraphSmile | Weighted F1 | 44.93 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GraphSmile | Weighted F1 | 66.73 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SMPLify-X | v2v error | 52.9 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GraphSmile | Weighted F1 | 74.31 | | Unverified |