SOTAVerified

Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models

2025-04-10 · Code Available

Yuxiang Lin, Jingdong Sun, Zhi-Qi Cheng, Jue Wang, Haomin Liang, Zebang Cheng, Yifei Dong, Jun-Yan He, Xiaojiang Peng, Xian-Sheng Hua


Abstract

Most existing emotion analysis emphasizes which emotion arises (e.g., happy, sad, angry) but neglects the deeper why. We propose Emotion Interpretation (EI), which focuses on the causal factors, whether explicit (e.g., observable objects, interpersonal interactions) or implicit (e.g., cultural context, off-screen events), that drive emotional responses. Unlike traditional emotion recognition, EI requires reasoning about triggers rather than mere labeling. To facilitate EI research, we present EIBench, a large-scale benchmark comprising 1,615 basic EI samples and 50 complex EI samples featuring multifaceted emotions. Each instance demands a rationale-based explanation rather than straightforward categorization. We further propose a Coarse-to-Fine Self-Ask (CFSA) annotation pipeline, which guides Vision-Language Models (VLLMs) through iterative question-answer rounds to yield high-quality labels at scale. Extensive evaluations of open-source and proprietary large language models under four experimental settings reveal consistent performance gaps, especially in more intricate scenarios, underscoring EI's potential to enrich empathetic, context-aware AI applications. Our benchmark and methods are publicly available at https://github.com/Lum1104/EIBench, offering a foundation for advanced multimodal causal analysis and next-generation affective computing.

Tasks

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| EIBench | Qwen-VL-Chat | Recall | 26.45 | — | Unverified |
| EIBench | Claude-3-haiku | Recall | 63.24 | — | Unverified |
| EIBench | LLaVA-1.5 (13B) | Recall | 54.37 | — | Unverified |
| EIBench | LLaVA-NEXT (13B) | Recall | 54.33 | — | Unverified |
| EIBench | Claude-3-sonnet | Recall | 54.1 | — | Unverified |
| EIBench | LLaVA-NEXT (7B) | Recall | 53.82 | — | Unverified |
| EIBench | MiniGPT-v2 | Recall | 52.89 | — | Unverified |
| EIBench | ChatGPT-4o | Recall | 49.99 | — | Unverified |
| EIBench | Video-LLaVA | Recall | 49.26 | — | Unverified |
| EIBench | LLaVA-NEXT (34B) | Recall | 49.03 | — | Unverified |
| EIBench | ChatGPT-4V | Recall | 46.86 | — | Unverified |
| EIBench | Otter | Recall | 42.81 | — | Unverified |
| EIBench | Qwen-vl-plus | Recall | 31 | — | Unverified |
| EIBench (complex) | ChatGPT-4o | Recall | 39.27 | — | Unverified |
| EIBench (complex) | LLaVA-NEXT (13B) | Recall | 39.16 | — | Unverified |
| EIBench (complex) | LLaVA-NEXT (7B) | Recall | 38.71 | — | Unverified |
| EIBench (complex) | LLaVA-1.5 (13B) | Recall | 38.1 | — | Unverified |
| EIBench (complex) | LLaVA-NEXT (34B) | Recall | 35.37 | — | Unverified |
| EIBench (complex) | MiniGPT-v2 | Recall | 35.1 | — | Unverified |
| EIBench (complex) | Video-LLaVA | Recall | 30.9 | — | Unverified |
| EIBench (complex) | ChatGPT-4V | Recall | 28 | — | Unverified |
| EIBench (complex) | Otter | Recall | 27.9 | — | Unverified |
| EIBench (complex) | Claude-3-haiku | Recall | 24 | — | Unverified |
| EIBench (complex) | Qwen-VL-Chat | Recall | 22 | — | Unverified |
| EIBench (complex) | Claude-3-sonnet | Recall | 21.37 | — | Unverified |
| EIBench (complex) | Qwen-vl-plus | Recall | 20.37 | — | Unverified |

Reproductions