
KEVER^2: Knowledge-Enhanced Visual Emotion Reasoning and Retrieval

2025-05-30

Fanhang Man, Xiaoyue Chen, Huandong Wang, Baining Zhao, Han Li, Xinlei Chen, Yong Li


Abstract

Understanding what emotions images evoke in their viewers is a foundational goal in human-centric visual computing. While recent advances in vision-language models (VLMs) have shown promise for visual emotion analysis (VEA), several key challenges remain unresolved. Emotional cues in images are often abstract, overlapping, and entangled, making them difficult to model and interpret. Moreover, VLMs struggle to align these complex visual patterns with emotional semantics due to limited supervision and sparse emotional grounding. Finally, existing approaches lack structured affective knowledge to resolve ambiguity and ensure consistent emotional reasoning across diverse visual domains. To address these limitations, we propose K-EVER^2, a knowledge-enhanced framework for emotion reasoning and retrieval. Our approach introduces a semantically structured formulation of visual emotion cues and integrates external affective knowledge through multimodal alignment. Without relying on handcrafted labels or direct emotion supervision, K-EVER^2 achieves robust and interpretable emotion predictions across heterogeneous image types. We validate our framework on three representative benchmarks, Emotion6, EmoSet, and M-Disaster, covering social media imagery, human-centric scenes, and disaster contexts. K-EVER^2 consistently outperforms strong CNN and VLM baselines, achieving up to a 19% accuracy gain for specific emotions and a 12.3% average accuracy gain across all emotion categories. Our results demonstrate a scalable and generalizable solution for advancing emotional understanding of visual content.
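The abstract's notion of retrieving external affective knowledge via multimodal alignment can be illustrated with a minimal sketch: embed the image and a set of affective-knowledge entries in a shared space, then rank entries by cosine similarity. The function name, the toy 3-dimensional embeddings, and the concept labels below are all illustrative assumptions, not the paper's actual model or data; in practice the embeddings would come from a vision-language encoder.

```python
import numpy as np

def retrieve_affective_knowledge(image_emb, knowledge_embs, labels, k=2):
    """Return the top-k affective-knowledge labels ranked by cosine similarity.

    This is a hypothetical sketch of knowledge retrieval in a shared
    embedding space, not the paper's actual retrieval mechanism.
    """
    # L2-normalize so the dot product equals cosine similarity.
    img = image_emb / np.linalg.norm(image_emb)
    kb = knowledge_embs / np.linalg.norm(knowledge_embs, axis=1, keepdims=True)
    sims = kb @ img
    top = np.argsort(-sims)[:k]  # indices of the k most similar entries
    return [(labels[i], float(sims[i])) for i in top]

# Toy 3-d embedding space with three affective concepts (illustrative only).
labels = ["joy", "fear", "sadness"]
knowledge_embs = np.array([
    [1.0, 0.1, 0.0],   # "joy"
    [0.0, 1.0, 0.2],   # "fear"
    [0.1, 0.2, 1.0],   # "sadness"
])
image_emb = np.array([0.9, 0.2, 0.1])  # an image leaning toward "joy"

print(retrieve_affective_knowledge(image_emb, knowledge_embs, labels))
```

In a full system, the retrieved entries would then condition the VLM's emotion reasoning rather than serve as the prediction themselves.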
