Context and System Fusion in Post-ASR Emotion Recognition with Large Language Models

2024-10-04

Pavel Stepachev, Pinzhen Chen, Barry Haddow

Code

github.com/rggdmonk/GenSEC-Task-3
Official

Abstract

Large language models (LLMs) have started to play a vital role in modelling speech and text. To explore the best use of context and multiple systems' outputs for post-ASR speech emotion prediction, we study LLM prompting on a recent task named GenSEC. Our techniques include ASR transcript ranking, variable conversation context, and system output fusion. We show that the conversation context has diminishing returns and the metric used to select the transcript for prediction is crucial. Finally, our best submission surpasses the provided baseline by 20% in absolute accuracy.
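The three techniques named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the similarity-based ranking metric, the prompt wording, the emotion label set, and the majority-vote fusion rule are all assumptions made for illustration, and the LLM call itself is left out.

```python
from collections import Counter
from difflib import SequenceMatcher


def rank_transcripts(hypotheses):
    """Order ASR hypotheses by mean string similarity to the other hypotheses.

    The metric is an assumption for illustration; the abstract only says that
    the choice of metric matters, not which one to use.
    """
    def mean_similarity(i):
        others = [h for j, h in enumerate(hypotheses) if j != i]
        if not others:
            return 0.0
        scores = [SequenceMatcher(None, hypotheses[i], o).ratio() for o in others]
        return sum(scores) / len(scores)

    order = sorted(range(len(hypotheses)), key=mean_similarity, reverse=True)
    return [hypotheses[i] for i in order]


def build_prompt(history, target, context_size, labels):
    """Build a prompt with the last `context_size` turns of conversation context."""
    # history[-0:] would return the whole list, so handle zero explicitly.
    context = history[-context_size:] if context_size > 0 else []
    lines = ["Conversation so far:"]
    lines.extend(context if context else ["(no context)"])
    lines.append("Utterance to classify: " + target)
    lines.append("Answer with one label from: " + ", ".join(labels))
    return "\n".join(lines)


def fuse_predictions(predictions):
    """Fuse labels from several systems by majority vote (first seen wins ties)."""
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical usage with made-up data.
hypotheses = ["a b c", "a b c", "x y z"]
top = rank_transcripts(hypotheses)[0]
prompt = build_prompt(["A: hi", "B: hello", "A: how are you"], top, 2,
                      ["angry", "happy", "neutral", "sad"])
fused = fuse_predictions(["happy", "sad", "happy"])
```

Varying `context_size` in a sketch like this is one way to probe the diminishing returns of conversation context that the abstract reports.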

Tasks

Emotion Recognition, Prediction
