Do we still need Human Annotators? Prompting Large Language Models for Aspect Sentiment Quad Prediction

2025-02-18 · Code Available

Nils Constantin Hellwig, Jakob Fehle, Udo Kruschwitz, Christian Wolff

Abstract

Aspect sentiment quadruple prediction (ASQP) facilitates a detailed understanding of opinions expressed in a text by identifying the opinion term, aspect term, aspect category and sentiment polarity for each opinion. However, annotating a full set of training examples to fine-tune models for ASQP is a resource-intensive process. In this study, we explore the capabilities of large language models (LLMs) for zero- and few-shot learning on the ASQP task across five diverse datasets. We report F1 scores slightly below those obtained with state-of-the-art fine-tuned models but exceeding previously reported zero- and few-shot performance. In the 40-shot setting on the Rest16 restaurant domain dataset, LLMs achieved an F1 score of 52.46, compared to 60.39 by the best-performing fine-tuned method MVP. Additionally, we report the performance of LLMs in target aspect sentiment detection (TASD), where the F1 scores were also close to fine-tuned models, achieving 66.03 on Rest16 in the 40-shot setting, compared to 72.76 with MVP. While human annotators remain essential for achieving optimal performance, LLMs can reduce the need for extensive manual annotation in ASQP tasks.
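The F1 scores above are typically computed by exact matching of predicted against gold quadruples. A minimal sketch of such a micro-averaged quad-level F1 (the function name and example data are illustrative, not from the paper):

```python
# Sketch: micro-F1 for ASQP evaluation, assuming exact-match scoring of
# (aspect term, aspect category, sentiment polarity, opinion term) quadruples.

def quad_f1(pred_quads, gold_quads):
    """Micro-averaged precision/recall/F1 over sentence-level quad sets."""
    tp = fp = fn = 0
    for preds, golds in zip(pred_quads, gold_quads):
        pred_set, gold_set = set(preds), set(golds)
        tp += len(pred_set & gold_set)   # correctly predicted quads
        fp += len(pred_set - gold_set)   # spurious predictions
        fn += len(gold_set - pred_set)   # missed gold quads
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy example: one fully correct sentence, one with a wrong polarity.
gold = [[("pizza", "food quality", "positive", "great")],
        [("service", "service general", "negative", "slow")]]
pred = [[("pizza", "food quality", "positive", "great")],
        [("service", "service general", "positive", "slow")]]
p, r, f1 = quad_f1(pred, gold)
```

A quad counts as correct only if all four elements match the gold annotation exactly, which is why even small polarity or span errors lower the reported scores.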

Benchmark Results

| Task | Model | Metric | Claimed | Verified | Status |
|------|-------|--------|---------|----------|--------|
| ASQP | Gemma-3-27B (50-shot, self-consistency learning) | F1 (R15) | 41.74 | — | Unverified |
| ASQP | Gemma-3-27B (10-shot, self-consistency learning) | F1 (R15) | 39.95 | — | Unverified |
| TASD | Gemma-3-27B (50-shot, self-consistency learning) | F1 (R15) | 62.12 | — | Unverified |
| TASD | Gemma-3-27B (10-shot, self-consistency learning) | F1 (R15) | 54.37 | — | Unverified |