Do we still need Human Annotators? Prompting Large Language Models for Aspect Sentiment Quad Prediction

2025-02-18 · Code Available

Nils Constantin Hellwig, Jakob Fehle, Udo Kruschwitz, Christian Wolff

Abstract

Aspect sentiment quadruple prediction (ASQP) facilitates a detailed understanding of opinions expressed in a text by identifying the opinion term, aspect term, aspect category and sentiment polarity for each opinion. However, annotating a full set of training examples to fine-tune models for ASQP is a resource-intensive process. In this study, we explore the capabilities of large language models (LLMs) for zero- and few-shot learning on the ASQP task across five diverse datasets. We report F1 scores slightly below those obtained with state-of-the-art fine-tuned models but exceeding previously reported zero- and few-shot performance. In the 40-shot setting on the Rest16 restaurant domain dataset, LLMs achieved an F1 score of 52.46, compared to 60.39 by the best-performing fine-tuned method MVP. Additionally, we report the performance of LLMs in target aspect sentiment detection (TASD), where the F1 scores were also close to fine-tuned models, achieving 66.03 on Rest16 in the 40-shot setting, compared to 72.76 with MVP. While human annotators remain essential for achieving optimal performance, LLMs can reduce the need for extensive manual annotation in ASQP tasks.
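The F1 scores above are typically computed by exact matching of predicted against gold quadruples. A minimal sketch of such a micro-averaged quad-level F1 (the function name and example data are illustrative, not from the paper):

```python
# Sketch: micro-F1 for ASQP evaluation, assuming exact-match scoring of
# (aspect term, aspect category, sentiment polarity, opinion term) quadruples.

def quad_f1(pred_quads, gold_quads):
    """Micro-averaged precision/recall/F1 over sentence-level quad sets."""
    tp = fp = fn = 0
    for preds, golds in zip(pred_quads, gold_quads):
        pred_set, gold_set = set(preds), set(golds)
        tp += len(pred_set & gold_set)   # correctly predicted quads
        fp += len(pred_set - gold_set)   # spurious predictions
        fn += len(gold_set - pred_set)   # missed gold quads
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy example: one fully correct sentence, one with a wrong polarity.
gold = [[("pizza", "food quality", "positive", "great")],
        [("service", "service general", "negative", "slow")]]
pred = [[("pizza", "food quality", "positive", "great")],
        [("service", "service general", "positive", "slow")]]
p, r, f1 = quad_f1(pred, gold)
```

A quad counts as correct only if all four elements match the gold annotation exactly, which is why even small polarity or span errors lower the reported scores.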

Benchmark Results

| Task | Model | Metric | Claimed | Verified | Status |
|------|-------|--------|---------|----------|--------|
| ASQP | Gemma-3-27B (50-shot, self-consistency learning) | F1 (R15) | 41.74 | — | Unverified |
| ASQP | Gemma-3-27B (10-shot, self-consistency learning) | F1 (R15) | 39.95 | — | Unverified |
| TASD | Gemma-3-27B (50-shot, self-consistency learning) | F1 (R15) | 62.12 | — | Unverified |
| TASD | Gemma-3-27B (10-shot, self-consistency learning) | F1 (R15) | 54.37 | — | Unverified |