SOTAVerified

ELECTRA and GPT-4o: Cost-Effective Partners for Sentiment Analysis

2024-12-29 · Code Available

James P. Beno


Abstract

Bidirectional transformers excel at sentiment analysis, and Large Language Models (LLMs) are effective zero-shot learners. Might they perform better as a team? This paper explores collaborative approaches between ELECTRA and GPT-4o for three-way sentiment classification. We fine-tuned (FT) four models (ELECTRA Base/Large, GPT-4o/4o-mini) using a mix of reviews from Stanford Sentiment Treebank (SST) and DynaSent. We provided input from ELECTRA to GPT as: predicted label, probabilities, and retrieved examples. Sharing ELECTRA Base FT predictions with GPT-4o-mini significantly improved performance over either model alone (82.50 macro F1 vs. 79.14 for ELECTRA Base FT and 79.41 for GPT-4o-mini) and yielded the lowest cost/performance ratio ($0.12 per F1 point). However, when the GPT models were fine-tuned, including predictions decreased performance. GPT-4o FT-M was the top performer (86.99), with GPT-4o-mini FT close behind (86.70) at much lower cost ($0.38 vs. $1.59 per F1 point). Our results show that augmenting prompts with predictions from fine-tuned encoders is an efficient way to boost performance, and that a fine-tuned GPT-4o-mini is nearly as good as GPT-4o FT at 76% less cost. Both are affordable options for projects with limited resources.
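The abstract's best low-cost configuration shares a fine-tuned encoder's predicted label and class probabilities inside the prompt sent to the LLM. A minimal sketch of that prompt construction follows; the prompt wording, the `build_augmented_prompt` helper, and the example review are illustrative assumptions, not the authors' exact code. In practice the label and probabilities would come from a fine-tuned ELECTRA checkpoint, and the resulting prompt would be sent to GPT-4o-mini via the chat completions API.

```python
# Sketch of the "Prompt, Label, Probabilities" setup from the paper:
# a fine-tuned encoder's prediction is appended to the sentiment prompt.
# Wording and helper names here are assumptions for illustration.

LABELS = ["negative", "neutral", "positive"]

def build_augmented_prompt(review: str, encoder_label: str, encoder_probs: dict) -> str:
    """Build a three-way sentiment prompt that shares a fine-tuned
    encoder's predicted label and class probabilities with the LLM."""
    probs = ", ".join(f"{lab}: {encoder_probs[lab]:.2f}" for lab in LABELS)
    return (
        "Classify the sentiment of the review as negative, neutral, or positive.\n"
        f"Review: {review}\n"
        f"A fine-tuned ELECTRA classifier predicted: {encoder_label} "
        f"(probabilities: {probs}).\n"
        "Answer with a single label."
    )

prompt = build_augmented_prompt(
    "The plot was thin but the acting was superb.",
    "positive",
    {"negative": 0.10, "neutral": 0.25, "positive": 0.65},
)
print(prompt)
```

Under the fine-tuned GPT configurations, the abstract notes this augmentation hurt rather than helped, so the encoder hint would be omitted there.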

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| DynaSent | ELECTRA Base Fine-Tuned | Macro F1 | 71.83 | | Unverified |
| DynaSent | GPT-4o-mini + ELECTRA Base FT | Macro F1 | 76.19 | | Unverified |
| DynaSent | ELECTRA Large Fine-Tuned | Macro F1 | 76.29 | | Unverified |
| DynaSent | GPT-4o-mini (Prompt) | Macro F1 | 77.35 | | Unverified |
| DynaSent | GPT-4o + ELECTRA Large FT | Macro F1 | 77.69 | | Unverified |
| DynaSent | GPT-4o Fine-Tuned (Minimal) | Macro F1 | 89 | | Unverified |
| DynaSent | GPT-4o-mini + ELECTRA Large FT (Prompt, Label) | Macro F1 | 77.94 | | Unverified |
| DynaSent | GPT-4o-mini + ELECTRA Large FT (Prompt, Label, Probabilities) | Macro F1 | 79.72 | | Unverified |
| DynaSent | GPT-4o (Prompt) | Macro F1 | 80.22 | | Unverified |
| DynaSent | GPT-4o + ELECTRA Large FT (Prompt, Label, Examples) | Macro F1 | 81.53 | | Unverified |
| DynaSent | GPT-4o-mini Fine-Tuned | Macro F1 | 86.9 | | Unverified |
| Sentiment Merged | ELECTRA Large Fine-Tuned | Macro F1 | 82.36 | | Unverified |
| Sentiment Merged | GPT-4o Fine-Tuned (Minimal) | Macro F1 | 86.99 | | Unverified |
| Sentiment Merged | GPT-4o-mini Fine-Tuned | Macro F1 | 86.77 | | Unverified |
| Sentiment Merged | GPT-4o-mini + ELECTRA Large FT (Prompt, Label) | Macro F1 | 83.49 | | Unverified |
| Sentiment Merged | GPT-4o + ELECTRA Large FT (Prompt, Label, Examples) | Macro F1 | 83.09 | | Unverified |
| Sentiment Merged | GPT-4o-mini + ELECTRA Base FT (Prompt, Label) | Macro F1 | 82.74 | | Unverified |
| Sentiment Merged | GPT-4o + ELECTRA Large FT (Prompt, Label) | Macro F1 | 81.57 | | Unverified |
| Sentiment Merged | GPT-4o (Prompt) | Macro F1 | 80.14 | | Unverified |
| Sentiment Merged | GPT-4o-mini (Prompt) | Macro F1 | 79.52 | | Unverified |
| Sentiment Merged | ELECTRA Base Fine-Tuned | Macro F1 | 79.29 | | Unverified |
| SST-3 | GPT-4o-mini Fine-Tuned | Macro F1 | 75.68 | | Unverified |
| SST-3 | ELECTRA Base Fine-Tuned | Macro F1 | 69.95 | | Unverified |
| SST-3 | GPT-4o-mini (Prompt) | Macro F1 | 70.67 | | Unverified |
| SST-3 | ELECTRA Large Fine-Tuned | Macro F1 | 70.9 | | Unverified |
| SST-3 | GPT-4o-mini + ELECTRA Large FT (Prompt, Label) | Macro F1 | 70.99 | | Unverified |
| SST-3 | GPT-4o-mini + ELECTRA Base FT | Macro F1 | 71.72 | | Unverified |
| SST-3 | GPT-4o-mini + ELECTRA Large FT (Prompt, Label, Examples) | Macro F1 | 71.98 | | Unverified |
| SST-3 | GPT-4o + ELECTRA Large FT (Prompt, Label, Examples) | Macro F1 | 72.06 | | Unverified |
| SST-3 | GPT-4o (Prompt) | Macro F1 | 72.2 | | Unverified |
| SST-3 | GPT-4o + ELECTRA Large FT | Macro F1 | 72.94 | | Unverified |
| SST-3 | GPT-4o Fine-Tuned (Minimal) | Macro F1 | 73.99 | | Unverified |

Reproductions