ELECTRA and GPT-4o: Cost-Effective Partners for Sentiment Analysis
James P. Beno
- Code: github.com/jbeno/sentiment (official implementation, PyTorch)
Abstract
Bidirectional transformers excel at sentiment analysis, and Large Language Models (LLMs) are effective zero-shot learners. Might they perform better as a team? This paper explores collaborative approaches between ELECTRA and GPT-4o for three-way sentiment classification. We fine-tuned (FT) four models (ELECTRA Base/Large, GPT-4o/4o-mini) using a mix of reviews from the Stanford Sentiment Treebank (SST) and DynaSent. We passed ELECTRA's output to GPT in three forms: the predicted label, the class probabilities, and retrieved examples. Sharing ELECTRA Base FT predictions with GPT-4o-mini significantly improved performance over either model alone (82.50 macro F1 vs. 79.14 for ELECTRA Base FT and 79.41 for GPT-4o-mini) and yielded the lowest cost/performance ratio ($0.12/F1 point). However, when the GPT models were fine-tuned, including predictions decreased performance. GPT-4o FT-M was the top performer (86.99), with GPT-4o-mini FT close behind (86.70) at much lower cost ($0.38 vs. $1.59/F1 point). Our results show that augmenting prompts with predictions from fine-tuned encoders is an efficient way to boost performance, and that a fine-tuned GPT-4o-mini is nearly as good as GPT-4o FT at 76% less cost. Both are affordable options for projects with limited resources.
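The abstract describes giving GPT three kinds of input from ELECTRA: the predicted label, the class probabilities, and retrieved examples. The following is a minimal sketch of how such an augmented prompt might be assembled. It is not the paper's template. The prompt wording, the hypothetical helpers `build_prompt` and `retrieve_examples`, the cosine-similarity retrieval over precomputed embedding vectors, and the negative/neutral/positive ordering of the probabilities are all illustrative assumptions.

```python
import math

# Assumed label order for ELECTRA's probability vector (hypothetical).
LABELS = ("negative", "neutral", "positive")


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_examples(query_vec, pool, k=2):
    """Hypothetical retriever: pool holds (text, gold_label, vector) tuples.

    Returns the k most similar (text, gold_label) pairs by cosine similarity.
    """
    ranked = sorted(pool, key=lambda item: cosine(query_vec, item[2]), reverse=True)
    return [(text, label) for text, label, _ in ranked[:k]]


def build_prompt(review, label=None, probs=None, examples=None):
    """Hypothetical prompt builder; every ELECTRA signal is optional."""
    lines = ["Classify the sentiment of the review as negative, neutral, or positive."]
    if examples:
        lines.append("Similar labeled reviews:")
        for text, gold in examples:
            lines.append(f'- "{text}" -> {gold}')
    if label is not None:
        lines.append(f"A fine-tuned ELECTRA classifier predicted: {label}")
    if probs is not None:
        # Assumes probs are ordered as in LABELS.
        prob_str = ", ".join(f"{name}={p:.2f}" for name, p in zip(LABELS, probs))
        lines.append(f"ELECTRA class probabilities: {prob_str}")
    lines.append(f'Review: "{review}"')
    lines.append("Answer with one word: negative, neutral, or positive.")
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy pool with made-up 2-D vectors, for illustration only.
    pool = [
        ("Great food and friendly staff.", "positive", [0.9, 0.1]),
        ("The service was slow.", "negative", [0.1, 0.9]),
        ("It opens at nine.", "neutral", [0.5, 0.5]),
    ]
    examples = retrieve_examples([0.8, 0.2], pool, k=1)
    print(build_prompt("Decent, but overpriced.", label="neutral",
                       probs=[0.2, 0.7, 0.1], examples=examples))
```

Because the label, probabilities, and examples are independent optional arguments, the same function covers the label-only, label-plus-probabilities, and label-plus-examples configurations that appear in the benchmark table.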
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| DynaSent | ELECTRA Base Fine-Tuned | Macro F1 | 71.83 | — | Unverified |
| DynaSent | GPT-4o-mini + ELECTRA Base FT | Macro F1 | 76.19 | — | Unverified |
| DynaSent | ELECTRA Large Fine-Tuned | Macro F1 | 76.29 | — | Unverified |
| DynaSent | GPT-4o-mini (Prompt) | Macro F1 | 77.35 | — | Unverified |
| DynaSent | GPT-4o + ELECTRA Large FT | Macro F1 | 77.69 | — | Unverified |
| DynaSent | GPT-4o Fine-Tuned (Minimal) | Macro F1 | 89 | — | Unverified |
| DynaSent | GPT-4o-mini + ELECTRA Large FT (Prompt, Label) | Macro F1 | 77.94 | — | Unverified |
| DynaSent | GPT-4o-mini + ELECTRA Large FT (Prompt, Label, Probabilities) | Macro F1 | 79.72 | — | Unverified |
| DynaSent | GPT-4o (Prompt) | Macro F1 | 80.22 | — | Unverified |
| DynaSent | GPT-4o + ELECTRA Large FT (Prompt, Label, Examples) | Macro F1 | 81.53 | — | Unverified |
| DynaSent | GPT-4o-mini Fine-Tuned | Macro F1 | 86.9 | — | Unverified |
| Sentiment Merged | ELECTRA Large Fine-Tuned | Macro F1 | 82.36 | — | Unverified |
| Sentiment Merged | GPT-4o Fine-Tuned (Minimal) | Macro F1 | 86.99 | — | Unverified |
| Sentiment Merged | GPT-4o-mini Fine-Tuned | Macro F1 | 86.77 | — | Unverified |
| Sentiment Merged | GPT-4o-mini + ELECTRA Large FT (Prompt, Label) | Macro F1 | 83.49 | — | Unverified |
| Sentiment Merged | GPT-4o + ELECTRA Large FT (Prompt, Label, Examples) | Macro F1 | 83.09 | — | Unverified |
| Sentiment Merged | GPT-4o-mini + ELECTRA Base FT (Prompt, Label) | Macro F1 | 82.74 | — | Unverified |
| Sentiment Merged | GPT-4o + ELECTRA Large FT (Prompt, Label) | Macro F1 | 81.57 | — | Unverified |
| Sentiment Merged | GPT-4o (Prompt) | Macro F1 | 80.14 | — | Unverified |
| Sentiment Merged | GPT-4o-mini (Prompt) | Macro F1 | 79.52 | — | Unverified |
| Sentiment Merged | ELECTRA Base Fine-Tuned | Macro F1 | 79.29 | — | Unverified |
| SST-3 | GPT-4o-mini Fine-Tuned | Macro F1 | 75.68 | — | Unverified |
| SST-3 | ELECTRA Base Fine-Tuned | Macro F1 | 69.95 | — | Unverified |
| SST-3 | GPT-4o-mini (Prompt) | Macro F1 | 70.67 | — | Unverified |
| SST-3 | ELECTRA Large Fine-Tuned | Macro F1 | 70.9 | — | Unverified |
| SST-3 | GPT-4o-mini + ELECTRA Large FT (Prompt, Label) | Macro F1 | 70.99 | — | Unverified |
| SST-3 | GPT-4o-mini + ELECTRA Base FT | Macro F1 | 71.72 | — | Unverified |
| SST-3 | GPT-4o-mini + ELECTRA Large FT (Prompt, Label, Examples) | Macro F1 | 71.98 | — | Unverified |
| SST-3 | GPT-4o + ELECTRA Large FT (Prompt, Label, Examples) | Macro F1 | 72.06 | — | Unverified |
| SST-3 | GPT-4o (Prompt) | Macro F1 | 72.2 | — | Unverified |
| SST-3 | GPT-4o + ELECTRA Large FT | Macro F1 | 72.94 | — | Unverified |
| SST-3 | GPT-4o Fine-Tuned (Minimal) | Macro F1 | 73.99 | — | Unverified |
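Every score in the table is macro F1 on a 0-100 scale, meaning the unweighted mean of the per-class F1 scores for the three sentiment classes. Here is a dependency-free sketch of that metric, assuming the labels are `negative`, `neutral`, and `positive`, and that a class absent from both gold and predicted labels scores 0, which matches scikit-learn's default. The block also includes two hypothetical helpers for the abstract's cost figures. `cost_per_f1_point` assumes the ratio is total cost divided by macro F1. `percent_savings` checks that the $0.38 and $1.59 per-F1-point figures agree with the "76% less cost" claim.

```python
from collections import Counter

LABELS = ("negative", "neutral", "positive")


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1, scaled to 0-100 like the table."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted this class, but it was wrong
            fn[gold] += 1  # missed the true class
    f1s = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); treated as 0 when the class never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1s.append(2 * tp[label] / denom if denom else 0.0)
    return 100 * sum(f1s) / len(f1s)


def cost_per_f1_point(total_cost_usd, macro_f1_score):
    """Hypothetical: dollars spent per macro F1 point."""
    return total_cost_usd / macro_f1_score


def percent_savings(cheaper, pricier):
    """Percentage reduction of `cheaper` relative to `pricier`."""
    return 100 * (1 - cheaper / pricier)


if __name__ == "__main__":
    gold = ["positive", "negative", "neutral", "positive"]
    pred = ["positive", "neutral", "neutral", "negative"]
    print(f"{macro_f1(gold, pred):.2f}")  # -> 44.44
    print(f"{percent_savings(0.38, 1.59):.1f}")  # -> 76.1
```

Macro averaging gives each class equal weight regardless of its frequency, so a model cannot score well just by handling the majority class. This matters for three-way sentiment, where neutral examples are often the hardest.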