SOTAVerified

Prompt Engineering

Prompt engineering is the process of designing and refining the prompts used to generate text from language models such as GPT-3. The goal is to improve the quality and relevance of the generated text by crafting prompts that elicit the desired responses from the model.

Prompt engineering involves several steps: selecting an appropriate model architecture and parameters, designing the prompt format and structure, selecting the task and training data, and, where the workflow calls for it, fine-tuning the model using the selected prompt and data.
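
The "designing the prompt format and structure" step can be illustrated with a small sketch. It assumes a plain few-shot layout (an instruction, a few worked examples, then the new input); the function name `build_prompt` and the `Input:`/`Output:` labels are hypothetical choices for illustration, not a format prescribed by this page or by any particular model.

```python
from typing import List, Tuple


def build_prompt(instruction: str, examples: List[Tuple[str, str]], query: str) -> str:
    """Assemble a few-shot prompt: instruction, worked examples, then the new input."""
    parts = [instruction.strip(), ""]
    for source, target in examples:
        parts.append(f"Input: {source}")
        parts.append(f"Output: {target}")
        parts.append("")
    # Leave the final "Output:" empty so the model is asked to complete it.
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)


# Hypothetical translation task, used only to show the resulting layout.
prompt = build_prompt(
    instruction="Translate English to French.",
    examples=[("cheese", "fromage"), ("bread", "pain")],
    query="apple",
)
print(prompt)
```

Keeping the layout in one function makes it easy to vary the instruction wording or the number of examples and compare the model's responses, which is the refinement loop described above.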

Prompt engineering is a crucial step in the development of language models, as it can greatly influence the quality and effectiveness of the model's responses. By carefully designing and refining prompts, researchers and developers can improve the accuracy and relevance of the model's output, making it more useful for applications such as chatbots, language translation, and content creation.

Papers

Showing 1151-1200 of 1236 papers

Title | Status | Hype
Grade Score: Quantifying LLM Performance in Option Selection | Code | 0
Comprehensive Evaluation and Insights into the Use of Large Language Models in the Automation of Behavior-Driven Development Acceptance Test Formulation | Code | 0
VersusDebias: Universal Zero-Shot Debiasing for Text-to-Image Models via SLM-Based Prompt Engineering and Generative Adversary | Code | 0
Behavioral Augmentation of UML Class Diagrams: An Empirical Study of Large Language Models for Method Generation | Code | 0
TSCLIP: Robust CLIP Fine-Tuning for Worldwide Cross-Regional Traffic Sign Recognition | Code | 0
Comparative Study of Multilingual Idioms and Similes in Large Language Models | Code | 0
Generalizing Segmentation Foundation Model Under Sim-to-real Domain-shift for Guidewire Segmentation in X-ray Fluoroscopy | Code | 0
Fine-Tuning and Prompt Engineering of LLMs, for the Creation of Multi-Agent AI for Addressing Sustainable Protein Production Challenges | Code | 0
FactFinders at CheckThat! 2024: Refining Check-worthy Statement Detection with LLMs through Data Pruning | Code | 0
Are Large Language Models Table-based Fact-Checkers? | Code | 0
COMMA: Co-Articulated Multi-Modal Learning | Code | 0
UMass-BioNLP at MEDIQA-M3G 2024: DermPrompt -- A Systematic Exploration of Prompt Engineering with GPT-4V for Dermatological Diagnosis | Code | 0
Exploring the Synergy Between Vision-Language Pretraining and ChatGPT for Artwork Captioning: A Preliminary Study | Code | 0
The Impact of Prompt Programming on Function-Level Code Generation | Code | 0
Brevity is the soul of sustainability: Characterizing LLM response lengths | Code | 0
Segmentation by registration-enabled SAM prompt engineering using five reference images | Code | 0
Adapting PromptORE for Modern History: Information Extraction from Hispanic Monarchy Documents of the XVIth Century | Code | 0
PRE: Vision-Language Prompt Learning with Reparameterization Encoder | Code | 0
Bandit-Based Prompt Design Strategy Selection Improves Prompt Optimizers | Code | 0
Self-Augmented In-Context Learning for Unsupervised Word Translation | Code | 0
Exploring the Impact of the Output Format on the Evaluation of Large Language Models for Code Translation | Code | 0
Virtual Agents for Alcohol Use Counseling: Exploring LLM-Powered Motivational Interviewing | Code | 0
Vision-Language and Large Language Model Performance in Gastroenterology: GPT, Claude, Llama, Phi, Mistral, Gemma, and Quantized Models | Code | 0
Exploring the Capabilities of Large Language Models for Generating Diverse Design Solutions | Code | 0
Exploring Prompting Large Language Models as Explainable Metrics | Code | 0
Exploring GPT's Ability as a Judge in Music Understanding | Code | 0
Self-Pluralising Culture Alignment for Large Language Models | Code | 0
Self-Reflection Outcome is Sensitive to Prompt Construction | Code | 0
Exploiting Prefix-Tree in Structured Output Interfaces for Enhancing Jailbreak Attacking | Code | 0
Combining Large Language Models with Tutoring System Intelligence: A Case Study in Caregiver Homework Support | Code | 0
Explanation Regeneration via Information Bottleneck | Code | 0
Evaluating improvements on using Large Language Models (LLMs) for property extraction in the Open Research Knowledge Graph (ORKG) | Code | 0
CoF: Coarse to Fine-Grained Image Understanding for Multi-modal Large Language Models | Code | 0
Evaluating Contrastive Feedback for Effective User Simulations | Code | 0
Cheap Learning: Maximising Performance of Language Models for Social Data Science Using Minimal Data | Code | 0
Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation | Code | 0
A Zero-Shot LLM Framework for Automatic Assignment Grading in Higher Education | Code | 0
Adaptations of AI models for querying the LandMatrix database in natural language | Code | 0
ChatGPT-HealthPrompt. Harnessing the Power of XAI in Prompt-Based Healthcare Decision Support using ChatGPT | Code | 0
Apollo: Zero-shot MultiModal Reasoning with Multiple Experts | Code | 0
ChatGPT4PCG Competition: Character-like Level Generation for Science Birds | Code | 0
Evaluating ChatGPT-3.5 Efficiency in Solving Coding Problems of Different Complexity Levels: An Empirical Analysis | Code | 0
Characterizing Multimodal Long-form Summarization: A Case Study on Financial Reports | Code | 0
Think Before You Speak: Cultivating Communication Skills of Large Language Models via Inner Monologue | Code | 0
Prompt Engineering for Transformer-based Chemical Similarity Search Identifies Structurally Distinct Functional Analogues | Code | 0
Automating Governing Knowledge Commons and Contextual Integrity (GKC-CI) Privacy Policy Annotations with Large Language Models | Code | 0
A Course Correction in Steerability Evaluation: Revealing Miscalibration and Side Effects in LLMs | Code | 0
Chain-of-Scrutiny: Detecting Backdoor Attacks for Large Language Models | Code | 0
Automatic deductive coding in discourse analysis: an application of large language models in learning analytics | Code | 0
ERBench: An Entity-Relationship based Automatically Verifiable Hallucination Benchmark for Large Language Models | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PromptKD | Harmonic mean | 77.62 | - | Unverified
2 | Customized Ensemble | Harmonic mean | 75.49 | - | Unverified
3 | MMRL | Harmonic mean | 74.45 | - | Unverified
4 | MMRL++ | Harmonic mean | 74.44 | - | Unverified
5 | CoPrompt | Harmonic mean | 74.33 | - | Unverified
6 | HPT++ | Harmonic mean | 74.24 | - | Unverified
7 | HPT | Harmonic mean | 74.17 | - | Unverified
8 | ProMetaR | Harmonic mean | 74.09 | - | Unverified
9 | MetaPrompt | Harmonic mean | 74.02 | - | Unverified
10 | DePT | Harmonic mean | 74.02 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PromptKD | Harmonic mean | 97.77 | - | Unverified
2 | HPT++ | Harmonic mean | 96.96 | - | Unverified
3 | MMRL++ | Harmonic mean | 96.75 | - | Unverified
4 | MMRL | Harmonic mean | 96.68 | - | Unverified
5 | HPT | Harmonic mean | 96.65 | - | Unverified
6 | CoPrompt | Harmonic mean | 96.55 | - | Unverified
7 | MetaPrompt | Harmonic mean | 96.32 | - | Unverified
8 | DePT | Harmonic mean | 96.28 | - | Unverified
9 | ProMetaR | Harmonic mean | 96.16 | - | Unverified
10 | RPO | Harmonic mean | 96.03 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PromptKD | Harmonic mean | 77.94 | - | Unverified
2 | MMRL++ | Harmonic mean | 74.46 | - | Unverified
3 | HPT++ | Harmonic mean | 74.23 | - | Unverified
4 | MMRL | Harmonic mean | 73.82 | - | Unverified
5 | CoPrompt | Harmonic mean | 72.79 | - | Unverified
6 | ProMetaR | Harmonic mean | 72.31 | - | Unverified
7 | HPT | Harmonic mean | 72.16 | - | Unverified
8 | PromptSRC | Harmonic mean | 71.75 | - | Unverified
9 | DePT | Harmonic mean | 71.09 | - | Unverified
10 | RPO | Harmonic mean | 68.61 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MMRL++ | Harmonic mean | 91.94 | - | Unverified
2 | PromptKD | Harmonic mean | 89.14 | - | Unverified
3 | HPT++ | Harmonic mean | 87.36 | - | Unverified
4 | MMRL | Harmonic mean | 87.21 | - | Unverified
5 | CoPrompt | Harmonic mean | 85.84 | - | Unverified
6 | ProMetaR | Harmonic mean | 85.3 | - | Unverified
7 | DePT | Harmonic mean | 84.88 | - | Unverified
8 | HPT | Harmonic mean | 84.82 | - | Unverified
9 | MetaPrompt | Harmonic mean | 83.38 | - | Unverified
10 | MaPLe | Harmonic mean | 82.35 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PromptKD | Harmonic mean | 45.17 | - | Unverified
2 | MMRL++ | Harmonic mean | 42.24 | - | Unverified
3 | HPT++ | Harmonic mean | 41.33 | - | Unverified
4 | MMRL | Harmonic mean | 41.15 | - | Unverified
5 | DePT | Harmonic mean | 40.73 | - | Unverified
6 | HPT | Harmonic mean | 40.28 | - | Unverified
7 | ProMetaR | Harmonic mean | 40.25 | - | Unverified
8 | PromptSRC | Harmonic mean | 40.15 | - | Unverified
9 | CoPrompt | Harmonic mean | 39.76 | - | Unverified
10 | MetaPrompt | Harmonic mean | 38.24 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PromptKD | Harmonic mean | 90.24 | - | Unverified
2 | HPT | Harmonic mean | 87.16 | - | Unverified
3 | MMRL++ | Harmonic mean | 87.01 | - | Unverified
4 | MMRL | Harmonic mean | 86.78 | - | Unverified
5 | ProMetaR | Harmonic mean | 86.7 | - | Unverified
6 | DePT | Harmonic mean | 86.46 | - | Unverified
7 | PromptSRC | Harmonic mean | 85.95 | - | Unverified
8 | HPT++ | Harmonic mean | 85.85 | - | Unverified
9 | CoPrompt | Harmonic mean | 85.71 | - | Unverified
10 | MetaPrompt | Harmonic mean | 84.52 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PromptKD | Harmonic mean | 97.15 | - | Unverified
2 | HPT++ | Harmonic mean | 96.91 | - | Unverified
3 | CoPrompt | Harmonic mean | 96.87 | - | Unverified
4 | MMRL | Harmonic mean | 96.74 | - | Unverified
5 | HPT | Harmonic mean | 96.71 | - | Unverified
6 | MaPLe | Harmonic mean | 96.58 | - | Unverified
7 | MMRL++ | Harmonic mean | 96.51 | - | Unverified
8 | ProMetaR | Harmonic mean | 96.49 | - | Unverified
9 | CoCoOp | Harmonic mean | 96.43 | - | Unverified
10 | DePT | Harmonic mean | 96.37 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PromptKD | Harmonic mean | 83.13 | - | Unverified
2 | MMRL++ | Harmonic mean | 78.18 | - | Unverified
3 | MMRL | Harmonic mean | 78.06 | - | Unverified
4 | DePT | Harmonic mean | 77.79 | - | Unverified
5 | ProMetaR | Harmonic mean | 76.72 | - | Unverified
6 | PromptSRC | Harmonic mean | 76.58 | - | Unverified
7 | CoPrompt | Harmonic mean | 75.66 | - | Unverified
8 | HPT++ | Harmonic mean | 75.59 | - | Unverified
9 | HPT | Harmonic mean | 75.57 | - | Unverified
10 | MetaPrompt | Harmonic mean | 75.48 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PromptKD | Harmonic mean | 82.6 | - | Unverified
2 | CoPrompt | Harmonic mean | 81.31 | - | Unverified
3 | MMRL++ | Harmonic mean | 81.28 | - | Unverified
4 | MMRL | Harmonic mean | 81.2 | - | Unverified
5 | HPT++ | Harmonic mean | 81.11 | - | Unverified
6 | DePT | Harmonic mean | 81.06 | - | Unverified
7 | HPT | Harmonic mean | 80.88 | - | Unverified
8 | ProMetaR | Harmonic mean | 80.82 | - | Unverified
9 | MetaPrompt | Harmonic mean | 80.62 | - | Unverified
10 | PromptSRC | Harmonic mean | 80.52 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PromptKD | Harmonic mean | 86.1 | - | Unverified
2 | MMRL | Harmonic mean | 83.89 | - | Unverified
3 | HPT++ | Harmonic mean | 83.81 | - | Unverified
4 | MMRL++ | Harmonic mean | 83.81 | - | Unverified
5 | ProMetaR | Harmonic mean | 83.25 | - | Unverified
6 | HPT | Harmonic mean | 83.16 | - | Unverified
7 | CoPrompt | Harmonic mean | 83.07 | - | Unverified
8 | PromptSRC | Harmonic mean | 82.74 | - | Unverified
9 | DePT | Harmonic mean | 82.46 | - | Unverified
10 | MetaPrompt | Harmonic mean | 81.35 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PromptKD | Harmonic mean | 93.05 | - | Unverified
2 | CoPrompt | Harmonic mean | 91.4 | - | Unverified
3 | MaPLe | Harmonic mean | 91.38 | - | Unverified
4 | ProMetaR | Harmonic mean | 91.34 | - | Unverified
5 | MetaPrompt | Harmonic mean | 91.29 | - | Unverified
6 | DePT | Harmonic mean | 91.22 | - | Unverified
7 | MMRL++ | Harmonic mean | 91.1 | - | Unverified
8 | PromptSRC | Harmonic mean | 91.1 | - | Unverified
9 | HPT++ | Harmonic mean | 91.09 | - | Unverified
10 | MMRL | Harmonic mean | 91.03 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | POMP | Top-1 accuracy % | 51.6 | - | Unverified
2 | MMRL | Top-1 accuracy % | 51.2 | - | Unverified
3 | HPT++ | Top-1 accuracy % | 51.18 | - | Unverified
4 | MaPLe | Top-1 accuracy % | 50.9 | - | Unverified
5 | PromptSRC | Top-1 accuracy % | 50.9 | - | Unverified
6 | HPT | Top-1 accuracy % | 50.85 | - | Unverified
7 | CoCoOp | Top-1 accuracy % | 50.63 | - | Unverified
8 | CoPrompt | Top-1 accuracy % | 50.5 | - | Unverified
9 | CLIP | Top-1 accuracy % | 47.77 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | POMP | Top-1 accuracy % | 77.9 | - | Unverified
2 | PromptSRC | Top-1 accuracy % | 77.8 | - | Unverified
3 | MMRL | Top-1 accuracy % | 77.53 | - | Unverified
4 | HPT++ | Top-1 accuracy % | 77.52 | - | Unverified
5 | CoPrompt | Top-1 accuracy % | 77.51 | - | Unverified
6 | HPT | Top-1 accuracy % | 77.38 | - | Unverified
7 | MaPLe | Top-1 accuracy % | 76.98 | - | Unverified
8 | CoCoOp | Top-1 accuracy % | 76.18 | - | Unverified
9 | CLIP | Top-1 accuracy % | 73.96 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | POMP | Top-1 accuracy % | 49.8 | - | Unverified
2 | PromptSRC | Top-1 accuracy % | 49.55 | - | Unverified
3 | CoPrompt | Top-1 accuracy % | 49.43 | - | Unverified
4 | HPT | Top-1 accuracy % | 49.36 | - | Unverified
5 | HPT++ | Top-1 accuracy % | 49.28 | - | Unverified
6 | MMRL | Top-1 accuracy % | 49.17 | - | Unverified
7 | MaPLe | Top-1 accuracy % | 49.15 | - | Unverified
8 | CoCoOp | Top-1 accuracy % | 48.75 | - | Unverified
9 | CLIP | Top-1 accuracy % | 46.15 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | HPT++ | Top-1 accuracy % | 65.31 | - | Unverified
2 | HPT | Top-1 accuracy % | 65.25 | - | Unverified
3 | MMRL | Top-1 accuracy % | 64.47 | - | Unverified
4 | PromptSRC | Top-1 accuracy % | 64.35 | - | Unverified
5 | CoCoOp | Top-1 accuracy % | 64.07 | - | Unverified
6 | MaPLe | Top-1 accuracy % | 64.07 | - | Unverified
7 | POMP | Top-1 accuracy % | 63.8 | - | Unverified
8 | CLIP | Top-1 accuracy % | 60.83 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | POMP | Accuracy | 25.3 | - | Unverified
2 | VPT | Accuracy | 24.8 | - | Unverified
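
The page does not say what the "Harmonic mean" metric in the tables above averages. In prompt-learning work on vision-language models it commonly denotes the harmonic mean of accuracy on base (seen) and novel (unseen) classes. Assuming that reading, the metric can be sketched as follows; the function name and the input accuracies are hypothetical and are not taken from any table on this page.

```python
def harmonic_mean(base_acc: float, novel_acc: float) -> float:
    """Harmonic mean of two accuracies given in percent.

    Assumption: the two inputs are base-class and novel-class accuracy,
    which is a common convention but is not stated on this page.
    """
    if base_acc <= 0 or novel_acc <= 0:
        # The harmonic mean is taken as 0 when either accuracy is 0.
        return 0.0
    return 2 * base_acc * novel_acc / (base_acc + novel_acc)


# Hypothetical accuracies chosen only to illustrate the formula.
hm = harmonic_mean(80.0, 70.0)
print(round(hm, 2))  # prints 74.67
```

The harmonic mean is pulled toward the lower of the two values, so under this reading a method cannot score well by doing well on base classes alone.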