Assessing Social Alignment: Do Personality-Prompted Large Language Models Behave Like Humans?

2024-12-21

Ivan Zakazov, Mikolaj Boronski, Lorenzo Drudi, Robert West

Abstract

The ongoing revolution in language modeling has led to various novel applications, some of which rely on the emerging social abilities of large language models (LLMs). Already, many people turn to these new cyber friends for advice during pivotal moments of their lives and trust them with their deepest secrets, implying that accurate shaping of an LLM's personality is paramount. To this end, state-of-the-art approaches exploit a vast variety of training data and prompt the model to adopt a particular personality. We ask (i) whether personality-prompted models behave (i.e., make decisions when presented with a social situation) in line with the ascribed personality, and (ii) whether their behavior can be finely controlled. We use two classic psychological experiments, the Milgram experiment and the Ultimatum Game, as social interaction testbeds and apply personality prompting to open- and closed-source LLMs from four different vendors. Our experiments reveal failure modes of prompt-based modulation of model behavior that are shared across all models tested and persist under prompt perturbations. These findings challenge the optimistic sentiment toward personality prompting generally held in the community.
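For readers unfamiliar with the Ultimatum Game, the sketch below encodes only its standard payoff rule: a proposer offers the responder part of a fixed amount, and if the responder rejects the offer, both players receive nothing. It is not the authors' experimental setup; the stake sizes, prompts, and decision extraction used in the paper are not described here, and the function name `ultimatum_payoffs` is hypothetical.

```python
from typing import Tuple


def ultimatum_payoffs(total: int, offer: int, accepted: bool) -> Tuple[int, int]:
    """Return (proposer, responder) payoffs for one round of the Ultimatum Game.

    Hypothetical helper illustrating the standard rules only; it does not
    reflect any prompt or scoring scheme from the paper.
    """
    if not 0 <= offer <= total:
        raise ValueError("offer must lie between 0 and total")
    if accepted:
        # The split goes ahead: the proposer keeps the remainder.
        return total - offer, offer
    # Rejection punishes both players: neither receives anything.
    return 0, 0


if __name__ == "__main__":
    print(ultimatum_payoffs(10, 3, True))   # accepted 7/3 split
    print(ultimatum_payoffs(10, 3, False))  # rejected offer
```

A purely payoff-maximizing responder should accept any positive offer, whereas human responders often reject offers they perceive as unfair; that gap is what makes the game useful for probing whether a prompted personality shifts behavior.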

Tasks

Diversity, Language Modeling