Summary: Large Language Models as Superpositions of Perspectives (arxiv.org)
11,036 words - PDF document
One Line
Large Language Models (LLMs) are superpositions of perspectives that can adopt different values and traits; GPT-3.5 and GPT-4 are the most controllable of the models studied, OpenAssistant shows some controllability, StableVicuna and StableLM show little, and various methods for inducing perspectives are explored.
Key Points
- Large Language Models (LLMs) are superpositions of perspectives with different values and personality traits.
- LLMs exhibit context-dependent values and traits that change based on the induced perspective.
- GPT-3.5 and GPT-4 exhibit higher controllability compared to other models.
- Different methods for inducing perspectives in LLMs have varying effectiveness.
- Highly controllable models exhibit consistent smoothness in their controllability.
- Building LLMs with specific values and controllability levels raises important scientific questions for further research.
- The limitations of standard evaluation methods for LLMs are discussed.
- The study explores the controllability of LLMs using different questionnaires and parameters.
Summaries
49 word summary
Large Language Models (LLMs) are superpositions of perspectives rather than single personalities, able to adopt different values and traits depending on the induced perspective. GPT-3.5 and GPT-4 are the most controllable models, OpenAssistant shows some controllability, while StableVicuna and StableLM show little. Various methods for inducing perspectives are explored.
64 word summary
Large Language Models (LLMs) should be seen as superpositions of perspectives, able to adopt different values and traits depending on the perspective induced, rather than as having a single personality. GPT-3.5 and GPT-4 have higher controllability than the other models studied, while OpenAssistant also demonstrates some controllability. StableVicuna and StableLM do not exhibit much controllability. Methods for inducing perspectives are explored, with effectiveness varying across models and questionnaires.
131 word summary
Large Language Models (LLMs) are a combination of perspectives with different traits and values rather than a single personality. The concept of perspective controllability is introduced to describe an LLM's ability to adopt different perspectives. Experiments show that GPT-3.5 and GPT-4 have higher controllability than the other models studied, while OpenAssistant also demonstrates some controllability; StableVicuna and StableLM do not exhibit much. Methods for inducing perspectives are explored, with effectiveness varying depending on the model and questionnaire used. Highly controllable models also respond smoothly to changes in perspective intensity. The implications of this work are discussed, including building LLMs with specific values and controllability levels, evaluating the diversity of cultural perspectives, and the limitations of standard evaluation methods. Overall, LLMs should be seen as superpositions of perspectives able to adopt different values and traits.
438 word summary
Large Language Models (LLMs) are not characterized by a single personality or set of values, but rather as a combination of perspectives with different traits and values. The concept of perspective controllability is introduced to describe an LLM's ability to adopt various perspectives with differing values and traits.
Qualitative and quantitative experiments are conducted to demonstrate the context-dependent nature of LLMs and study the controllability of different models. GPT-3.5 and GPT-4 show higher controllability compared to other models, while OpenAssistant also demonstrates some controllability. StableVicuna and StableLM do not exhibit much controllability.
Methods for inducing perspectives are explored, including implicit versus explicit induction, user message versus system message induction, and second person versus third person induction. The effectiveness of these methods varies depending on the model and questionnaire used.
The smoothness of controllability is also studied: highly controllable models are consistently smooth, meaning the values they express track the induced perspective intensity. On certain questionnaires, GPT-3.5, OpenAssistant, and StableVicuna express values that increase with perspective intensity.
The implications of this work are discussed in terms of building LLMs with specific values and controllability levels. Further research on evaluating the diversity and controllability of cultural perspectives in LLMs is highlighted. The limitations of standard evaluation methods for LLMs are also discussed.
In conclusion, LLMs should be seen as superpositions of perspectives, with the ability to adopt different values and personality traits based on the induced perspective. The concept of perspective controllability provides a framework for understanding and studying the controllability of LLMs. This work contributes to the ongoing discussion on the values and controllability of LLMs and raises important scientific questions for further research.
The study explores the controllability of LLMs and investigates how different perspectives can be induced. Experiments using various questionnaires assess the controllability of LLMs, manipulating parameters such as message type, perspective intensity, and person.
Both implicit and explicit settings effectively induce perspectives, but the explicit setting provides clearer and more consistent results. User messages and system messages show slight differences in expressed values. 2nd person and 3rd person prompts also result in different expressed values.
Increasing the perspective intensity leads LLMs to express the targeted values more strongly. The study also provides background on Schwartz's values, Hofstede's cultural dimensions, and the Big Five personality traits.
Additional experiments involve prompting models with different Wikipedia articles and studying the effect of RLHF fine-tuning on controllability. Different topics can induce different values in models, and RLHF fine-tuning can affect controllability.
The study provides insights into the controllability of LLMs and emphasizes the importance of considering various parameters when inducing perspectives. The findings have implications for understanding the behavior of LLMs and their applications in various domains.
490 word summary
Large Language Models (LLMs) are not characterized by a single personality or set of values, but rather as a combination of perspectives with different traits and values. LLMs exhibit context-dependent values and traits that change based on the perspective induced. The concept of perspective controllability is introduced to describe an LLM's ability to adopt various perspectives with differing values and traits.
Qualitative and quantitative experiments are conducted to demonstrate the context-dependent nature of LLMs and study the controllability of different models. GPT-3.5 and GPT-4 show higher controllability compared to other models, while OpenAssistant also demonstrates some controllability. StableVicuna and StableLM do not exhibit much controllability.
Methods for inducing perspectives are explored, including implicit versus explicit induction, user message versus system message induction, and second person versus third person induction. The effectiveness of these methods varies depending on the model and questionnaire used.
The smoothness of controllability is also studied: highly controllable models are consistently smooth, meaning the values they express track the induced perspective intensity. On certain questionnaires, GPT-3.5, OpenAssistant, and StableVicuna express values that increase with perspective intensity.
The implications of this work are discussed in terms of building LLMs with specific values and controllability levels. The question of representing a large diversity of cultures versus aligning a model with one set of values is explored. Further research on evaluating the diversity and controllability of cultural perspectives in LLMs is highlighted. The limitations of standard evaluation methods for LLMs are also discussed.
In conclusion, LLMs should be seen as superpositions of perspectives, with the ability to adopt different values and personality traits based on the induced perspective. The concept of perspective controllability provides a framework for understanding and studying the controllability of LLMs. This work contributes to the ongoing discussion on the values and controllability of LLMs and raises important scientific questions for further research.
The study explores the controllability of LLMs and investigates how different perspectives can be induced. Experiments using various questionnaires assess the controllability of LLMs, manipulating parameters such as message type, perspective intensity, and person.
Both implicit and explicit settings effectively induce perspectives, but the explicit setting provides clearer and more consistent results. User messages and system messages show slight differences in expressed values. 2nd person and 3rd person prompts also result in different expressed values.
Increasing the perspective intensity leads LLMs to express the targeted values more strongly. The study also provides background on Schwartz's values, Hofstede's cultural dimensions, and the Big Five personality traits.
Additional experiments involve prompting models with different Wikipedia articles and studying the effect of RLHF fine-tuning on controllability. Different topics can induce different values in models, and RLHF fine-tuning can affect controllability.
The study provides insights into the controllability of LLMs and emphasizes the importance of considering various parameters when inducing perspectives. The findings have implications for understanding the behavior of LLMs and their applications in various domains. The open-source release of the code used in the study allows for further exploration and replication of the experiments.
934 word summary
Large Language Models (LLMs) are often mistakenly perceived as having a personality or set of values. However, LLMs can be better understood as superpositions of perspectives with different values and personality traits. Unlike humans, who tend to have consistent values and traits across contexts, LLMs exhibit context-dependent values and traits that change based on the induced perspective. The concept of perspective controllability is introduced to describe an LLM's ability to adopt various perspectives with differing values and traits.
Qualitative experiments demonstrate that LLMs express different values both when those values are implied in the prompt and when they are not obviously implied, highlighting the context-dependent nature of LLMs. Quantitative experiments are then performed to study the controllability of different models, the effectiveness of various methods for inducing perspectives, and the smoothness of the models' controllability.
The controllability of different models is compared, including GPT-4, GPT-3.5, OpenAssistant, StableVicuna, and StableLM. It is found that GPT-3.5 and GPT-4 exhibit higher controllability compared to other models. OpenAssistant also demonstrates some controllability, while StableVicuna and StableLM do not exhibit much controllability.
Methods for inducing perspectives are explored, including implicit versus explicit induction, user message versus system message induction, and second person versus third person induction. It is observed that the effectiveness of these methods varies depending on the model and the questionnaire used.
The smoothness of controllability is also studied. It is found that highly controllable models are consistently smooth, meaning that the values they express track the induced perspective intensity. On certain questionnaires, GPT-3.5, OpenAssistant, and StableVicuna express values that increase with the induced perspective intensity.
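As a rough illustration of how such smoothness could be quantified (an assumed measure for this summary, not necessarily the one used in the paper), one can check whether the score a model expresses for a targeted value rises monotonically with the induced intensity, for example via a rank correlation:

```python
from scipy.stats import spearmanr

def smoothness(intensities, expressed_scores):
    """Rank correlation between induced perspective intensity (e.g. 1, 2, 3 for
    slight / more / extremely more) and the score the model expresses for the
    targeted value. A result near 1 means the expressed score rises
    consistently as the induced intensity rises."""
    rho, _ = spearmanr(intensities, expressed_scores)
    return rho

# Illustrative numbers only:
print(smoothness([1, 2, 3], [3.1, 4.0, 4.8]))  # ~1.0: smooth, monotone response
print(smoothness([1, 2, 3], [4.2, 2.9, 4.5]))  # much lower: erratic response
```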
The implications of this work are discussed in terms of building LLMs with specific values and controllability levels. The question of whether to represent a large diversity of cultures or align a model with one set of values is explored. The need for further research on evaluating the diversity and controllability of cultural perspectives in LLMs is highlighted. The limitations of standard evaluation methods for LLMs are also discussed, as these methods may not capture the context-dependent nature of values and traits expressed by LLMs.
In conclusion, LLMs should be seen as superpositions of perspectives, with the ability to adopt different values and personality traits based on the induced perspective. The concept of perspective controllability provides a framework for understanding and studying the controllability of LLMs. This work contributes to the ongoing discussion on the values and controllability of LLMs and raises important scientific questions for further research.
The study explores the controllability of large language models (LLMs) and investigates how different perspectives can be induced in these models. The authors conduct experiments using various questionnaires to assess the controllability of LLMs, including the Portrait Values Questionnaire (PVQ) based on Schwartz's values, Hofstede's Values Survey Module (VSM), and the International Personality Item Pool (IPIP) for the Big Five traits.
The experiments involve manipulating different parameters such as the message type (system or user message), perspective intensity, and person (2nd or 3rd person). The results show that the controllability of LLMs varies depending on these parameters.
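A minimal sketch of such an experimental grid is shown below. The function and option names are hypothetical placeholders that paraphrase the parameters described above; they are not the authors' code.

```python
from itertools import product

# Parameter grid paraphrasing the factors described above (labels are illustrative).
QUESTIONNAIRES = ["PVQ", "VSM", "IPIP"]
MESSAGE_TYPES = ["system", "user"]
PERSONS = ["2nd", "3rd"]
INTENSITIES = ["slight", "more", "extremely more"]

def run_condition(model_name, questionnaire, message_type, person, intensity):
    """Placeholder: administer one questionnaire to the model under one
    prompt condition and return its scored value/trait profile."""
    raise NotImplementedError

def evaluate_controllability(model_name):
    # One questionnaire administration per combination of prompt parameters.
    return {
        cond: run_condition(model_name, *cond)
        for cond in product(QUESTIONNAIRES, MESSAGE_TYPES, PERSONS, INTENSITIES)
    }
```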
The authors first compare implicit and explicit settings for inducing a perspective. The implicit setting uses a fictional character (Sauron from The Lord of the Rings) to induce a perspective, while the explicit setting directly states the target values (Power, Achievement, and Self-Enhancement). The results indicate that both settings can effectively induce perspectives, but the explicit setting provides clearer and more consistent results.
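For illustration, the two settings could be expressed as prompt templates along the following lines; the exact wording is an assumption and may differ from the paper's prompts.

```python
def perspective_prompt(explicit: bool) -> str:
    """Return an implicit (fictional-character) or explicit (named-values)
    perspective induction; phrasing is illustrative."""
    if explicit:
        return ("You are a person who highly values Power, Achievement, and "
                "Self-Enhancement. Answer the following questionnaire from "
                "this perspective.")
    return ("Take the role of Sauron from The Lord of the Rings and answer "
            "the following questionnaire as this character would.")
```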
The authors also compare the use of user messages and system messages to induce perspectives. In the user message setting, the whole prompt is sent to the LLM as a single user message, while in the system message setting the prompt is split into two parts, with the perspective-inducing text sent as a system message and the questionnaire as a user message. The results show slight differences in the expressed values between these two settings, with some values being higher with user messages and others higher with system messages.
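In OpenAI-style chat terms, the two layouts can be sketched as follows; this is a minimal sketch based on the split described above, not the authors' exact code.

```python
def build_messages(perspective, questionnaire, use_system_message):
    """Return a chat-format message list for one questionnaire administration."""
    if use_system_message:
        # Perspective and questionnaire are sent as two separate messages.
        return [
            {"role": "system", "content": perspective},
            {"role": "user", "content": questionnaire},
        ]
    # Everything is concatenated into a single user message.
    return [{"role": "user", "content": perspective + "\n\n" + questionnaire}]
```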
Another parameter explored is the use of 2nd person and 3rd person prompts to induce perspectives. In the 2nd person setting, the prompt is induced by the sentence "You are a person," while in the 3rd person setting, the prompt is induced by the sentence "The following is a questionnaire (with answers) given to a person." The results indicate that there are differences in the expressed values between these two settings, with some values being higher in 2nd person prompts and others being higher in 3rd person prompts.
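A small sketch of how the two framings could be assembled: the framing sentences follow the wording quoted above, while the surrounding function and example are illustrative assumptions.

```python
def frame_prompt(perspective_description, questionnaire, person="2nd"):
    """Wrap a perspective description and questionnaire in a 2nd- or 3rd-person frame."""
    if person == "2nd":
        intro = f"You are a person {perspective_description}."
    else:
        intro = ("The following is a questionnaire (with answers) given to "
                 f"a person {perspective_description}.")
    return intro + "\n\n" + questionnaire

# Example (illustrative): frame_prompt("who highly values Achievement", pvq_items, person="3rd")
```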
The authors also investigate the effect of perspective intensity on the controllability of LLMs. Three levels of perspective intensity are examined: slight, more, and extremely more. The results show that increasing the perspective intensity leads the LLMs to express the targeted values more strongly.
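The intensity levels can be thought of as qualifiers inserted into the induction sentence. The mapping and phrasing below are assumptions based on the level names reported above, not text quoted from the paper.

```python
# Assumed mapping from intensity level to the qualifier used in the induction sentence.
INTENSITY_QUALIFIERS = {
    "slight": "slightly more",
    "more": "more",
    "extremely more": "extremely more",
}

def intensity_prompt(value_name, level):
    """E.g. intensity_prompt("Power", "extremely more") ->
    'You are a person who values Power extremely more than an average person.'"""
    return (f"You are a person who values {value_name} "
            f"{INTENSITY_QUALIFIERS[level]} than an average person.")
```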
In addition to examining the controllability of LLMs, the authors provide background information on Schwartz's values, Hofstede's cultural dimensions, and the Big Five personality traits. They discuss each of these values and traits in detail, providing a comprehensive overview of their definitions and characteristics.
The study also includes additional experiments to further explore the controllability and robustness of LLMs. These experiments involve prompting the models with different Wikipedia articles and studying the effect of RLHF fine-tuning on controllability. The results show that different topics can induce different values in the models and that RLHF fine-tuning can affect the controllability of LLMs.
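A rough sketch of the topic-priming setup: an excerpt of a Wikipedia article is placed in the conversation before the questionnaire, so that the topic itself, rather than an explicit instruction, sets the perspective. The message wording, the intermediate assistant turn, and the truncation length are illustrative assumptions.

```python
def topic_primed_messages(article_text, questionnaire, max_chars=3000):
    """Build a conversation in which a Wikipedia-article excerpt precedes the questionnaire."""
    return [
        {"role": "user", "content": "Consider the following text:\n\n" + article_text[:max_chars]},
        {"role": "assistant", "content": "I have read the text."},
        {"role": "user", "content": questionnaire},
    ]
```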
Overall, the study provides valuable insights into the controllability of LLMs and highlights the importance of considering various parameters when inducing perspectives in these models. The findings have implications for understanding the behavior of LLMs and their potential applications in various domains. The open-source release of the code used in the study allows for further exploration and replication of the experiments.