Summary: Quantifying and Analyzing Entity-level Memorization in Large Language Models (arxiv.org)
4,291 words - PDF document
One Line
This paper introduces an adaptive prompt-learning approach for quantifying entity-level memorization in large language models, addressing the privacy risks of memorized training data without the computational expense of existing methods.
Key Points
- Large language models have the ability to memorize their training data, raising privacy concerns.
- Quantifying and analyzing memorization in language models is important for evaluating privacy risks.
- Existing methods for quantifying memorization are computationally expensive.
- The paper proposes a definition for entity-level memorization and introduces an approach for adaptive prompt learning.
- Soft prompts in large language models improve and stabilize as the dataset size increases, but decline in effectiveness with massive training datasets.
Summaries
30 word summary
Large language models (LLMs) can memorize training data, posing privacy concerns. Existing methods for quantifying memorization are computationally expensive. This paper defines entity-level memorization and presents an adaptive prompt approach.
38 word summary
Large language models (LLMs) have the ability to memorize training data, which raises privacy concerns. Existing methods for quantifying memorization are computationally expensive. This paper proposes a definition for entity-level memorization and introduces an approach for adaptive prompt learning.
244 word summary
Large language models (LLMs) have the ability to memorize their training data, which raises privacy concerns. Quantifying and analyzing this memorization is important for evaluating privacy risks. However, existing methods for quantifying memorization are computationally expensive.
This paper addresses the challenge of quantifying and analyzing memorization in LLMs through textual prompts. The authors propose a definition for entity-level memorization and introduce an adaptive prompt-learning approach that leverages entity attribute information and soft prompts.
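To make the soft-prompt idea concrete, here is a minimal sketch of how continuous prompt vectors are prepended to a model's input embeddings. The names, dimensions, and random initialization are illustrative assumptions, not the authors' implementation; a real setup would use a deep-learning framework and update only the soft-prompt vectors by gradient descent.

```python
import random

random.seed(0)
VOCAB_SIZE, EMB_DIM, N_SOFT = 100, 8, 4

def rand_vec():
    # Toy embedding vector; stands in for learned parameters.
    return [random.uniform(-1, 1) for _ in range(EMB_DIM)]

# Frozen token-embedding table standing in for the LLM's input embeddings.
token_embeddings = [rand_vec() for _ in range(VOCAB_SIZE)]

# Soft-prompt vectors: continuous embeddings with no vocabulary tokens
# behind them; only these would be updated during prompt learning.
soft_prompt = [rand_vec() for _ in range(N_SOFT)]

def build_input(token_ids):
    """Prepend the soft-prompt vectors to the embedded token sequence."""
    return soft_prompt + [token_embeddings[t] for t in token_ids]

seq = build_input([3, 17, 42])
print(len(seq), len(seq[0]))  # 7 8  (4 soft-prompt vectors + 3 tokens)
```

Because the soft prompt lives in embedding space rather than vocabulary space, it can encode cues (such as entity attributes) that no discrete textual prompt expresses exactly.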
Researchers have explored prompt-based approaches that use continuous vectors in the embedding space of language models; these have proven effective at improving model performance. The volume of fabricated and real data used in measuring the entity extraction rate also has an impact on accuracy.
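One plausible way to score an entity extraction rate is to count how many target entities appear verbatim in the model's output. The function below is a hypothetical illustration; the paper's exact matching criterion may differ.

```python
def entity_extraction_rate(generated: str, entities: list[str]) -> float:
    """Fraction of target entities found verbatim in the generated text.

    Illustrative metric only: real evaluations might normalize case,
    tokenize, or require exact continuation of a prefix.
    """
    hits = sum(1 for e in entities if e in generated)
    return hits / len(entities)

output = "Alice Smith was born in 1980 in Springfield."
targets = ["Alice Smith", "1980", "London"]
print(entity_extraction_rate(output, targets))  # 2 of 3 found -> 0.666...
```

Mixing fabricated entities (which the model cannot have memorized) with real ones, as the summary above notes, gives a baseline against which chance matches can be separated from genuine memorization.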
The effectiveness of soft prompts in large language models improves and stabilizes as the dataset size increases. However, with massive training datasets, the effectiveness declines and fluctuates. This may be because an abundance of training data causes the soft prompts to lose some of their effectiveness.
This document discusses the quantification and analysis of entity-level memorization in large language models. The authors explore entity-level memorization across the 50-200, 200-500, and 500-1000 ranges.
The paper closes with a list of references to papers and reports related to language models, covering topics such as privacy attacks on ChatGPT, optimizing continuous prompts for generation, and surveying prompting methods in natural language processing.