Summary: Text Embeddings and Private Information Leakage (arxiv.org)
7,339 words - PDF document
One Line
Vec2Text iteratively corrects and re-embeds text to invert dense embeddings, exactly recovering 92% of 32-token inputs; the authors propose adding Gaussian noise to embeddings as a defense and note scalability limitations.
Key Points
- Text embeddings can reveal private information about the original text.
- The study investigates the problem of embedding inversion and proposes a method called Vec2Text to reconstruct the full text from dense text embeddings.
- Vec2Text can recover 92% of 32-token text inputs exactly through a multi-step approach.
- Systems built around large language models often store auxiliary data in vector databases of dense embeddings, which poses a privacy threat.
- The authors frame the problem of recovering text from its embedding as a controlled generation problem and evaluate their method on various retrieval corpora.
- Gaussian noise can be added to embeddings as a defense mechanism against inversion attacks.
- Limitations include scalability to longer text, the assumption of black-box adversary access to the embedding model, search thoroughness, and the impact of word frequency on model correctness.
- Text embeddings should be treated as highly sensitive private data and protected accordingly.
Summaries
21 word summary
Vec2Text exactly recovers 92% of 32-token inputs by iteratively correcting and re-embedding text; adding Gaussian noise to embeddings mitigates such attacks.
62 word summary
The study introduces Vec2Text, a method for reconstructing text from text embeddings. It can recover 92% of 32-token text inputs exactly by iteratively correcting and re-embedding the text. The authors evaluate Vec2Text on embeddings from different retrieval corpora and propose adding Gaussian noise as a defense against inversion attacks. Limitations include scalability and assumptions about adversary access. The study emphasizes the privacy implications of text embeddings.
177 word summary
The study investigates embedding inversion: reconstructing the original text from its dense embedding. The authors propose Vec2Text, which generates text whose embedding closely matches a given target embedding; by iteratively correcting and re-embedding its hypothesis, the model recovers 92% of 32-token text inputs exactly. The motivation is the privacy threat posed by systems that store auxiliary data for large language models in vector databases of dense embeddings. Evaluated on embeddings generated from several retrieval corpora, Vec2Text successfully recovers the inputs for many data points across different domains. The authors also examine the privacy implications of dense text embeddings and propose a defense against inversion attacks: adding Gaussian noise to the embeddings. Acknowledged limitations include scalability to longer texts, assumptions about adversary access to the embedding model, and the need for further study of search thoroughness and the impact of word frequency on model correctness. The study concludes that text embeddings can reveal significant private information and should be treated as highly sensitive private data.
410 word summary
This study explores the problem of embedding inversion, which involves reconstructing the original text from dense text embeddings. The authors introduce a method called Vec2Text, which generates text whose embedding closely matches a given target embedding. Through a multi-step approach that iteratively corrects and re-embeds the text, the model recovers 92% of 32-token text inputs exactly.
The study focuses on large language models that store auxiliary data in vector databases of dense embeddings. These databases are commonly used for efficient embedding searches. However, the privacy threats associated with these databases have not been extensively investigated. The authors question whether a third-party service can reproduce the original text based on its embedding. While neural networks are typically difficult to invert exactly, it is often possible to approximate their inverse based on input-output pairs from the network. This study specifically targets the full reconstruction of input text from its embedding.
To address this problem, the authors present Vec2Text. The method uses the difference between a hypothesis embedding and the ground-truth embedding to make discrete updates to the text hypothesis. Trained on datasets of texts and embeddings, the model learns to generate text whose embedding closely matches a given target. The authors evaluate Vec2Text on embeddings generated from several retrieval corpora and find that it successfully recovers the inputs for many data points across different domains.
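As a concrete illustration, here is a minimal sketch of that correction loop in Python, assuming hypothetical `embed` and `correct` callables that stand in for the black-box embedder and the trained corrector model (the function names, signatures, and step budget are illustrative, not taken from the paper):

```python
from typing import Callable
import numpy as np

def invert_embedding(
    target: np.ndarray,
    embed: Callable[[str], np.ndarray],                      # black-box embedder
    correct: Callable[[str, np.ndarray, np.ndarray], str],   # trained corrector
    hypothesis: str,                                         # initial text guess
    steps: int = 20,
) -> str:
    """Iteratively refine a text hypothesis so its embedding approaches `target`."""
    for _ in range(steps):
        current = embed(hypothesis)            # re-embed the current hypothesis
        if np.allclose(current, target):       # exact recovery: stop early
            break
        # The corrector conditions on the current text and both embeddings;
        # their difference signals what in the text should change.
        hypothesis = correct(hypothesis, current, target)
    return hypothesis
```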
In terms of experimental setup, the authors train their models on different datasets and evaluate them using metrics such as BLEU score, Token F1, and exact match. They also consider the privacy implications of dense text embeddings and propose a defense mechanism against inversion attacks by adding Gaussian noise to the embeddings. Results show that this approach effectively defends against naive inversion attacks while preserving utility in the nearest-neighbor retrieval setting.
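For reference, Token F1 is commonly computed as the harmonic mean of token-level precision and recall between the predicted and reference texts; the whitespace tokenization in this sketch is an assumption rather than the paper's exact setup:

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of precision and recall over whitespace tokens,
    counting shared tokens with multiplicity."""
    pred = prediction.split()
    ref = reference.split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```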
However, the study has limitations. The scalability of Vec2Text to longer texts has not been thoroughly explored. The authors also assume that the adversary has black-box access to the model used for generating the embeddings, which may not be realistic in all scenarios. Additionally, the search thoroughness and the impact of word frequency on model correctness have not been extensively studied.
In conclusion, this study highlights that text embeddings can expose significant private information about the original text. The Vec2Text method demonstrates the ability to recover text from its embedding, emphasizing the privacy implications of text embeddings. The findings suggest that embeddings should be treated as highly sensitive private data and protected accordingly.
482 word summary
Text embeddings can reveal a significant amount of private information about the original text. This study investigates the problem of embedding inversion, which involves reconstructing the full text from dense text embeddings. The authors propose a method called Vec2Text, which generates text whose embedding is close to a given embedding. They find that a multi-step approach that iteratively corrects and re-embeds text can recover 92% of 32-token text inputs exactly. The model is trained to decode text embeddings from two state-of-the-art embedding models and can also recover important personal information, such as full names, from a dataset of clinical notes.
Large language models often store auxiliary data in a vector database of dense embeddings. These databases are popular for efficient embedding searches at scale. However, the privacy threats within these databases have not been extensively explored. Can a third-party service reproduce the original text given its embedding? While neural networks are generally difficult to invert exactly, it is often possible to approximate their inverse given input-output pairs from the network. Previous work has explored this question for images and shallow networks, but this study targets full reconstruction of input text from its embedding.
The authors frame the problem of recovering text from its embedding as a controlled generation problem. Their method, Vec2Text, uses the difference between a hypothesis embedding and the ground-truth embedding to make discrete updates to the text hypothesis. The model is trained on datasets of texts and embeddings and learns to generate text whose embedding is as close as possible to a given embedding. The authors evaluate their method on embeddings generated from several retrieval corpora and find that it can recover the inputs for a number of data points across different domains.
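Written out, with $\phi$ the black-box embedder and $e$ the observed target embedding, the inversion objective amounts to a search over texts (a minimal formulation consistent with this framing; the notation is illustrative, not taken from the paper):

```latex
\hat{x} = \arg\max_{x} \, \cos\big(\phi(x),\, e\big), \qquad e = \phi(x_{\text{orig}})
```

The iterative corrector approximates this search: each step proposes a revised $x$ informed by the gap between $\phi(\hat{x})$ and $e$.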
In terms of experimental setup, the authors train their models on different datasets and evaluate them using various metrics such as BLEU score, Token F1, and exact match. They also consider the privacy implications of dense text embeddings and propose adding Gaussian noise to the embeddings as a defense mechanism against inversion attacks. The results show that adding a small amount of noise can effectively defend against naive inversion attacks while still preserving utility in the nearest-neighbor retrieval setting.
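A minimal sketch of that defense, assuming unit-normalized retrieval embeddings and an illustrative noise scale (both are assumptions, not the paper's reported parameters):

```python
import numpy as np

def noised_embedding(e: np.ndarray, noise_scale: float = 0.01,
                     rng: np.random.Generator | None = None) -> np.ndarray:
    """Add isotropic Gaussian noise to an embedding, then renormalize to the
    unit sphere so nearest-neighbor retrieval geometry is largely preserved."""
    rng = rng or np.random.default_rng()
    noised = e + noise_scale * rng.standard_normal(e.shape)
    return noised / np.linalg.norm(noised)
```

The intuition is that small perturbations barely change which neighbors are nearest, but they corrupt the fine-grained signal an inversion model needs to reconstruct exact wording.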
The study has several limitations. The scalability of the method to longer text has not been thoroughly investigated. The authors also assume that the adversary has black-box access to the model used to generate the embeddings, which may not be realistic in all scenarios. Additionally, the search thoroughness and the impact of word frequency on model correctness have not been extensively studied.
In conclusion, text embeddings can reveal a significant amount of private information about the original text. The Vec2Text method proposed in this study demonstrates the ability to recover text from its embedding and highlights the privacy implications of text embeddings. The findings suggest that embeddings should be treated as highly sensitive private data and protected accordingly.