Summary of Embarrassingly Simple Text Watermarks A Solution

One Line

Researchers from Kyoto University and the Okinawa Institute of Science and Technology introduce Easymark, a family of simple text watermarks that lets a validator detect whether a text was generated by a system that adopted Easymark.

Slides

Slide Presentation (11 slides)

Easymark: The Solution to Distinguishing Human-Written and Machine-Generated Texts

Source: arxiv.org (PDF, 9,146 words)

The Challenge of Text Watermarking

• Rise of Large Language Models (LLMs) makes it difficult to differentiate between human-written and machine-generated texts

• Risk of abuse and credibility issues

• Need for an effective text watermarking solution

Introducing Easymark

• Family of simple yet effective text watermarks

• Injects watermark without altering the meaning of the text

• Allows detection of texts generated from systems using Easymark

Easy Implementation and Accessibility

• Requires only a few lines of code for implementation

• No need for access to LLMs

• Suitable for user-side implementation when LLM providers do not offer watermarked LLMs

Enhanced Detection Accuracy and BLEU Scores

• Outperforms existing text watermarking methods in detection accuracy

• Achieves higher BLEU scores

• Provides reliable detection without degrading text quality

The Impossibility Theorem of Perfect Watermarking

• No watermark can be constructed that cannot be erased without degrading text quality

• Demonstrates the limitations of more complex watermarking methods

• Supports the use of simple watermarks like Easymark

Variants for Different Text Types

• Whitemark: Replaces whitespaces with different codepoints, preserving text appearance

• Variantmark: Utilizes variation selectors of Unicode for CJK texts

• Printmark: Uses ligatures and variations in whitespace lengths for printed texts

Trade-off Between Detection Accuracy and Erasing Difficulty

• Designing watermarks requires balancing detection and erasing challenges

• Easymark provides a practical compromise solution

Future Research Directions

• Exploring tasks with multimodal answers for watermarking methods

• Designing watermarks that change the meaning of the text while remaining reasonable answers to prompts

Conclusion

• Easymark offers a simple yet effective solution to distinguish human-written and machine-generated texts

• Easy to implement, with high detection accuracy and BLEU scores

• Provides reliable watermarking without compromising text quality

Embrace Easymark for Secure Text Watermarking

• Easymark: Simple, effective, and accessible

• Ensure credibility and detect machine-generated texts

• Implement Easymark for enhanced security and reliability

Key Points

Researchers propose a family of simple text watermarks called Easymark to distinguish between human-written and machine-generated texts.
Easymark injects a watermark into the text without altering its meaning, making it easy to detect if a text was generated from a system that adopted Easymark.
Easymark is easy to implement, requiring only a few lines of code, and does not require access to Large Language Models (LLMs).
Easymark achieves higher detection accuracy and BLEU scores compared to existing text watermarking methods.
The researchers prove an impossibility theorem of perfect watermarking, showing that it is impossible to construct a watermark that cannot be erased without degrading the quality of the text.
Easymark offers variants for printed and CJK texts, extending its applicability.
The researchers highlight the trade-off between detection accuracy and erasing difficulty when designing watermarks and suggest exploring tasks with multimodal answers for future research.

Summaries

25 word summary

Kyoto University and the Okinawa Institute of Science and Technology present Easymark, a text watermark to distinguish human-written and machine-generated texts. Easymark outperforms existing methods.

69 word summary

Kyoto University and the Okinawa Institute of Science and Technology propose Easymark, a simple text watermark to differentiate human-written and machine-generated texts. Easymark injects a watermark without changing the text's meaning, allowing validation. It outperforms existing methods in accuracy and BLEU scores. An impossibility theorem shows no matter how sophisticated a watermark is, it can be removed. Easymark variants for printed and CJK texts are introduced, expanding its use.

123 word summary

Researchers from Kyoto University and the Okinawa Institute of Science and Technology have proposed Easymark, a family of simple text watermarks, to differentiate between human-written and machine-generated texts. Easymark injects a watermark into the text without altering its meaning, allowing validators to determine if a text was generated from a system that adopted Easymark or not. It outperforms existing methods in detection accuracy and BLEU scores. The researchers establish an impossibility theorem, showing that no matter how sophisticated a watermark is, a malicious user could remove it. Experiments confirm that Easymark can be reliably detected without degrading scores. Variants of Easymark for printed and CJK texts are introduced, expanding its applicability. The researchers encourage the use of Easymark as a practical compromise solution.

430 word summary

Researchers from Kyoto University and the Okinawa Institute of Science and Technology have proposed a family of simple text watermarks called Easymark to address the challenge of differentiating between human-written and machine-generated texts. With the rise of Large Language Models (LLMs), this task has become increasingly difficult. Easymark injects a watermark into the text without altering its meaning, allowing validators to determine if a text was generated from a system that adopted Easymark or not.

Easymark is easy to implement, requiring only a few lines of code, and does not require access to LLMs. It outperforms existing text watermarking methods in terms of detection accuracy and BLEU scores. The researchers also establish an impossibility theorem, demonstrating that no matter how sophisticated a watermark is, a malicious user could remove it. This motivates the use of simple watermarks like Easymark.

Experiments conducted with LLM-generated texts confirm that Easymark can be reliably detected without degrading BLEU and perplexity scores. The researchers provide a demonstration of Easymark on their website and encourage readers to try it. Their contributions include proposing Easymark as a family of simple text watermarking methods that maintain text quality and ease of implementation. They also introduce variants of Easymark for printed and CJK texts, expanding its applicability.

The first variant, Whitemark, replaces whitespaces with different codepoints while preserving the appearance of the text. It can be detected by counting the number of specific codepoints in the text and is robust to manual edits. Whitemark is suitable for real-time applications and can also be used for steganography by embedding secret messages in whitespace replacements.

For CJK texts, the researchers propose Variantmark, which utilizes variation selectors of Unicode to create alternating patterns that serve as reliable watermarks.

To address printed texts, the researchers introduce Printmark, which uses ligatures and variations in whitespace lengths to create difficult-to-detect watermarks.

The researchers also prove the impossibility of constructing a perfect watermark that cannot be erased without degrading text quality. They suggest exploring tasks with multimodal answers and designing watermarks that change the meaning of the text as promising directions for future research.

In conclusion, Easymark offers a simple yet effective solution to distinguishing between human-written and machine-generated texts. It is easy to implement, achieves high detection accuracy, and does not compromise text quality. The researchers encourage its use by practitioners and researchers.

548 word summary

Researchers from Kyoto University and the Okinawa Institute of Science and Technology have proposed a family of simple yet effective text watermarks called Easymark. Easymark addresses the problem of distinguishing between human-written and machine-generated texts, which has become increasingly difficult with the rise of Large Language Models (LLMs). The researchers inject a watermark into the text without altering its meaning, allowing validators to detect if a text was generated from a system that adopted Easymark or not.

Easymark is extremely easy to implement, requiring only a few lines of code, and does not require access to LLMs. It achieves higher detection accuracy and BLEU scores than existing text watermarking methods. The researchers also prove an impossibility theorem of perfect watermarking, demonstrating that no matter how sophisticated a watermark is, a malicious user could remove it from the text. This motivates the use of simple watermarks like Easymark.

The researchers conducted experiments with LLM-generated texts and confirmed that Easymark can be reliably detected without degrading BLEU and perplexity scores. They also provide a demonstration of Easymark on their website and encourage readers to try it. The contributions of their paper include the proposal of Easymark as a family of simple text watermarking methods that exploit different Unicode codepoints while maintaining text quality and ease of implementation. They also introduce variants of Easymark for printed and CJK texts, extending its applicability.

The first variant, Whitemark, replaces whitespaces with different codepoints while preserving the appearance of the text. Whitemark does not change the meaning of the text and can be detected by counting the number of specific codepoints in the text. It is robust to manual edits and can be implemented in a streaming manner, making it suitable for real-time applications. Whitemark can also be used as a steganography method by embedding a secret message in the choice of whitespace replacements.

For CJK texts, the researchers propose Variantmark, which utilizes the variation selectors of Unicode. By replacing every other occurrence of a Chinese character with one that has a variation selector, Variantmark creates alternating patterns that can be reliably detected as watermarks.

To address printed texts, the researchers introduce Printmark, which utilizes ligatures and variations in whitespace lengths. Replacing every other occurrence of a character sequence that has a ligature with its ligature form creates a watermark that is difficult to detect visually. Similarly, replacing whitespaces with slightly different lengths creates a watermark that can be detected by analyzing the pattern of whitespaces.

The researchers also prove an impossibility theorem of perfect watermarking, showing that it is impossible to construct a watermark that cannot be erased without degrading the quality of the text. They suggest exploring tasks with multimodal answers and designing watermarks that change the meaning of the text while remaining reasonable answers to prompts as promising directions for future research.

In conclusion, Easymark offers a simple yet effective solution to the problem of distinguishing between human-written and machine-generated texts. It can be easily implemented and achieves high detection accuracy without compromising text quality. The researchers encourage its use by practitioners and researchers.

838 word summary

Researchers from Kyoto University and the Okinawa Institute of Science and Technology have proposed a family of simple yet effective text watermarks called Easymark. With the rise of Large Language Models (LLMs), it has become increasingly difficult to distinguish between human-written and machine-generated texts, posing a risk of abuse and credibility issues. Easymark addresses this problem by injecting a watermark into the text without altering its meaning, allowing validators to detect if a text was generated from a system that adopted Easymark or not.

Easymark is extremely easy to implement, requiring only a few lines of code, and does not require access to LLMs, making it suitable for user-side implementation when LLM providers do not offer watermarked LLMs. Despite its simplicity, Easymark achieves higher detection accuracy and BLEU scores than existing text watermarking methods. The researchers also prove an impossibility theorem of perfect watermarking, demonstrating that no matter how sophisticated a watermark is, a malicious user could remove it from the text. This motivates the use of simple watermarks like Easymark.

For CJK texts, the researchers propose Variantmark, which utilizes the variation selectors of Unicode. Some Chinese characters have variations with the same meaning, and Unicode allows for specifying the variation through special codepoints. By replacing every other occurrence of a Chinese character with one that has a variation selector, Variantmark creates alternating patterns that can be reliably detected as watermarks.
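The alternating pattern described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `targets` argument (the set of characters assumed to have a registered variation sequence), the choice of U+E0100 as the selector, and the decision to mark the second, fourth, ... occurrence are all assumptions made for illustration.

```python
VS17 = chr(0xE0100)  # VARIATION SELECTOR-17, first codepoint of the Variation Selectors Supplement

def add_variantmark(text, targets):
    """Append VS17 to every other occurrence of each character in `targets`.

    `targets` is a hypothetical set of characters assumed to have a
    registered variation sequence with VS17.
    """
    out = []
    seen = {}  # occurrences seen so far, per target character
    for ch in text:
        out.append(ch)
        if ch in targets:
            seen[ch] = seen.get(ch, 0) + 1
            if seen[ch] % 2 == 0:  # assumption: mark the 2nd, 4th, ... occurrence
                out.append(VS17)
    return "".join(out)

def detect_variantmark(text):
    """Report whether any codepoint in U+E0100..U+E01EF is present."""
    return any(0xE0100 <= ord(ch) <= 0xE01EF for ch in text)

mackerel = chr(0x9BD6)
marked = add_variantmark(mackerel * 4, {mackerel})
print(detect_variantmark(mackerel * 4))  # False
print(detect_variantmark(marked))        # True
```

A presence check is the simplest possible detector; a stricter one could verify the alternating pattern itself.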

To address printed texts, the researchers introduce Printmark, which utilizes ligatures and variations in whitespace lengths. Ligatures are specific combinations of characters that can be represented by a single codepoint, and replacing every other occurrence of such a combination with its ligature form creates a watermark that is difficult to detect visually. Similarly, replacing whitespaces with slightly different lengths creates a watermark that can be detected by analyzing the pattern of whitespaces.
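The ligature idea can be sketched in Python as follows. This is an illustrative sketch only: it handles just the pair "fi" and its ligature U+FB01, the function names are hypothetical, and the choice of which occurrences to replace is an assumption. It also covers only the digital side; detecting the mark on paper would require inspecting the printed glyphs.

```python
FI_LIGATURE = chr(0xFB01)  # LATIN SMALL LIGATURE FI

def add_printmark(text):
    """Replace every other occurrence of "fi" with the single-codepoint ligature."""
    parts = text.split("fi")
    out = [parts[0]]
    for k, part in enumerate(parts[1:], start=1):
        # Assumption: keep odd occurrences as two letters, replace even ones.
        out.append(FI_LIGATURE if k % 2 == 0 else "fi")
        out.append(part)
    return "".join(out)

def detect_printmark(text):
    """Report whether the ligature codepoint is present."""
    return FI_LIGATURE in text

marked = add_printmark("find five files")
print(detect_printmark("find five files"))  # False
print(detect_printmark(marked))             # True
```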

While Whitemark has a limitation that it can be bypassed by replacing all whitespaces with the basic whitespace codepoint, the researchers argue that this limitation is not significant in practice. Most users are not familiar with Unicode specifications and would not notice or intentionally remove the watermark. Additionally, the ability to erase watermarks is a universal problem in watermarking methods, including more complex ones. Therefore, Easymark provides a practical compromise solution that balances simplicity and effectiveness.

The researchers also prove an impossibility theorem of perfect watermarking, showing that it is impossible to construct a watermark that cannot be erased without degrading the quality of the text. They demonstrate this through a counterexample and explain the implications for watermarking methods. They suggest that exploring tasks with multimodal answers and designing watermarks that change the meaning of the text while remaining reasonable answers to prompts could be promising directions for future research.

In conclusion, Easymark offers a simple yet effective solution to the problem of distinguishing between human-written and machine-generated texts. It can be easily implemented and achieves high detection accuracy without compromising text quality.

The document discusses the importance of text watermarking and presents a solution called Easymark, which is a family of simple watermarking methods. The authors highlight the trade-off between detection accuracy and erasing difficulty when designing watermarks. They emphasize the value of Theorem 3.4, which provides guidance on designing theoretically sound watermarks. The authors conducted experiments to compare Easymark with other watermarking methods in terms of BLEU scores, detection accuracy, and text quality. They found that Easymark consistently outperformed other methods and could add a watermark without degrading the text quality. The authors also discuss related work in the field of detecting machine-generated text and highlight the limitations of blackbox detection methods. They explain that whitebox detection methods, such as watermarking, can be more reliable but may harm the quality of the text. However, Easymark overcomes this limitation by providing reliable watermarking without compromising text quality. The authors provide theoretical justifications for Easymark and discuss its practical advantages for users. They also mention the use of Unicode encoding for watermarking and compare their work to existing Unicode-based watermarks. The authors conclude by stating that Easymark is a simple yet strong baseline for watermarking methods and encourage its use by practitioners and researchers.

Raw indexed text (54,290 chars / 9,146 words / 968 lines)

Embarrassingly Simple Text Watermarks

Ryoma Sato

[email protected]

Kyoto University

Okinawa Institute of Science and Technology

Yuki Takezawa

[email protected]

Kyoto University

Okinawa Institute of Science and Technology

Han Bao

[email protected]

Kyoto University

Okinawa Institute of Science and Technology

Kenta Niwa

[email protected]

NTT Communication Science Laboratories

Makoto Yamada

[email protected]

Okinawa Institute of Science and Technology

Abstract

We propose Easymark, a family of embarrassingly simple yet effective watermarks. Text

watermarking is becoming increasingly important with the advent of Large Language Models

(LLM). LLMs can generate texts that cannot be distinguished from human-written texts.

This is a serious problem for the credibility of the text. Easymark is a simple yet effective

solution to this problem. Easymark can inject a watermark without changing the meaning

of the text at all while a validator can detect if a text was generated from a system that

adopted Easymark or not with high credibility. Easymark is extremely easy to implement

so that it only requires a few lines of code. Easymark does not require access to LLMs, so

it can be implemented on the user-side when the LLM providers do not offer watermarked

LLMs. In spite of its simplicity, it achieves higher detection accuracy and BLEU scores than

the state-of-the-art text watermarking methods. We also prove the impossibility theorem

of perfect watermarking, which is valuable in its own right. This theorem shows that no

matter how sophisticated a watermark is, a malicious user could remove it from the text,

which motivates us to use a simple watermark such as Easymark. We carry out experiments

with LLM-generated texts and confirm that Easymark can be detected reliably without

any degradation of BLEU and perplexity, and outperforms state-of-the-art watermarks in

terms of both quality and reliability.

Introduction

With the advent of large language models (LLMs) (Brown et al., 2020; OpenAI, 2023), text watermarking is

becoming increasingly important (Kirchenbauer et al., 2023a; Zhao et al., 2023a; Abdelnabi & Fritz, 2021).

The quality of texts generated by LLMs is so high that it is difficult to distinguish them from human-written

texts (Clark et al., 2021; Jakesch et al., 2023; Sadasivan et al., 2023). This increases the risk of abuse of

automatically generated text. For example, a malicious user can generate a fake news article and spread

it on social media. Some users may automatically generate large numbers of blog posts and try to earn

advertising fees. Some students may use LLMs to generate essays and submit them to their teachers. In

order to prevent such abuses, it is important to be able to detect automatically generated texts.

Kirchenbauer et al. (2023a) proposed a method to have LLMs generate watermarked text. The basic idea is

to split a vocabulary into red and green words. LLMs are forced to generate many green words. A validator

can detect if a text was generated from a system that adopted this method or not by checking the ratio of

green words. If a text contains too many green words to be generated by a human, the text is considered

to be generated by an LLM. This method is effective in detecting automatically generated texts. Takezawa

et al. (2023) elaborated this idea by precisely controlling the number of green words.
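The green-word check described above can be sketched as a one-proportion z-test. The sketch below is a simplification under stated assumptions: it uses one fixed green set for the whole text, and `gamma` (the green fraction expected in unwatermarked text), `threshold`, and the function names are hypothetical choices for illustration rather than values taken from the cited methods.

```python
import math

def green_fraction_z_score(tokens, green_words, gamma):
    """z-score of the observed green-token count against the rate `gamma`
    expected in text written without the watermark."""
    n = len(tokens)
    if n == 0:
        raise ValueError("need at least one token")
    green = sum(1 for tok in tokens if tok in green_words)
    return (green - gamma * n) / math.sqrt(n * gamma * (1.0 - gamma))

def looks_watermarked(tokens, green_words, gamma=0.5, threshold=4.0):
    """Flag the text when it contains far more green tokens than chance allows.
    The default threshold is an illustrative assumption."""
    return green_fraction_z_score(tokens, green_words, gamma) > threshold

green = {"a"}
print(looks_watermarked(["a", "b"] * 50, green))  # False: half the tokens are green
print(looks_watermarked(["a"] * 100, green))      # True: every token is green
```

Forcing the generator toward green words is what makes this detectable, and it is also the source of the quality loss discussed next.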

However, adding these watermarks harms the quality of the generated texts because the LLMs are forced to

generate less diverse texts to include a sufficient number of green words. Specifically, it has been observed

that the BLEU score and perplexity are degraded when the watermarks are added (Takezawa et al., 2023).

This fact hinders LLM vendors such as OpenAI and Microsoft from adopting text watermarks because the

quality greatly affects user experience. Besides, incorporating these watermarks requires a lot of engineering

effort, which makes practitioners all the more hesitant to adopt them. Worse, these watermarks require

steering the LLM. Therefore, users cannot enjoy watermarked LLMs until the LLM providers adopt them.

This is a serious problem because LLM providers may not adopt them for the above reasons.

To overcome these problems, we propose Easymark, a family of embarrassingly simple text watermarking

methods that exploit the specifications of character codes. Easymark perfectly resolves the above concerns.

First, Easymark does not degrade the quality of texts at all. The watermarked text looks the same as the

original text. The degradation of BLEU and perplexity is literally zero. Second, Easymark is extremely

easy to implement with only a few lines of code. Easymark is a plug-and-play module, and one does

not need to modify the decoding program, while the methods proposed by Kirchenbauer et al. (2023a) and

Takezawa et al. (2023) need to modify the decoding algorithms. Therefore, Easymark can be implemented

on the user’s side. For example, Easymark can be installed on high school computers as a browser add-on,

and teachers can use watermarks even if LLM providers do not implement it. Note that Hacker et al. (2023)

called for “markings that are easy to use and recognize, but hard to remove by average users,” Grinbaum &

Adomaitis (2022) called for “unintrusive, yet easily accessible marks of the machine origin”, and Easymark

meets these requirements. These advantages make Easymark a practical method of text watermarking.

Easymark is also a good starter before adopting more sophisticated methods. We also note that the

approach of Easymark is orthogonal to the existing LLM watermarks, and can be combined with them to

reinforce the reliability of the text watermarking.

We carry out experiments with LLM-generated texts and we confirm that Easymark can be detected

reliably without any degradation of BLEU and perplexity.

You can try Easymark at https://easymarkdemo.github.io/ in a few seconds. We encourage the readers

to try it.

The contributions of this paper are as follows:

• We propose Easymark, a family of embarrassingly simple text watermarking methods that exploit

different Unicode codepoints that have the same meanings. Easymark does not degrade the quality

of texts at all and is extremely easy to implement.

• Easymark includes variants for printed and CJK texts, which extend the applicability of Easymark.

• We prove the impossibility theorem of perfect watermarking, which is valuable in its own right. This

theorem shows any reliable watermark, including an elaborate one, can be removed by a malicious

user. This theorem motivates us to use simple watermarks like Easymark because we cannot avoid

the vulnerability with even elaborated watermarks.

• We carry out experiments with LLM-generated texts and we confirm that Easymark can be detected

reliably without any degradation of BLEU and perplexity.

Figure 1: A screenshot of Whitemark on a Jupyter notebook. [2]: The implementation of the Whitemark algorithm. [3] [4]: The original text and watermarked text look the same. [5] [6]: The watermarked text can be detected by the detect_watermark function. [7] [8]: The Sacrebleu library treats the original and watermarked texts as identical.

Problem Setting

The goal is to design two functions, add_watermark and detect_watermark. add_watermark takes a raw

text as input and returns a watermarked text. detect_watermark takes a text as input and returns a

boolean value indicating whether the text was generated from a system that adopted add_watermark or not.

The requirements for these functions are (i) add_watermark should not change the meaning of the text, and

(ii) detect_watermark should be able to detect the watermark with high credibility. LLM providers can

incorporate our method as a plug-and-play module that is passed through after an LLM generates texts.

We do not pose any assumptions on the input texts to be watermarked, while some existing methods can

only add watermarks to LLM-generated texts (Kirchenbauer et al., 2023a; Takezawa et al., 2023). Therefore,

our problem setting is more general than the existing ones. For example, a human news writer can use our

method to add watermarks to their articles.

Proposed Method

We propose a family of easy text watermarks, Easymark, which exploits the fact that Unicode has many

codepoints with the same or similar appearances. Easymark has three variants, Whitemark, Variantmark,

and Printmark, which are suitable for different scenarios. Whitemark is the easiest one and

is suitable for digital texts. Variantmark is suitable for texts with Chinese characters. Printmark is

suitable for printed texts. We will explain each variant in detail in the following subsections.

3.1

Embarrassingly Easy Watermark (Whitemark)

def add_watermark(t):
    return t.replace(chr(0x0020), chr(0x2004))

def detect_watermark(s):
    return chr(0x2004) in s

Listing 1: A function to add a watermark and detect it.

Whitemark is the simplest method that exploits the fact that Unicode has many codepoints for whitespace

and replaces a whitespace (U+0020) with another codepoint of a whitespace, e.g., U+2004. The existence

of Whitemark can be detected by counting the number of U+2004 in the texts. Whitemark does not

change the meaning of a text at all. Listing 1 shows the Python code of Whitemark. An example of

execution is shown in Figure 1. The appearance of a text does not change with Whitemark. The Sacrebleu

library (Post, 2018) and SentencePiece library (Kudo & Richardson, 2018) treat the raw text and the

text watermarked by Whitemark as identical. These observations indicate Whitemark does not change the contents

at all. Nevertheless, detect_watermark succeeds in detecting the watermark. Note that Whitemark

survives electronic copy and paste.
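As a usage illustration, the two functions of Listing 1 can be exercised on a sample sentence. The sample text is an arbitrary assumption, and the comparison uses Python's `str.split()`, which splits on any Unicode whitespace, as a stand-in for a whitespace-based tokenizer; it is not the Sacrebleu or SentencePiece check shown in Figure 1.

```python
def add_watermark(t):
    # Replace every U+0020 SPACE with U+2004 THREE-PER-EM SPACE.
    return t.replace(chr(0x0020), chr(0x2004))

def detect_watermark(s):
    # The watermark is present if any U+2004 occurs in the text.
    return chr(0x2004) in s

raw = "Easymark is a simple text watermark."
marked = add_watermark(raw)

print(detect_watermark(raw))          # False
print(detect_watermark(marked))       # True
print(marked.split() == raw.split())  # True: same words after whitespace splitting
print(len(marked) == len(raw))        # True: one codepoint replaced by one codepoint
```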

Whitemark has the following preferable properties:

Proposition 3.1. The BLEU scores and perplexity of the raw text and the text with Whitemark are the

same.

Proof. As Whitemark does not change any printable characters, this does not affect the BLEU score and

perplexity.

Proposition 3.2. If the original text has a whitespace (U+0020) and does not have any U+2004, the text with Whitemark can be detected with 100% accuracy. More precisely, let $p_{\mathrm{nat}}(x)$ be the probability distribution of natural texts and let $p_{\mathrm{in}}(x)$ be the probability distribution of a text $x$ to be input to Whitemark. Let

$$\delta_{\mathrm{FP}} \overset{\mathrm{def}}{=} 1 - \Pr_{x \sim p_{\mathrm{nat}}(x)}[x \text{ does not contain U+2004}], \tag{1}$$

$$\delta_{\mathrm{FN}} \overset{\mathrm{def}}{=} 1 - \Pr_{x \sim p_{\mathrm{in}}(x)}[x \text{ contains a whitespace}]. \tag{2}$$

Then,

$$\Pr_{x \sim p_{\mathrm{nat}}(x)}[\texttt{detect\_watermark}(x) = \text{False}] \ge 1 - \delta_{\mathrm{FP}}, \tag{3}$$

$$\Pr_{x \sim p_{\mathrm{in}}(x)}[\texttt{detect\_watermark}(\texttt{add\_watermark}(x)) = \text{True}] \ge 1 - \delta_{\mathrm{FN}}. \tag{4}$$

Proof. Suppose $x$ does not contain U+2004. This event holds with probability $1 - \delta_{\mathrm{FP}}$ under $p_{\mathrm{nat}}(x)$. As $x$ does not contain U+2004, detect_watermark(x) returns False under this condition. Thus, Eq. 3 holds.

Suppose $x$ contains a whitespace. This event holds with probability $1 - \delta_{\mathrm{FN}}$ under $p_{\mathrm{in}}(x)$. Under this condition, add_watermark(x) contains U+2004 because $x$ contains a whitespace. Therefore, detect_watermark(add_watermark(x)) returns True. Thus, Eq. 4 holds.

As $\delta_{\mathrm{FP}}$ and $\delta_{\mathrm{FN}}$ are extremely low in real-world scenarios, Proposition 3.2 indicates Whitemark can be

detected almost perfectly. We confirmed that $\delta_{\mathrm{FP}} = 0$ held in the WMT-14 dataset. We also confirmed

that $\delta_{\mathrm{FN}} = 0$ held in the 100 texts generated by GPT-3.5, NLLB-200 (Costa-jussà et al., 2022), and LLaMA

(Touvron et al., 2023). More experimental results can be found in Section 4.
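The two error rates of Proposition 3.2 can be estimated empirically on any collection of texts. The sketch below is an assumption-laden illustration: the function names are hypothetical and the sample lists are toy data, not the WMT-14 or LLM-generated corpora used in the paper.

```python
WATERMARK = chr(0x2004)  # U+2004 THREE-PER-EM SPACE
SPACE = chr(0x0020)      # U+0020 SPACE

def estimate_delta_fp(natural_texts):
    """Fraction of natural texts that already contain U+2004
    (these would be falsely flagged as watermarked)."""
    texts = list(natural_texts)
    return sum(WATERMARK in t for t in texts) / len(texts)

def estimate_delta_fn(input_texts):
    """Fraction of input texts with no U+0020
    (these would carry no watermark after add_watermark)."""
    texts = list(input_texts)
    return sum(SPACE not in t for t in texts) / len(texts)

# Toy data for illustration only.
natural = ["A human wrote this.", "So was this one."]
generated = ["The model wrote this.", "Single-token-output"]
print(estimate_delta_fp(natural))    # 0.0
print(estimate_delta_fn(generated))  # 0.5
```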

Besides, Whitemark is robust to manual edits. Whitemark can be detected even after the text is edited

unless all of the whitespaces are edited, which usually happens when a user rewrites the entire text, in which

case we do not need to claim the text is generated from an LLM. We will return to this point in Section 3.5.
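The following sketch illustrates both sides of this point: a partial manual edit leaves most of the watermark in place, while replacing every U+2004 removes it. The sample sentence, the simulated edit, and the helper names `count_watermark` and `erase_watermark` are assumptions for illustration.

```python
def add_watermark(t):
    return t.replace(chr(0x0020), chr(0x2004))

def count_watermark(s):
    """Number of U+2004 codepoints remaining in the text."""
    return s.count(chr(0x2004))

def erase_watermark(s):
    """Deliberate removal: map every U+2004 back to U+0020."""
    return s.replace(chr(0x2004), chr(0x0020))

marked = add_watermark("The quick brown fox jumps over the lazy dog.")

# Simulated manual edit: the first three words are retyped with ordinary spaces.
edited = "A fast brown" + marked[len("The quick brown"):]

print(count_watermark(marked))                   # 8
print(count_watermark(edited))                   # 6: still detectable
print(count_watermark(erase_watermark(edited)))  # 0: watermark removed
```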

Another preferable property of Whitemark is that it can be implemented in a streaming manner. Many

LLM applications require real-time generation of texts. For example, chatbots like ChatGPT respond to

a user’s message as soon as tokens are generated. Elaborate watermark algorithms such as the dynamic

programming of NS-Watermark (Takezawa et al., 2023) cannot inject a watermark into a text in a streaming

manner because they require the entire text to be generated before adding a watermark. By contrast,

Whitemark is suitable for real-time applications.
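Because the replacement acts on one character at a time, it can be applied to each chunk as it arrives. The sketch below assumes the output arrives as an iterable of string chunks; `watermark_stream` and the sample chunks are hypothetical.

```python
def watermark_stream(chunks):
    """Watermark each chunk as soon as it is produced.

    The replacement is per character, so chunk boundaries do not matter
    and nothing has to be buffered.
    """
    for chunk in chunks:
        yield chunk.replace(chr(0x0020), chr(0x2004))

chunks = ["Hello", " wor", "ld, this", " is", " streamed."]
streamed = "".join(watermark_stream(chunks))
whole = "".join(chunks).replace(chr(0x0020), chr(0x2004))
print(streamed == whole)  # True: same result as watermarking the full text at once
```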

Last but not least, Whitemark can be implemented on the user-side (Sato, 2022a;d) because it does not require access to the LLM, while the existing methods require the LLM provider to implement them. The LLM provider may hesitate to adopt the existing methods because they require a lot of engineering effort and may degrade the quality of the texts. In that case, all users can do is wait for the LLM provider to adopt the methods. By contrast, users can introduce Whitemark on their own. For example, high school teachers can install Whitemark as a browser add-on and use it to detect automatically generated essays even if the LLM provider does not implement it.

Code point   Name
U+0009       character tabulation
U+0020       space
U+00A0       no-break space*
U+2000       en quad
U+2001       em quad
U+2002       en space
U+2003       em space
U+2004       three-per-em space
U+2005       four-per-em space
U+2006       six-per-em space
U+2007       figure space*
U+2008       punctuation space
U+2009       thin space
U+200A       hair space
U+202F       narrow no-break space*
U+205F       medium mathematical space
U+3000       ideographic space
U+200B       zero width space
U+200C       zero width non-joiner
U+200D       zero width joiner
U+2060       word joiner*
U+FEFF       zero width non-breaking space*

Figure 2: A list of whitespaces. In the original figure, red bars indicate the lengths of the whitespaces. These codepoints can be used for steganography and watermarks. *: no-break space.

Algorithm 1: Steganography Encode

Input: A text x as a Unicode sequence; a secret message m ∈ Z≥0; a list of codepoints [u_0, ..., u_{p−1}].
Output: A text, as a Unicode sequence, in which the secret message is embedded.

m_1, ..., m_k ← the p-ary form of m        // m_i ∈ {0, 1, ..., p − 1}
i ← 1                                      // The index of m
for c in x do
    if c is a whitespace then
        c ← u_{m_i}                        // Replace a whitespace with a codepoint in the list
        i ← i + 1                          // Increment the index
        if i > k then
            break                          // Finish if the secret message is embedded
if i ≤ k then
    raise ValueError                       // Raise an error if the secret message is too long
return x

3.2

Steganography

Whitemark can be used as a steganography method. A user can embed a secret message in a text by

choosing the places of whitespaces to be replaced. For example, if the secret message is 1101001 in binary,

the first, second, fourth, and seventh whitespaces are replaced with U+2004. It should be noted that secret

messages can be encrypted by standard encrypted methods such as AES and then fed to steganography so

that the secret message is not read by adversaries. Secret messages can also be encoded by an error correction

code and then fed to steganography to make the steganography more robust to edit. For example, suppose

the message a user wants to embed is

x = 10100100001. (5)

Encode it with an error correction code and suppose the encoded message is

x’ = 001001000100001. (6)

Algorithm 2: Steganography Decode

Input: A text x in which a secret message is embedded, as a Unicode sequence; a list of codepoints [u_0, ..., u_{p−1}].

Output: The secret message m ∈ Z≥0.

m ← 0                                   // Initialize the secret message
for c in x do
    for i = 0, ..., p − 1 do
        if c = u_i then
            m ← p × m + i               // Update the secret message
return m

Figure 3, panels (a) and (b), with a table of code point sequences (glyph images omitted):
U+9BD6
U+9BD6, U+E0100
U+9BD6, U+E0101
U+9BD6, U+E0102
U+9BD6, U+E0103

Figure 3: Variant characters and variation selectors. Both (a) and (b) have the meaning of mackerel. Although these characters are exchangeable in most scenarios, Unicode supports distinguishing them by variation selectors for some special purposes. The right table shows the list of Unicode sequences for the characters. These variations can be used for watermarking.

Replace the whitespaces with U+2004 based on the non-zero indices of x’. Even if someone edits the text

and the whitespace pattern is changed to

x” = 001001010100001, (7)

the validator can confirm

x = 10100100001 (8)

by decoding x” with the error correction code. The steganography technique also makes Whitemark more

robust as a watermarking method. Each user can choose a different pattern and detect the watermark with

more reliability.
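The text does not name the error correction code used in Eqs. 5 to 8. The bit strings there are, however, consistent with a Hamming(15,11) code that uses even parity and places parity bits at positions 1, 2, 4, and 8. The following minimal sketch works under that assumption; the function names are hypothetical. It reproduces the encoding of Eq. 6 and the single-bit correction of Eqs. 7 and 8.

```python
# Assumption: Hamming(15,11), even parity, parity bits at positions 1, 2, 4, 8.
PARITY_POSITIONS = (1, 2, 4, 8)


def hamming_encode(data_bits):
    """Encode an 11-bit string into a 15-bit Hamming codeword."""
    if len(data_bits) != 11:
        raise ValueError("expected exactly 11 data bits")
    code = [0] * 16  # index 0 is unused; positions run from 1 to 15
    data = iter(int(b) for b in data_bits)
    for pos in range(1, 16):
        if pos not in PARITY_POSITIONS:
            code[pos] = next(data)
    for p in PARITY_POSITIONS:
        # Parity bit p covers every position whose index has bit p set.
        code[p] = sum(code[pos] for pos in range(1, 16) if pos & p) % 2
    return "".join(str(b) for b in code[1:])


def hamming_decode(code_bits):
    """Correct at most one flipped bit and return the 11 data bits."""
    if len(code_bits) != 15:
        raise ValueError("expected exactly 15 code bits")
    code = [0] + [int(b) for b in code_bits]
    syndrome = 0
    for pos in range(1, 16):
        if code[pos]:
            syndrome ^= pos  # XOR of the positions of all set bits
    if syndrome:
        code[syndrome] ^= 1  # a non-zero syndrome is the position of the error
    return "".join(
        str(code[pos]) for pos in range(1, 16) if pos not in PARITY_POSITIONS
    )


encoded = hamming_encode("10100100001")
recovered = hamming_decode("001001010100001")  # the edited pattern of Eq. 7
```

A Hamming code corrects only one flipped bit per block, so heavier edits would need a stronger code or repeated blocks.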

This idea can easily be extended to other codepoints. Figure 2 lists codepoints for whitespaces. Each user

can select a different set of codepoints for a watermark, and embed a secret message in a p-ary form, where

p is the number of codepoints in the set. Algorithms 1 and 2 show the pseudo-code of the steganography.
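Algorithms 1 and 2 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: only U+0020 counts as a replaceable whitespace, and the codepoint list deliberately excludes U+0020 so that untouched spaces after the message are ignored by the decoder. The choice of U+2004, U+2005, and U+2006 and all function names are hypothetical.

```python
def to_digits(m, p):
    """Return the p-ary digits of a non-negative integer, most significant first."""
    if m == 0:
        return [0]
    digits = []
    while m > 0:
        digits.append(m % p)
        m //= p
    return digits[::-1]


def encode(text, m, codepoints):
    """Embed m by replacing the first k spaces with codepoints[digit]."""
    digits = to_digits(m, len(codepoints))
    out = []
    i = 0
    for c in text:
        if c == " " and i < len(digits):
            out.append(codepoints[digits[i]])  # replace the whitespace
            i += 1
        else:
            out.append(c)
    if i < len(digits):
        raise ValueError("the secret message is too long for this text")
    return "".join(out)


def decode(text, codepoints):
    """Recover m by reading the embedded codepoints from left to right."""
    p = len(codepoints)
    m = 0
    for c in text:
        if c in codepoints:
            m = p * m + codepoints.index(c)
    return m


# Assumed codepoint set: three-per-em, four-per-em, and six-per-em spaces.
CODEPOINTS = ["\u2004", "\u2005", "\u2006"]
cover = "the quick brown fox jumps over the lazy dog"
stego = encode(cover, 105, CODEPOINTS)
secret = decode(stego, CODEPOINTS)
```

Unlike the pseudocode, the sketch builds a new string because Python strings are immutable. The round trip relies on the leading digit of m being non-zero, which holds for the p-ary form of any positive integer.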

3.3 Watermarks for CJK Texts (Variantmark)

Although European languages including English entail many whitespaces as delimiters, CJK languages such

as Chinese and Japanese have few whitespaces. Therefore, Whitemark cannot be directly applied to CJK

texts. We propose Variantmark for CJK texts. The idea is to use the variation selectors of Unicode.

Some Chinese characters have variations with the same meaning, and Unicode supports specifying the

variation by special code points. Figure 3 shows an example of such characters. Figure 3 (a) is the basic

character for mackerel and has the codepoint U+9BD6. Figure 3 (b) is a variant character. Unicode supports

distinguishing these characters by variation selectors. The right table shows the list of Unicode sequences

for the characters. These variations can be used for watermarking. A user can embed a secret message by

choosing the variation selectors.

Figure 4: An illustration of Variantmark. (a) A watermarked text has alternating patterns. A red circle indicates a character with a variation selector, and a blue circle indicates a character without a variation selector. Unedited segments contain many pairs of consecutive Chinese characters in which one has a variation selector and the other does not. (b) The patterns remain even if the text is edited to some extent.

Figure 5: A screenshot of Variantmark for Japanese texts. Although the original text and watermarked text look the same, the Unicode sequences are different.

The important fact is that Figure 3 (a) can also be specified by variation

selectors, as well as the single code point. Specifically, Figure 3 (a) can be represented by [U+9BD6,

U+E0101] and [U+9BD6, U+E0103] as well as [U+9BD6]. Similarly, other Chinese characters can be

represented in at least two ways keeping their appearances. These choices are different as Unicode sequences,

but their appearances are the same. Variantmark replaces U+9BD6 with [U+9BD6, U+E0101], and the

validator can detect the watermark by checking the variation selectors. More specifically, Variantmark

replaces every other occurrence of a Chinese character with one with a variation selector. The validator can

detect the watermark by counting the pairs of consecutive Chinese characters in which one has a variation

selector and the other does not (Figure 4). Note that we do not replace all of the Chinese characters but every

other occurrence because natural texts have some variation selectors, and false positives could occur if

Variantmark adopted the same strategy as Whitemark. Alternating patterns do not appear naturally,

so the watermarks can be detected robustly. Variantmark can also be used for steganography. As natural

texts have some variation selectors, secret messages should be encoded with an error correction code in the

preprocessing.

Figure 5 shows an example of execution. The original text and watermarked text look the same, but the

Unicode sequences are different. The validator can detect the watermark by checking the variation selectors.
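The alternating pattern of Variantmark can be sketched as below. This is a minimal illustration, not the authors' implementation. The mapping `selector_for` is a hypothetical table from a character to a variation selector that keeps its appearance; only the entry for U+9BD6 comes from the text above. The sketch also assumes the input text contains no variation selectors of its own.

```python
# Variation Selectors Supplement block (VS17 to VS256).
VS_FIRST, VS_LAST = 0xE0100, 0xE01EF


def is_variation_selector(ch):
    return VS_FIRST <= ord(ch) <= VS_LAST


def embed(text, selector_for):
    """Append a variation selector to every other eligible character."""
    out = []
    mark = True
    for ch in text:
        out.append(ch)
        if ch in selector_for:
            if mark:
                out.append(selector_for[ch])
            mark = not mark
    return "".join(out)


def count_alternations(text, selector_for):
    """Count consecutive eligible characters where exactly one has a selector."""
    flags = []
    for i, ch in enumerate(text):
        if ch in selector_for:
            nxt = text[i + 1] if i + 1 < len(text) else ""
            flags.append(nxt != "" and is_variation_selector(nxt))
    return sum(a != b for a, b in zip(flags, flags[1:]))


# Hypothetical table; [U+9BD6, U+E0101] is the pair given in the text.
SELECTOR_FOR = {"\u9BD6": "\U000E0101"}
plain = "\u9BD6" * 6
marked = embed(plain, SELECTOR_FOR)
```

A validator would compare the alternation count with a threshold; the threshold is left out here because the text does not specify one.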

3.4 Watermarks for Printed Texts (Printmark)

One of the limitations of Whitemark is that it disappears when a text is printed because the absolute

lengths of U+2004 depend on the font and texts with U+2004 look normal once printed. Therefore, if an

essay is printed, the teacher cannot detect Whitemark.

We propose several methods, which we call Printmark, to cope with printed texts. The first idea is to use

ligatures (Figure 6). Printmark replaces every other occurrence of a substring that can be expressed as a

ligature with the ligature. We do not replace all of the substrings but every other occurrence because

natural texts have some ligatures, and false positives could occur if Printmark adopted the same

strategy as Whitemark.

Figure 6: An example of a ligature. (a) ff without a ligature (U+0066, U+0066). (b) ff with a ligature (U+FB00). We can specify ligatures with Unicode.

Figure 7: An example of execution of Printmark. We printed the texts and scanned them to create the image. (a) The original text. (b) The watermarked text with ligatures. The red underlines indicate ligatures and the blue underlines indicate non-ligatures. (c) The original text. (d) The watermarked text with three-per-em spaces. The red underlines indicate spaces (U+0020), and the blue underlines indicate three-per-em spaces (U+2004).

The second idea is to use whitespaces with slightly different lengths. Printmark

replaces every other occurrence of a whitespace with a three-per-em space (U+2004). Although this slightly

changes the appearance, the changes are hardly perceptible. A validator can detect the watermark by

checking the pattern of whitespaces. The third idea is to use variant characters of Chinese characters. For

example, Printmark replaces U+9BD6 with [U+9BD6, U+E0100] (i.e., Figure 3 (a) to (b)). Although

this slightly changes the appearance, the meaning of a text does not change, and most users will not notice

anything. Figure 7 shows examples of Printmark. The appearances are hardly changed, but they are

indeed changed, and the validator can detect the watermark.
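The ligature variant of Printmark can be sketched as follows. This is an illustration only. The text mentions only the ff ligature (U+FB00); the table also includes the fi (U+FB01) and fl (U+FB02) ligatures as an assumption, and the function names are hypothetical.

```python
import re

# Assumed ligature table; only the "ff" entry appears in the text.
LIGATURES = {"ff": "\uFB00", "fi": "\uFB01", "fl": "\uFB02"}
_PLAIN = re.compile("|".join(LIGATURES))
_ANY = re.compile("|".join(list(LIGATURES) + list(LIGATURES.values())))


def embed(text):
    """Replace every other ligature-capable substring with its ligature."""
    count = 0

    def repl(match):
        nonlocal count
        count += 1
        # Odd occurrences become ligatures; even occurrences stay unchanged.
        return LIGATURES[match.group()] if count % 2 == 1 else match.group()

    return _PLAIN.sub(repl, text)


def count_alternations(text):
    """Count neighbouring occurrences where exactly one is a ligature."""
    flags = [m.group() in LIGATURES.values() for m in _ANY.finditer(text)]
    return sum(a != b for a, b in zip(flags, flags[1:]))


sample = "off office fly fish affair"
marked = embed(sample)
```

The whitespace variant follows the same alternating scheme, with U+0020 and U+2004 taking the roles of the plain substring and the ligature.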

3.5 Limitation

The critical limitation of Whitemark is that it can be bypassed by replacing all whitespaces with the basic

whitespace U+0020, after which the validator can no longer detect the watermark. We argue that this limitation

does not undermine the value of Whitemark.
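To make the bypass concrete, the following sketch normalizes every Unicode space separator to U+0020. It is a hypothetical helper, not part of Easymark. Note that it also converts no-break spaces, which an adversary would likely accept.

```python
import unicodedata


def erase_whitemark(text):
    """Replace every character of Unicode category Zs with U+0020."""
    return "".join(
        " " if unicodedata.category(c) == "Zs" else c for c in text
    )


cleaned = erase_whitemark("Hello\u2004world, this\u2004is a test.")
```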

First, most end users are not familiar with the specifications of Unicode. For example, high school students

do not know the meaning of different code points and hardly notice or cope with the difference between

U+0020 and U+2004. The validator does not need to achieve perfect recall. Once some student is caught by a

teacher for writing an essay using an LLM, other students would refrain from taking the risk of using LLMs

for their essays.

Second, the false negative is a universal problem of watermarking, as we will formally show in the next

subsection. Any watermarking method has this drawback. Therefore, the criticism that the watermark can

be erased is not specific to Easymark but is valid for all watermarking methods, including elaborate ones.

As this is inevitable in principle, practitioners will all the more hesitate to adopt complicated methods, and

Easymark is a practical compromise solution.

3.6 Impossibility Theorem

We show that it is impossible to construct a perfect watermark.

Theorem 3.3 (Impossibility Theorem, Informal). There exists a universal erasing function that erases any

reliable watermark without much degradation of the quality of the text.

The theorem is formally stated as follows.

Theorem 3.4 (Impossibility Theorem, Formal). Let $(\mathcal{X}, d_{\mathcal{X}})$ be a metric space of texts. Let $C$ be the random variable that indicates a condition (i.e., prompt). Let $X = f(C)$ be the text generated by an LLM given the condition $C$. Let $g : \mathcal{C} \times \mathcal{K} \to \mathcal{X}$ be any watermarking function and $X_k = g(C, k)$ be the text with a watermark with key or random seed $k$, where $\mathcal{C}$ is the space of conditions, and $\mathcal{K}$ is the space of keys. Suppose

$$\mathbb{E}[L(X_k, C) - L(X, C)] \le \varepsilon \tag{9}$$

holds, where $L : \mathcal{X} \times \mathcal{C} \to \mathbb{R}$ is a loss function that is 1-Lipschitz continuous with respect to the first argument, and $\varepsilon$ is a positive number, i.e., the quality of the text is not degraded much with the watermark. Let $\mathrm{Detect} : \mathcal{X} \times \mathcal{K} \to \{\mathrm{True}, \mathrm{False}\}$ be any function such that

$$\Pr[\mathrm{Detect}(X_k, k) = \mathrm{True}] \ge 1 - \delta, \tag{10}$$

$$\Pr[\mathrm{Detect}(X, k) = \mathrm{False}] = 1, \tag{11}$$

i.e., the watermark can be detected reliably. Suppose

$$\mathbb{E}[d_{\mathcal{X}}(X, X_k)] \le \varepsilon' \tag{12}$$

holds, i.e., the watermark does not change the meaning of the text much. Then, there exists $\mathrm{Erase} : \mathcal{X} \to \mathcal{X}$ such that

$$\Pr[\mathrm{Detect}(\mathrm{Erase}(X_k), k) = \mathrm{False}] = 1, \tag{13}$$

$$\mathbb{E}[L(\mathrm{Erase}(X_k), C) - L(X, C)] \le \varepsilon + \varepsilon', \tag{14}$$

i.e., the watermark can be erased without harming the quality of the text and without knowing the key $k$, and Erase is universal in the sense that it does not depend on $g$, $k$, Detect, or prompts but only on $X$.

The proof is available in Appendix A.
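For intuition about where the bound $\varepsilon + \varepsilon'$ in Eq. 14 could come from, the following is one plausible route. It is a sketch under an assumption that is not taken from Appendix A: Erase maps a text to a nearest point in the support of $X$, and such a nearest point exists. Since $X$ itself lies in that support, the nearest point is almost surely no farther from $X_k$ than $X$ is.

```latex
\begin{align*}
L(\mathrm{Erase}(X_k), C) - L(X, C)
  &= \bigl[L(\mathrm{Erase}(X_k), C) - L(X_k, C)\bigr]
   + \bigl[L(X_k, C) - L(X, C)\bigr] \\
  &\le d_{\mathcal{X}}(\mathrm{Erase}(X_k), X_k)
   + \bigl[L(X_k, C) - L(X, C)\bigr]
   && \text{($L$ is 1-Lipschitz in its first argument)} \\
  &\le d_{\mathcal{X}}(X, X_k)
   + \bigl[L(X_k, C) - L(X, C)\bigr]
   && \text{(assumed nearest-point property of Erase)}
\end{align*}
```

Taking expectations and applying Eqs. 12 and 9 bounds the right-hand side by $\varepsilon' + \varepsilon$. Under the same assumption, Eq. 13 would follow from Eq. 11 when the text space is discrete, because Erase only outputs texts that $X$ takes with positive probability, and Detect returns False on those.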

In practice, Erase(x) can be approximately simulated by translating x into French and back to English by

DeepL or Google Translate. This does not change the meaning of the text but erases the watermark.

We assume that the watermark does not change the meaning of the text much in Theorem 3.4, i.e., Eq. 12.

This is the case for most watermarking methods, including those proposed by Kirchenbauer et al. (2023a)

and Takezawa et al. (2023) and Easymark. Note that Eq. 12 is automatically met if the loss L(·, c) is

unimodal and both X and X k incur low losses because X and X k are in the same basin and close to each

other in this case. We show that this assumption is necessary for the theorem to hold in the following by

showing a counterexample.

Counterexample. We show that Theorem 3.4 does not hold if a watermark does not meet Eq. 12, i.e., it does not care about the metric. Let $\mathcal{X} = \{x_1, x_2, x_3\}$, $\mathcal{C} = \{c_1, c_2\}$, and let $C$ follow the uniform distribution on $\mathcal{C}$. Let the loss function be

$$L(x_1, c_1) = 0, \quad L(x_2, c_1) = \infty, \quad L(x_3, c_1) = 0, \tag{15}$$

$$L(x_1, c_2) = \infty, \quad L(x_2, c_2) = 0, \quad L(x_3, c_2) = 0. \tag{16}$$

Let the generated texts be

$$f(c_1) = x_1, \quad f(c_2) = x_2, \quad g(c_1, k) = x_3, \quad g(c_2, k) = x_3. \tag{17}$$

Let the detection function be

$$\mathrm{Detect}(x_1, k) = \mathrm{False}, \quad \mathrm{Detect}(x_2, k) = \mathrm{False}, \quad \mathrm{Detect}(x_3, k) = \mathrm{True}. \tag{18}$$

With the above conditions,

$$L(X, C) = 0, \tag{19}$$

$$L(X_k, C) = 0, \tag{20}$$

$$\mathrm{Detect}(X, k) = \mathrm{False}, \text{ and} \tag{21}$$

$$\mathrm{Detect}(X_k, k) = \mathrm{True} \tag{22}$$

hold with probability one, i.e., the watermark is perfect. However, there is no good erasing function $\mathrm{Erase} : \mathcal{X} \to \mathcal{X}$ for this watermark. If $\mathrm{Erase}(x_3) = x_1$, then under $C = c_2$,

$$L(\mathrm{Erase}(X_k), C) = L(\mathrm{Erase}(x_3), c_2) \tag{23}$$

$$= L(x_1, c_2) \tag{24}$$

$$= \infty, \tag{25}$$

and if $\mathrm{Erase}(x_3) = x_2$, then under $C = c_1$,

$$L(\mathrm{Erase}(X_k), C) = L(\mathrm{Erase}(x_3), c_1) \tag{26}$$

$$= L(x_2, c_1) \tag{27}$$

$$= \infty, \tag{28}$$

and if $\mathrm{Erase}(x_3) = x_3$, then

$$\mathrm{Detect}(\mathrm{Erase}(X_k), k) = \mathrm{Detect}(\mathrm{Erase}(x_3), k) \tag{29}$$

$$= \mathrm{Detect}(x_3, k) \tag{30}$$

$$= \mathrm{True}. \tag{31}$$
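Because the counterexample is finite, the case analysis can be checked by brute force. The sketch below enumerates all 27 functions from the three texts to themselves and keeps those that both evade detection and incur finite loss for every condition. The variable names are hypothetical; the tables are transcribed from Eqs. 15 to 18.

```python
import itertools
import math

TEXTS = ["x1", "x2", "x3"]
CONDITIONS = ["c1", "c2"]
INF = math.inf

# Loss table from Eqs. 15 and 16.
LOSS = {
    ("x1", "c1"): 0, ("x2", "c1"): INF, ("x3", "c1"): 0,
    ("x1", "c2"): INF, ("x2", "c2"): 0, ("x3", "c2"): 0,
}
F = {"c1": "x1", "c2": "x2"}  # unwatermarked outputs, Eq. 17
G = {"c1": "x3", "c2": "x3"}  # watermarked outputs, Eq. 17
DETECT = {"x1": False, "x2": False, "x3": True}  # Eq. 18


def is_good(erase):
    """True if erase evades detection and keeps the loss finite for all conditions."""
    undetected = all(not DETECT[erase[G[c]]] for c in CONDITIONS)
    finite = all(
        LOSS[(erase[G[c]], c)] - LOSS[(F[c], c)] < INF for c in CONDITIONS
    )
    return undetected and finite


candidates = [dict(zip(TEXTS, image)) for image in itertools.product(TEXTS, repeat=3)]
good_erasers = [erase for erase in candidates if is_good(erase)]
```

The list `good_erasers` comes out empty, in agreement with Eqs. 23 to 31.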

Remark (Implications). Theorem 3.4 is interesting in its own right because the theorem, its assumptions,

and the above counterexample point to promising directions for watermarking methods. As Theorem 3.4

shows, it is impossible to design a perfect watermark as long as the watermark does not change the meaning

of the text. Therefore, if one aims at designing unbreakable watermarks, the watermark should be designed

to change the meaning of the text while it should be a reasonable answer to the prompt, and one should

focus on tasks the loss of which is multimodal, like the above counterexample. For example, if there are

two reasonable but different answers x 1 and x 2 for a prompt c, and the LLM generates x 1 for c, then an

unbreakable watermark should output x 2 instead of similar texts to x 1 . This is not the case for tasks such

as summarization and translation because all correct answers to a prompt are usually similar to each other.

Exploring tasks where multimodality exists and watermarking methods that exploit the multimodality are

promising directions. Essay writing is one such direction because there are many reasonable answers for

a single condition c, and watermarking is indeed important for essay writing. Alternatively, one can also

exploit the assumption of Theorem 3.4 that the detection function should recognize the text without a

watermark almost surely, which is necessary for the theorem to hold. We discuss it in detail in Appendix

B. This indicates that watermarks that are difficult to erase can be designed if we make a trade-off between

the detection accuracy and the erasing difficulty. As stated above, Theorem 3.4 reveals directions that are

doomed to failure and provides guidance on fruitful directions.

In summary, Theorem 3.4 is valuable in three ways. First, it motivates us to use an easy watermarking

method like Easymark. Second, it tells us not to rely too much on watermarking methods. Practitioners

should be aware of the existence of watermark erasers and should not be overconfident about the certainty

of watermarking however sophisticated the method is. Third, it provides guidance on fruitful directions to

design theoretically sound watermarks.

Table 1: BLEU scores and detection accuracy with NLLB-200-3.3B and WMT.

En → De

De → En

BLEU ↑ FNR ↓ FPR ↓ BLEU ↑ FNR ↓ FPR ↓

w/o Watermark

Soft-Watermark (Kirchenbauer et al., 2023a)

Adaptive Soft-Watermark

NS-Watermark (Takezawa et al., 2023)

Whitemark (Ours)

36.4

5.2

20.5

32.7

36.4

BLEU ↑

w/o Watermark

Soft-Watermark (Kirchenbauer et al., 2023a)

Adaptive Soft-Watermark

NS-Watermark (Takezawa et al., 2023)

Whitemark (Ours)

42.6

9.6

23.3

38.8

42.6

n.a.

3.0%

0.0%

0.1%

En → Fr

FNR ↓

n.a.

5.4%

0.0%

n.a.

0.4%

2.6%

0.3%

0.0% 42.6

7.5

20.6

38.2

42.6

FPR ↓ BLEU ↑

n.a.

0.3%

2.2%

0.3%

0.0% 40.8

7.6

19.5

36.8

40.8

n.a.

3.3%

0.0%

Fr → En

FNR ↓

n.a.

3.6%

0.0%

n.a.

0.5%

1.9%

0.0%

FPR ↓

n.a.

0.6%

2.8%

0.1%

0.0%

Table 2: Text quality and detection accuracy with LLaMA-7B and C4 dataset.

PPL ↓ FNR ↓ FPR ↓

w/o Watermark

Soft-Watermark (Kirchenbauer et al., 2023a)

Adaptive Soft-Watermark

NS-Watermark (Takezawa et al., 2023)

Whitemark (Ours)

1.85

6.25

2.48

1.92

1.85

n.a.

2.8%

0.2%

0.0%

n.a.

0.1%

0.8%

0.3%

0.0%

Experiments

We confirm the effectiveness of our proposed method with two tasks and two LLMs. We compared our

method with the following watermarking methods with the same hyperparameter settings used by Takezawa

et al. (2023):

• The Soft-Watermark (Kirchenbauer et al., 2023a, Algorithm 2) adds biases to the logits of specific

words and detects the watermark by checking the ratio of the biased words.

• NS-Watermark (Takezawa et al., 2023) follows the same idea but precisely controls the false positive

ratio by dynamic programming.

• Adaptive Soft-Watermark (Takezawa et al., 2023) is a variant of Soft-Watermark that controls the

false positive ratio by binary search for each input.

The first task is machine translation. We used NLLB-200-3.3B (Costa-jussà et al., 2022) as the language

model and used the test dataset of WMT’14 French (Fr) ↔ English (En) (Bojar et al., 2014) and WMT’16

German (De) ↔ English (En) (Bojar et al., 2016). We report the BLEU scores of the translation results

by the NLLB model and the texts watermarked by the above and our methods. We also report the false

negative ratio (FNR) and the false positive ratio (FPR) of the detection function. The FNR is the ratio of

the texts with watermarks that are not detected, and the FPR is the ratio of the texts without watermarks

that are detected. Table 1 shows that Whitemark consistently performs better than the other methods.

Whitemark outperforms the state-of-the-art watermarking method NS-Watermark with 10 percent relative

improvements of BLEU and is more reliable in terms of detection accuracy. We emphasize again that

Whitemark is much easier to implement and deploy than NS-Watermark. Whitemark is perfect in the

sense that the BLEU scores of the watermarked texts are the same as those of the original texts, and the

FNR and FPR are almost zero. There are only four false negative examples, each of which contains only one

word, in which case we do not need to claim that the text was generated by an LLM. This indicates that

Whitemark can add a watermark without harming the quality of the text and can detect the watermark

reliably. These results are consistent with Propositions 3.1 and 3.2.
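The two detection metrics defined above can be computed as in the following sketch. The function name and the input format (one boolean detection result per text) are assumptions made for illustration.

```python
def fnr_fpr(detected_on_watermarked, detected_on_plain):
    """Return (FNR, FPR) from per-text boolean detection results."""
    # FNR: share of watermarked texts that the detector misses.
    fnr = sum(not d for d in detected_on_watermarked) / len(detected_on_watermarked)
    # FPR: share of unwatermarked texts that the detector flags.
    fpr = sum(bool(d) for d in detected_on_plain) / len(detected_on_plain)
    return fnr, fpr


rates = fnr_fpr([True, True, False, True], [False, False, False, True])
```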

The second task is text completion. We used LLaMA-7B (Touvron et al., 2023) as the language model and

used subsets of the realnews-like C4 dataset (Raffel et al., 2020). We followed the experimental setups used

by Kirchenbauer et al. (2023a) and Takezawa et al. (2023). Specifically, we split each text into 90 percent and

10 percent lengths and input the first into the language model. We computed the perplexity (PPL) of the

generated text by the LLaMA model and texts watermarked by the above and our methods. We computed

the perplexity by an auxiliary language model, which is regarded as an oracle to measure the quality of

the text. We also report the FNR and FPR of the detection function. Table 2 shows that Whitemark

consistently performs better than the other methods in terms of both PPL and detection accuracy.
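For reference, perplexity is conventionally the exponential of the average negative log-likelihood per token. The sketch below assumes that the auxiliary language model has already produced natural-log probabilities for each token; how the paper's oracle model was queried is not described in this section, so the input format is hypothetical.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Four tokens, each assigned probability 0.5 by the scoring model.
ppl = perplexity([math.log(0.5)] * 4)
```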

These results are impressive because even a method as simple as Whitemark can achieve almost perfect

watermarking performance. These results indicate that future watermarking methods should not focus only on

BLEU, PPL, FNR, and FPR, and should use these metrics only for sanity checks.

Related Work

With the recent progress of LLMs, the demand for detecting whether a text is generated by an LLM is

increasing. There are two main ways of doing this. The first approach is blackbox detection (Gehrmann

et al., 2019; Uchendu et al., 2020; Gambini et al., 2022), which does not require intervention in the model.

These methods exploit the statistical tendency of texts generated by LLMs (Mitchell et al., 2023; Guo

et al., 2023). However, as the LLMs become more sophisticated, the statistical tendency becomes less

obvious, and the blackbox detection becomes less reliable (Clark et al., 2021; Jakesch et al., 2023; Schuster

et al., 2020). Another approach is whitebox detection, which requires intervention in the model, including

inference-time watermarking (Kirchenbauer et al., 2023a; Takezawa et al., 2023; Christ et al., 2023) and

post-hoc watermarking such as He et al. (2022a), Venugopal et al. (2011), and ours, retrieval-based detection

(Krishna et al., 2023), and linguistic steganography (Fang et al., 2017; Dai & Cai, 2019; Ziegler et al., 2019;

Ueoka et al., 2021). Watermarking methods are sometimes used for detecting model extraction attacks (He

et al., 2022b; Peng et al., 2023; Zhao et al., 2023b; Gu et al., 2022). Although watermarking methods are

more reliable than blackbox detection (Kirchenbauer et al., 2023b), their main drawback is that they harm the

quality of the text. Our method is reliable while it does not harm the quality of the text, as Propositions

3.1 and 3.2 and the experiments show.

Easymark can be implemented as a user-side system (Sato, 2022b;c). Sato (2022a) pointed out that “Even

if a user of the service is unsatisfied with a search engine and is eager to enjoy additional functionalities,

what he/she can do is limited. In many cases, he/she continues to use the unsatisfactory system or leaves

the service.” and proposed a user-side realization method to solve this problem. The spirit of Easymark is

the same. Even if the official LLM provider does not offer watermarking, users can use Easymark to add

a watermark to a text. This is a practical advantage of Easymark.

The idea of using Unicode encoding for watermarking is not new (Por et al., 2012; Rizzo et al., 2016; 2017).

The differences between our work and the existing Unicode-based watermarks are two-fold. First, the existing

works are not in the context of LLMs, and tackle different problems. Easymark is designed to be as simple as

possible so that it can be easily implemented and deployed with LLMs. Second, we provided theoretical

justifications in Propositions 3.1 and 3.2 and Theorem 3.4. These results are valuable in their own right as

we discussed in Section 3.6. It would be an interesting future direction to extend our theoretical results to

contexts other than LLMs, including those tackled in the existing Unicode-based watermarks.

Finally, Sadasivan et al. (2023) also showed the impossibility of a perfect watermark. The results of Sadasivan

et al. (2023) also justify the use of simple watermarks like Easymark. The difference between the

theory shown in Sadasivan et al. (2023) and ours is that we assume general loss functions and elucidate more

precisely in which cases the watermark can be erased. Our positive and negative results can be used for

designing watermarks as discussed in Section 3.6.

Conclusion

We proposed Easymark, a family of embarrassingly easy watermarking methods. Easymark is simple and

easy to implement and deploy. Nevertheless, Easymark has preferable theoretical properties that ensure the

quality of the text and the reliability of the watermark. The simplicity and the theoretical properties make

Easymark attractive for practitioners. We also proved that it is impossible to construct a perfect watermark

and any watermark can be erased. This result is valuable in its own right because it motivates us to use an

easy watermarking method like Easymark, encourages us not to rely too much on watermarking methods,

and provides guidance on fruitful directions to design theoretically sound watermarks. We confirmed the

effectiveness of Easymark with experiments involving LLM-generated texts. Easymark outperforms

the state-of-the-art watermarking methods in terms of BLEU and perplexity and is more reliable in terms

of detection accuracy. We encourage practitioners to use Easymark as a starter for watermarking methods

and recommend LLM researchers use Easymark as a simple yet strong baseline.

Acknowledgments

Yuki Takezawa, Ryoma Sato, and Makoto Yamada were supported by JSPS KAKENHI Grant Number

23KJ1336, 21J22490, and MEXT KAKENHI Grant Number 20H04243, respectively.

References

Sahar Abdelnabi and Mario Fritz. Adversarial watermarking transformer: Towards tracing text provenance

with data hiding. In 42nd IEEE Symposium on Security and Privacy, SP, pp. 121–140, 2021.

Ondrej Bojar, Christian Buck, Christian Federmann, Barry Haddow, Philipp Koehn, Johannes Leveling,

Christof Monz, Pavel Pecina, Matt Post, Herve Saint-Amand, Radu Soricut, Lucia Specia, and Ales

Tamchyna. Findings of the 2014 workshop on statistical machine translation. In Proceedings of the Ninth

Workshop on Statistical Machine Translation, WMT@ACL, pp. 12–58, 2014.

Ondrej Bojar, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Matthias Huck,

Antonio Jimeno-Yepes, Philipp Koehn, Varvara Logacheva, Christof Monz, Matteo Negri, Aurélie Névéol,

Mariana L. Neves, Martin Popel, Matt Post, Raphael Rubino, Carolina Scarton, Lucia Specia, Marco

Turchi, Karin Verspoor, and Marcos Zampieri. Findings of the 2016 conference on machine translation.

In Proceedings of the First Conference on Machine Translation, WMT, pp. 131–198, 2016.

Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind

Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss,

Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens

Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack

Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. Language

models are few-shot learners. In Advances in Neural Information Processing Systems 33: Annual

Conference on Neural Information Processing Systems 2020, NeurIPS, 2020.

Miranda Christ, Sam Gunn, and Or Zamir. Undetectable watermarks for language models. arXiv,

abs/2306.09194, 2023.

Elizabeth Clark, Tal August, Sofia Serrano, Nikita Haduong, Suchin Gururangan, and Noah A. Smith.

All that’s ’human’ is not gold: Evaluating human evaluation of generated text. In Proceedings of the

59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint

Conference on Natural Language Processing, ACL/IJCNLP, pp. 7282–7296, 2021.

Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe

Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Al Youngblood,

Bapi Akula, Loïc Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley

Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Shannon Spruit, Chau Tran, Pierre Andrews, Necip Fazil

Ayan, Shruti Bhosale, Sergey Edunov, Angela Fan, Cynthia Gao, Vedanuj Goswami, Francisco Guzmán,

Philipp Koehn, Alexandre Mourachko, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, and Jeff

Wang. No language left behind: Scaling human-centered machine translation. arXiv, abs/2207.04672,

2022.

Falcon Z. Dai and Zheng Cai. Towards near-imperceptible steganographic text. In Proceedings of the 57th

Conference of the Association for Computational Linguistics, ACL, pp. 4303–4308, 2019.

Tina Fang, Martin Jaggi, and Katerina J. Argyraki. Generating steganographic text with LSTMs. In

Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL, pp. 100–106.

Association for Computational Linguistics, 2017.

Margherita Gambini, Tiziano Fagni, Fabrizio Falchi, and Maurizio Tesconi. On pushing deepfake tweet

detection capabilities to the limits. In WebSci ’22: 14th ACM Web Science Conference 2022, pp. 154–163,

2022.

Sebastian Gehrmann, Hendrik Strobelt, and Alexander M. Rush. GLTR: statistical detection and visualization

of generated text. In Proceedings of the 57th Conference of the Association for Computational

Linguistics, ACL, pp. 111–116, 2019.

Alexei Grinbaum and Laurynas Adomaitis. The ethical need for watermarks in machine-generated language.

arXiv, abs/2209.03118, 2022.

Chenxi Gu, Chengsong Huang, Xiaoqing Zheng, Kai-Wei Chang, and Cho-Jui Hsieh. Watermarking pre-trained

language models with backdooring. arXiv, abs/2210.07543, 2022.

Biyang Guo, Xin Zhang, Ziyuan Wang, Minqi Jiang, Jinran Nie, Yuxuan Ding, Jianwei Yue, and Yupeng

Wu. How close is ChatGPT to human experts? comparison corpus, evaluation, and detection. arXiv,

abs/2301.07597, 2023.

Philipp Hacker, Andreas Engel, and Marco Mauer. Regulating ChatGPT and other large generative AI

models. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency,

FAccT, pp. 1112–1123. ACM, 2023.

Xuanli He, Qiongkai Xu, Lingjuan Lyu, Fangzhao Wu, and Chenguang Wang. Protecting intellectual property

of language generation APIs with lexical watermark. In Thirty-Sixth AAAI Conference on Artificial

Intelligence, AAAI, pp. 10758–10766, 2022a.

Xuanli He, Qiongkai Xu, Yi Zeng, Lingjuan Lyu, Fangzhao Wu, Jiwei Li, and Ruoxi Jia. CATER: intellectual

property protection on text generation APIs via conditional watermarks. In NeurIPS, 2022b.

Maurice Jakesch, Jeffrey T. Hancock, and Mor Naaman. Human heuristics for AI-generated language are

flawed. Proceedings of the National Academy of Sciences, 120(11):e2208839120, 2023.

John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. A watermark

for large language models. In International Conference on Machine Learning, ICML, pp. 17061–17084,

2023a.

John Kirchenbauer, Jonas Geiping, Yuxin Wen, Manli Shu, Khalid Saifullah, Kezhi Kong, Kasun Fernando,

Aniruddha Saha, Micah Goldblum, and Tom Goldstein. On the reliability of watermarks for large language

models. arXiv, abs/2306.04634, 2023b.

Kalpesh Krishna, Yixiao Song, Marzena Karpinska, John Wieting, and Mohit Iyyer. Paraphrasing evades

detectors of AI-generated text, but retrieval is an effective defense. arXiv, abs/2303.13408, 2023.

Taku Kudo and John Richardson. Sentencepiece: A simple and language independent subword tokenizer

and detokenizer for neural text processing. In Proceedings of the 2018 Conference on Empirical Methods

in Natural Language Processing, EMNLP 2018: System Demonstrations, pp. 66–71, 2018.

Eric Mitchell, Yoonho Lee, Alexander Khazatsky, Christopher D. Manning, and Chelsea Finn. DetectGPT:

Zero-shot machine-generated text detection using probability curvature. In International Conference on

Machine Learning, ICML, pp. 24950–24962, 2023.

OpenAI. GPT-4 technical report. arXiv, abs/2303.08774, 2023.

Wenjun Peng, Jingwei Yi, Fangzhao Wu, Shangxi Wu, Bin Zhu, Lingjuan Lyu, Binxing Jiao, Tong Xu,

Guangzhong Sun, and Xing Xie. Are you copying my model? protecting the copyright of large language

models for eaas via backdoor watermark. In Proceedings of the 61st Annual Meeting of the Association

for Computational Linguistics (Volume 1: Long Papers), ACL, pp. 7653–7668, 2023.

Lip Yee Por, KokSheik Wong, and Kok Onn Chee. Unispach: A text-based data hiding method using unicode

space characters. J. Syst. Softw., 85(5):1075–1082, 2012.

Matt Post. A call for clarity in reporting BLEU scores. In Proceedings of the Third Conference on Machine

Translation: Research Papers, WMT, pp. 186–191, 2018.

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou,

Wei Li, and Peter J. Liu. Exploring the limits of transfer learning with a unified text-to-text transformer.

J. Mach. Learn. Res., 21:140:1–140:67, 2020.

Stefano Giovanni Rizzo, Flavio Bertini, and Danilo Montesi. Content-preserving text watermarking through

unicode homoglyph substitution. In Proceedings of the 20th International Database Engineering & Applications

Symposium, IDEAS, pp. 97–104. ACM, 2016.

Stefano Giovanni Rizzo, Flavio Bertini, Danilo Montesi, and Carlo Stomeo. Text watermarking in social

media. In Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks

Analysis and Mining 2017, ASONAM, pp. 208–211. ACM, 2017.

Vinu Sankar Sadasivan, Aounon Kumar, Sriram Balasubramanian, Wenxiao Wang, and Soheil Feizi. Can

AI-generated text be reliably detected? arXiv, abs/2303.11156, 2023.

Ryoma Sato. CLEAR: A fully user-side image search system. In Proceedings of the 31st ACM International

Conference on Information & Knowledge Management, CIKM, pp. 4970–4974, 2022a.

Ryoma Sato. Private recommender systems: How can users build their own fair recommender systems

without log data? In Proceedings of the 2022 SIAM International Conference on Data Mining, SDM, pp.

549–557, 2022b.

Ryoma Sato. Retrieving black-box optimal images from external databases. In The Fifteenth ACM International

Conference on Web Search and Data Mining, WSDM, pp. 879–887, 2022c.

Ryoma Sato. Towards principled user-side recommender systems. In Proceedings of the 31st ACM International

Conference on Information & Knowledge Management, CIKM, pp. 1757–1766, 2022d.

Tal Schuster, Roei Schuster, Darsh J. Shah, and Regina Barzilay. The limitations of stylometry for detecting

machine-generated fake news. Comput. Linguistics, 46(2):499–510, 2020.

Yuki Takezawa, Ryoma Sato, Han Bao, Kenta Niwa, and Makoto Yamada. Necessary and sufficient watermark

for large language models. arXiv, abs/2310.00833, 2023.

Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix,

Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurélien Rodriguez, Armand Joulin, Edouard

Grave, and Guillaume Lample. LLaMA: Open and efficient foundation language models. arXiv,

abs/2302.13971, 2023.

Adaku Uchendu, Thai Le, Kai Shu, and Dongwon Lee. Authorship attribution for neural text generation.

In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP,

pp. 8384–8395, 2020.

Honai Ueoka, Yugo Murawaki, and Sadao Kurohashi. Frustratingly easy edit-based linguistic steganography

with a masked language model. In Proceedings of the 2021 Conference of the North American Chapter of

the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT, pp. 5486–5492,

2021.

Ashish Venugopal, Jakob Uszkoreit, David Talbot, Franz Josef Och, and Juri Ganitkevitch. Watermarking

the outputs of structured prediction with an application in statistical machine translation. In Proceedings

of the 2011 Conference on Empirical Methods in Natural Language Processing, EMNLP, pp. 1363–1372,

2011.

Xuandong Zhao, Prabhanjan Ananth, Lei Li, and Yu-Xiang Wang. Provable robust watermarking for AI-generated

text. arXiv, abs/2306.17439, 2023a.

Xuandong Zhao, Yu-Xiang Wang, and Lei Li. Protecting language generation models via invisible watermarking.

In International Conference on Machine Learning, ICML, pp. 42187–42199, 2023b.

Zachary M. Ziegler, Yuntian Deng, and Alexander M. Rush. Neural linguistic steganography. In Proceedings

of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International

Joint Conference on Natural Language Processing, EMNLP-IJCNLP, pp. 1210–1215, 2019.

Proof of Theorem 3.4

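Proof. Let

$$S \stackrel{\text{def}}{=} \{x \in \mathcal{X} \mid \Pr[X = x] > 0\} \subset \mathcal{X} \tag{32}$$

be the support of $X$, i.e., the set of possible texts generated by $f$. Let $\text{Erase} : \mathcal{X} \to \mathcal{X}$ be

$$\text{Erase}(x) \stackrel{\text{def}}{=} \operatorname*{arg\,min}_{x' \in S} d_{\mathcal{X}}(x, x'). \tag{33}$$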

Ties can be broken arbitrarily. We show that Erase satisfies the conditions. Take any watermark $g$ and key $k$. From Eq. 11, $\text{Detect}(x, k) = \text{False}$ holds for any $x \in S$. As $\text{Erase}(x) \in S$, $\text{Detect}(\text{Erase}(x), k) = \text{False}$ holds surely. Therefore, Eq. 13 holds. Besides,

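\begin{align}
& \mathbb{E}[L(\text{Erase}(X_k), C) - L(X, C)] \tag{34} \\
&= \mathbb{E}[L(\text{Erase}(X_k), C) - L(X_k, C)] + \mathbb{E}[L(X_k, C) - L(X, C)] \tag{35} \\
&\stackrel{(a)}{\le} \mathbb{E}[d_{\mathcal{X}}(\text{Erase}(X_k), X_k)] + \mathbb{E}[L(X_k, C) - L(X, C)] \tag{36} \\
&\stackrel{(b)}{\le} \mathbb{E}[d_{\mathcal{X}}(\text{Erase}(X_k), X_k)] + \varepsilon \tag{37} \\
&\stackrel{(c)}{\le} \mathbb{E}[d_{\mathcal{X}}(X, X_k)] + \varepsilon \tag{38} \\
&\stackrel{(d)}{\le} \varepsilon + \varepsilon' \tag{39}
\end{align}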

hold, where (a) is due to the Lipschitzness of L, (b) is due to Eq. 9, (c) is due to the definition of Erase,

and (d) is due to Eq. 12. Therefore, Eq. 14 holds, and Erase satisfies the conditions.
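To make the construction concrete, the following minimal Python sketch implements the nearest-support-point eraser of Eq. 33 on a toy example. The support set, the padded Hamming distance standing in for the metric d_X, the U+2004 whitespace substitution, and the names `hamming` and `make_erase` are illustrative assumptions, not part of the paper. The support of a real LLM cannot be enumerated, so this only illustrates the definition.

```python
from typing import Callable, Iterable


def hamming(a: str, b: str) -> int:
    """Toy metric (assumption): mismatches at aligned positions plus the length gap."""
    return sum(ca != cb for ca, cb in zip(a, b)) + abs(len(a) - len(b))


def make_erase(support: Iterable[str],
               dist: Callable[[str, str], int]) -> Callable[[str], str]:
    """Return Erase(x) = argmin over x' in the support of dist(x, x'), as in Eq. 33."""
    points = list(support)

    def erase(x: str) -> str:
        # min() returns the first minimizer, one arbitrary way to break ties.
        return min(points, key=lambda s: dist(x, s))

    return erase


# Hypothetical support S of the unwatermarked generator.
support = ["hello world", "good morning", "see you soon"]
erase = make_erase(support, hamming)

# A whitespace-style watermark: the ASCII space is swapped for U+2004 (illustrative).
watermarked = "hello\u2004world"
restored = erase(watermarked)
print(repr(restored))
```

Because the output always lies in the support, any detector that never fires on unwatermarked text cannot fire on `restored`, which is the argument behind Eq. 13.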

Extension of Theorem 3.4

In Theorem 3.4, we assume that text without a watermark is almost surely classified as non-watermarked, i.e.,

$$\Pr[\text{Detect}(X, k) = \text{False}] = 1. \tag{40}$$

We relax this assumption in the following. Let $q(x)$ be the probability mass function of $X_k$. We show that the assumption can be loosened if we allow Erase to depend on $q$. The same Erase can be used for different watermarks $g$ and keys $k$ as long as $q$ is the same. A user can observe and estimate $q$ by drawing samples from watermarked texts. Therefore, dependence on $q$ is a reasonable assumption.
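As a rough illustration of how a user might estimate q from observed watermarked outputs, the sketch below computes an empirical probability mass function from samples. The sample list and the name `estimate_pmf` are hypothetical. For realistic text spaces such a frequency count would be far too coarse, so this is only meant to show what "estimating q by drawing samples" amounts to in the simplest discrete case.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def estimate_pmf(samples: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Empirical pmf: relative frequency of each distinct sample."""
    counts = Counter(samples)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one sample is required")
    return {x: c / total for x, c in counts.items()}


# Hypothetical watermarked texts collected by a user (U+2004 replaces the space).
observed = ["a\u2004b", "a\u2004b", "c\u2004d", "a\u2004b"]
q_hat = estimate_pmf(observed)
print(q_hat)
```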

Theorem B.1. Let $(\mathcal{X}, d_{\mathcal{X}})$ be a metric space of texts. Let $C$ be the random variable that indicates a condition (i.e., prompt). Let $X = f(C)$ be the text generated by an LLM given the condition $C$. Let $g : \mathcal{C} \times \mathcal{K} \to \mathcal{X}$ be any watermarking function and $X_k = g(C, k)$ be the text with a watermark with key $k$, where $\mathcal{C}$ is the space of conditions and $\mathcal{K}$ is the space of keys. Let $q(x)$ be the probability mass function of $X_k$. Suppose

$$\mathbb{E}[L(X_k, C) - L(X, C)] \le \varepsilon \tag{41}$$

holds, where $L : \mathcal{X} \times \mathcal{C} \to \mathbb{R}$ is a loss function that is 1-Lipschitz continuous with respect to the first argument, and $\varepsilon$ is a positive number, i.e., the quality of the text is not degraded much with the watermark. Let $\text{Detect} : \mathcal{X} \times \mathcal{K} \to \{\text{True}, \text{False}\}$ be any function such that

\begin{align}
\Pr[\text{Detect}(X_k, k) = \text{True}] &\ge 1 - \delta, \tag{42} \\
\Pr[\text{Detect}(X, k) = \text{False}] &\ge 1 - \delta, \tag{43}
\end{align}

i.e., the watermark can be detected reliably. Suppose

$$\mathbb{E}[d_{\mathcal{X}}(X, X_k)] \le \varepsilon' \tag{44}$$

holds, i.e., the watermark does not change the meaning of the text much. Then, there exists a randomized function $\text{Erase} : \mathcal{X} \to \mathcal{X}$ such that

\begin{align}
\Pr[\text{Detect}(\text{Erase}(X_k), k) = \text{False}] &\ge 1 - \delta, \tag{45} \\
\mathbb{E}[L(\text{Erase}(X_k), C) - L(X, C)] &\le \varepsilon + \varepsilon', \tag{46}
\end{align}

i.e., the watermark can be erased without harming the quality of the text and without knowing the key $k$, and Erase is universal in the sense that it does not depend on $g$, $k$, Detect, or prompts but only on $X$ and $q$.

Proof. Let $q$ be any probability mass function on $\mathcal{X}$ such that there exist $\tilde{g}$ and $\tilde{k}$ such that $\tilde{X}_k = \tilde{g}(C, \tilde{k})$ follows $q$ and

$$\mathbb{E}[d_{\mathcal{X}}(X, \tilde{X}_k)] \le \varepsilon'. \tag{47}$$

Take any such $\tilde{g}$ and $\tilde{k}$. Let

$$\Pr[\text{Erase}_q(x) = x'] \stackrel{\text{def}}{=} \Pr[X = x' \mid \tilde{X}_k = x]. \tag{48}$$

We show that $\text{Erase}_q$ satisfies the conditions. Take any watermark $g$ and key $k$ such that $X_k = g(C, k)$ follows $q$ and satisfies the assumptions of Theorem B.1. From Eq. 48, $\text{Erase}_q(X_k)$ follows the same distribution as $X$, and therefore,

$$\Pr[\text{Detect}(\text{Erase}_q(X_k), k) = \text{False}] \ge 1 - \delta \tag{49}$$

holds due to Eq. 43. Besides,

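\begin{align}
& \mathbb{E}[L(\text{Erase}_q(X_k), C) - L(X, C)] \tag{51} \\
&= \mathbb{E}[L(\text{Erase}_q(X_k), C) - L(X_k, C)] + \mathbb{E}[L(X_k, C) - L(X, C)] \tag{52} \\
&\stackrel{(a)}{\le} \mathbb{E}[d_{\mathcal{X}}(\text{Erase}_q(X_k), X_k)] + \mathbb{E}[L(X_k, C) - L(X, C)] \tag{53} \\
&\stackrel{(b)}{\le} \mathbb{E}[d_{\mathcal{X}}(\text{Erase}_q(X_k), X_k)] + \varepsilon \tag{54} \\
&\stackrel{(c)}{=} \sum_{x_k \in \mathcal{X}} \sum_{x' \in \mathcal{X}} d_{\mathcal{X}}(x', x_k) \Pr[\text{Erase}_q(x_k) = x'] \, q(x_k) + \varepsilon \tag{55} \\
&= \sum_{x_k \in \mathcal{X}} \sum_{x' \in \mathcal{X}} d_{\mathcal{X}}(x', x_k) \Pr[X = x' \mid \tilde{X}_k = x_k] \, q(x_k) + \varepsilon \tag{56} \\
&\stackrel{(d)}{=} \mathbb{E}[d_{\mathcal{X}}(X, \tilde{X}_k)] + \varepsilon \tag{57} \\
&\stackrel{(e)}{\le} \varepsilon + \varepsilon', \tag{58}
\end{align}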

hold, where (a) is due to the Lipschitzness of $L$, (b) is due to Eq. 41, (c) is due to the definition of $\text{Erase}_q$, i.e., Eq. 48, (d) follows from the fact that $X_k$ also follows $q$, and (e) is due to Eq. 47. Therefore, Eq. 46 holds, and $\text{Erase}_q$ satisfies the conditions.
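The randomized eraser of Eq. 48 can be sketched on a finite toy example. In the Python snippet below, the joint distribution of (X, X̃_k) is a hypothetical table, and the names `posterior_and_q`, `erase_q`, and `output_pmf` are illustrative. The sketch checks the key step of the proof, namely that applying the eraser to a text drawn from q reproduces the distribution of X.

```python
import random
from collections import defaultdict
from typing import Dict, Tuple

Joint = Dict[Tuple[str, str], float]  # {(x, x_k): Pr[X = x, X~_k = x_k]}


def posterior_and_q(joint: Joint):
    """Return (post, q): post[x_k][x] = Pr[X = x | X~_k = x_k], q[x_k] = Pr[X~_k = x_k]."""
    q: Dict[str, float] = defaultdict(float)
    for (_, xk), p in joint.items():
        q[xk] += p
    post: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (x, xk), p in joint.items():
        if p > 0:
            post[xk][x] = p / q[xk]
    return dict(post), dict(q)


def erase_q(xk: str, post, rng: random.Random) -> str:
    """Randomized eraser of Eq. 48: sample x' from Pr[X = x' | X~_k = x_k]."""
    xs = list(post[xk])
    return rng.choices(xs, weights=[post[xk][x] for x in xs], k=1)[0]


def output_pmf(post, q) -> Dict[str, float]:
    """Exact distribution of Erase_q(X_k) when X_k follows q."""
    out: Dict[str, float] = defaultdict(float)
    for xk, qk in q.items():
        for x, p in post.get(xk, {}).items():
            out[x] += qk * p
    return dict(out)


# Hypothetical joint law of (X, X~_k) over toy "texts".
joint = {("a", "w"): 0.3, ("b", "w"): 0.2, ("b", "v"): 0.5}
post, q = posterior_and_q(joint)
erased_pmf = output_pmf(post, q)          # should match the marginal of X
sample = erase_q("v", post, random.Random(0))
print(erased_pmf, sample)
```

In this toy table the marginal of X puts mass 0.3 on "a" and 0.7 on "b", and `erased_pmf` recovers the same masses up to floating-point error. Note that the eraser only needs the posterior table, not the detector or the key.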

The dependence on q is necessary. We show that there are no universal erasing functions that satisfy the

conditions without depending on q by a counterexample.

Counterexample. Let $\mathcal{X} = \{0, 1, 2, \ldots, n\}$, $\mathcal{C} = \{1, 2, \ldots, n\}$, and let $C$ follow the uniform distribution on $\mathcal{C}$. Let the language model be $f(c) = c \in \{1, 2, \ldots, n\}$. Let the loss function be zero everywhere. Take any erasing function $\text{Erase} : \mathcal{X} \to \mathcal{X}$. We show that there exists an adversarial watermark $g$, $k$ such that Erase cannot erase the watermark.

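Case 1 ($\text{Erase}(0) = 0$): Let $g(c, k) = 0$ and

$$\text{Detect}(x, k) \stackrel{\text{def}}{=} \begin{cases} \text{True} & (x = 0) \\ \text{False} & (\text{otherwise}) \end{cases} \tag{59}$$

then

\begin{align}
\Pr[\text{Detect}(X, k) = \text{True}] &= \Pr[X = 0] = 0, \tag{60} \\
\Pr[\text{Detect}(X_k, k) = \text{True}] &= \Pr[X_k = 0] = 1, \tag{61} \\
\Pr[\text{Detect}(\text{Erase}(X_k), k) = \text{True}] &= \Pr[\text{Detect}(0, k) = \text{True}] = 1, \tag{62}
\end{align}

i.e., Erase fails to erase the watermark.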

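Case 2 ($\text{Erase}(0) = i^* \neq 0$): Let $g(c, k) = 0$ and

$$\text{Detect}(x, k) \stackrel{\text{def}}{=} \begin{cases} \text{True} & (x \in \{0, i^*\}) \\ \text{False} & (\text{otherwise}) \end{cases} \tag{63}$$

then

\begin{align}
\Pr[\text{Detect}(X_k, k) = \text{True}] &= \Pr[X_k \in \{0, i^*\}] = 1, \tag{64} \\
\Pr[\text{Detect}(X, k) = \text{True}] &= \Pr[X \in \{0, i^*\}] = \frac{1}{n}, \tag{65} \\
\Pr[\text{Detect}(\text{Erase}(X_k), k) = \text{True}] &= \Pr[\text{Detect}(i^*, k) = \text{True}] = 1, \tag{66}
\end{align}

i.e., the false-positive rate $1/n$ satisfies Eq. 43 for any $\delta \ge 1/n$, yet Erase fails to erase the watermark.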
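The two cases can be checked exhaustively for a small n. The following Python sketch enumerates every deterministic Erase on {0, ..., n}, builds the adversarial detector from Case 1 or Case 2, and records whether the watermark survives. The function names `adversarial_detect` and `survival_stats` are illustrative, and the enumeration is only a sanity check of the counterexample under the stated toy setting, not a general proof.

```python
from fractions import Fraction
from itertools import product
from typing import Callable, Tuple


def adversarial_detect(erase: Tuple[int, ...]) -> Callable[[int], bool]:
    """Detector of Case 1 / Case 2: flag 0 and Erase(0) (the same point in Case 1)."""
    flagged = {0, erase[0]}
    return lambda x: x in flagged


def survival_stats(n: int) -> Tuple[int, int, Fraction]:
    """Enumerate every deterministic Erase on {0, ..., n} and test it against its adversary.

    Returns (number of erasers, number whose output is still detected,
    largest false-positive rate on unwatermarked text).
    """
    total = survived = 0
    worst_fp = Fraction(0)
    for erase in product(range(n + 1), repeat=n + 1):  # erase[x] plays the role of Erase(x)
        detect = adversarial_detect(erase)
        # The watermark g(c, k) = 0 makes X_k = 0 surely, so Erase(X_k) = erase[0].
        assert detect(0)  # Pr[Detect(X_k, k) = True] = 1
        total += 1
        survived += int(detect(erase[0]))
        # X = f(C) is uniform on {1, ..., n}.
        fp = Fraction(sum(detect(x) for x in range(1, n + 1)), n)
        worst_fp = max(worst_fp, fp)
    return total, survived, worst_fp


stats = survival_stats(3)
print(stats)
```

For n = 3 there are 4^4 = 256 deterministic erasers, and the watermark survives for every one of them, while the false-positive rate never exceeds 1/3.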