What is the purpose of analyzing language model watermarks?

Original title: Mark My Words: Analyzing and Evaluating Language Model Watermarks

Authors: Julien Piet, Chawin Sitawarin, Vivian Fang, Norman Mu, David Wagner

In an article addressing concerns about the misuse of large language models, the authors emphasize the importance of being able to distinguish machine-generated text from human-authored content. To this end, they propose a comprehensive benchmark for text watermarking techniques. The benchmark targets watermarks on LLM text output, as opposed to image or model watermarks, and evaluates schemes along three main metrics: quality, size (the number of tokens required to detect a watermark), and tamper-resistance (whether the watermark can still be detected after the marked text is perturbed). They show that current techniques, such as Kirchenbauer et al.'s scheme, can watermark machine-generated text with no noticeable loss in quality, and that the watermark can be detected with fewer than 100 tokens. These techniques also exhibit good resistance to simple attacks, regardless of sampling temperature. The authors argue that watermark indistinguishability is too stringent a requirement, showing that schemes which slightly modify logit distributions outperform their indistinguishable counterparts without sacrificing generation quality. They publicly release their benchmark to facilitate further research in this area.
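To make the class of schemes discussed above concrete, here is a minimal Python sketch (not the authors' code or benchmark) of a Kirchenbauer-style "green list" watermark: the previous token seeds a pseudorandom split of the vocabulary into green and red tokens, generation adds a small bias to green-token logits (a slight modification of the logit distribution), and detection applies a z-test to the fraction of green tokens observed. The vocabulary size, the gamma and delta parameters, and the uniform-logit "model" are illustrative assumptions, not values from the paper.

```python
# A minimal sketch (not the authors' implementation) of a Kirchenbauer-style
# green-list watermark: the previous token seeds a PRNG that splits a toy
# vocabulary into green and red tokens, generation adds a small bias DELTA to
# green-token logits, and detection runs a z-test on the observed green fraction.
import hashlib
import math
import random

VOCAB_SIZE = 1_000  # toy vocabulary; real tokenizers are much larger (assumption)
GAMMA = 0.5         # fraction of the vocabulary marked "green" at each step (assumption)
DELTA = 2.0         # logit bias added to green tokens during generation (assumption)


def green_set(prev_token: int) -> set[int]:
    """Pseudorandomly choose the green tokens, seeded by the previous token."""
    seed = int.from_bytes(hashlib.sha256(str(prev_token).encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return set(rng.sample(range(VOCAB_SIZE), int(GAMMA * VOCAB_SIZE)))


def bias_logits(logits: list[float], prev_token: int) -> list[float]:
    """Slightly modify the logit distribution: add DELTA to every green token."""
    greens = green_set(prev_token)
    return [x + DELTA if i in greens else x for i, x in enumerate(logits)]


def sample_watermarked(length: int, rng: random.Random) -> list[int]:
    """Toy generator: uniform base logits, biased toward green, softmax-sampled."""
    tokens = [rng.randrange(VOCAB_SIZE)]
    for _ in range(length - 1):
        weights = [math.exp(x) for x in bias_logits([0.0] * VOCAB_SIZE, tokens[-1])]
        tokens.append(rng.choices(range(VOCAB_SIZE), weights=weights, k=1)[0])
    return tokens


def detect(tokens: list[int]) -> float:
    """z-score of the green-token count; unwatermarked text should score near 0."""
    hits = sum(1 for prev, cur in zip(tokens, tokens[1:]) if cur in green_set(prev))
    n = len(tokens) - 1
    return (hits - GAMMA * n) / math.sqrt(GAMMA * (1 - GAMMA) * n)


if __name__ == "__main__":
    rng = random.Random(0)
    marked = sample_watermarked(100, rng)
    unmarked = [rng.randrange(VOCAB_SIZE) for _ in range(100)]
    print(f"watermarked z = {detect(marked):.1f}, unwatermarked z = {detect(unmarked):.1f}")
```

In this framing, the benchmark's "size" metric corresponds to how many tokens are needed before the z-score exceeds a detection threshold, and tamper-resistance corresponds to how much the score degrades when the marked text is edited.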

Original article: https://arxiv.org/abs/2312.00273