SimMark: A Robust Sentence-Level Similarity-Based Watermarking Algorithm for Large Language Models
Amirhossein Dabiriaghdam, Lele Wang
Code Available — Be the first to reproduce this paper.
ReproduceCode
- github.com/DabiriAghdam/SimMarkOfficialpytorch★ 13
Abstract
The rapid proliferation of large language models (LLMs) has created an urgent need for reliable methods to detect whether a text is generated by such models. In this paper, we propose SimMark, a posthoc watermarking algorithm that makes LLMs' outputs traceable without requiring access to the model's internal logits, enabling compatibility with a wide range of LLMs, including API-only models. By leveraging the similarity of semantic sentence embeddings and rejection sampling to impose detectable statistical patterns imperceptible to humans, and employing a soft counting mechanism, SimMark achieves robustness against paraphrasing attacks. Experimental results demonstrate that SimMark sets a new benchmark for robust watermarking of LLM-generated content, surpassing prior sentence-level watermarking techniques in robustness, sampling efficiency, and applicability across diverse domains, all while preserving the text quality.