Abstract
I will discuss mechanisms for establishing provenance of two types of language model artifacts: text and weights.
In the first part of the talk, I will cover work (joint with John Thickstun, Tatsu Hashimoto, and Percy Liang) on watermarking text generated by an autoregressive language model. We leverage the inherent randomness of token sampling to construct the first watermarks that are robust to edits of a constant fraction of the text and that leave the distribution over generated text unchanged (up to a certain generation budget).
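For intuition, here is a minimal sketch of one standard way sampling randomness can carry a watermark, the Gumbel-style (exponential-minimum) sampling rule, together with a toy detector. The function names, the per-position key matrix, and the averaging detector are assumptions made for this sketch, not the construction presented in the talk; in particular, edit robustness requires a more careful detector than the one shown here.

```python
import numpy as np

def watermarked_sample(probs: np.ndarray, key_values: np.ndarray) -> int:
    """Pick the token maximizing u_i ** (1 / p_i) for shared uniform keys u_i.

    With key_values drawn i.i.d. Uniform(0, 1), this choice is distributed
    exactly according to `probs`, so the text distribution is unchanged, yet
    a detector holding the same keys can later check for correlation.
    """
    # Guard against zero-probability tokens before taking the power.
    scores = np.where(probs > 0,
                      key_values ** (1.0 / np.maximum(probs, 1e-12)),
                      -np.inf)
    return int(np.argmax(scores))

def detection_score(tokens, key_matrix) -> float:
    """Toy detector: average key value assigned to the observed tokens.

    Watermarked text tends to land on tokens with large key values, so this
    average exceeds the ~0.5 expected for independently written text.
    """
    return float(np.mean([key_matrix[t, tok] for t, tok in enumerate(tokens)]))
```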
In the second part of the talk, I will cover work (joint with Sally Zhu, Ahmed Ahmed, and Percy Liang) on testing whether two language models were independently trained based on their weights. We leverage the inherent randomness of model training to develop exact post-hoc tests of independence without intervening on the training process.
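As a loose illustration of what a post-hoc, weights-only independence test can look like, the sketch below runs a permutation test: it compares the correlation between two models' corresponding weight matrices against the same statistic after randomly permuting one model's hidden units. The choice of statistic, the row-permutation null, and the function name are assumptions for this sketch, not necessarily the tests developed in the work.

```python
import numpy as np

def permutation_p_value(w_a: np.ndarray, w_b: np.ndarray,
                        num_perms: int = 999, seed: int = 0) -> float:
    """P-value for the null that two weight matrices are unrelated.

    Test statistic: absolute correlation between flattened weights. Null
    distribution: the same statistic after permuting the rows (hidden units)
    of one matrix, which preserves its marginal statistics but breaks any
    unit-level alignment that shared training history would leave behind.
    """
    rng = np.random.default_rng(seed)
    observed = abs(np.corrcoef(w_a.ravel(), w_b.ravel())[0, 1])
    null_stats = []
    for _ in range(num_perms):
        perm = rng.permutation(w_b.shape[0])
        null_stats.append(abs(np.corrcoef(w_a.ravel(), w_b[perm].ravel())[0, 1]))
    # Add-one correction keeps the p-value valid under the permutation null.
    return (1 + sum(s >= observed for s in null_stats)) / (num_perms + 1)
```

A small p-value suggests the two weight matrices are aligned far more closely than permuted copies, i.e., the models are unlikely to have been trained independently (under this sketch's exchangeability assumption).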