Abstract
Are there any conditions under which a generative AI model’s outputs are guaranteed not to infringe on copyrighted work in its training data? If so, what are the conditions and corresponding guarantees? We argue that differential privacy (DP)—a mathematical formalism of non-disclosure for data analysis—can be the basis for such a guarantee. Roughly, if the generative model is DP and the elements of the training dataset do not share copyrighted expression with one another, then any infringement on the part of a user of the model is the user’s fault.
We build on the recent work of Vyas, Kakade, and Barak [VKB23], who first posed this question. They propose a property called near-access freeness (NAF), related to DP, and argue that NAF models provide “provable copyright protection.” We show that NAF does not prevent copyright infringement: NAF permits tainted models, and we propose taintedness as a test for blatant failure to prevent copying. We argue that DP succeeds where NAF fails, drawing on the idea of clean room design. In a sense, DP allows one to bring copyrighted material into a clean room without tainting it.