Abstract

Does ingesting in-copyright works posted on the open internet as training data for large language models (LLMs) constitute copyright infringement? Several lawsuits and law review articles address this question. The lawsuits are still in their early stages, and resolution of this fundamental question is likely years off. Beyond litigation, the Copyright Office is expected to ask interested stakeholders and the public for comments on this question. The Office will then issue a report and possibly recommend legislation. If courts decide that ingestion is infringement, they have the power to order the destruction of models trained on this data, although that remedy is discretionary, not mandatory. For this nascent industry and for researchers, the stakes in the resolution of this issue could not be higher.

Video Recording