Defense against prompt injection attacks

Workshop

Alignment, Trust, Watermarking, and Copyright Issues in LLMs

Speaker(s)

David Wagner (UC Berkeley)

Location

Calvin Lab Auditorium

Date

Monday, Oct. 14, 2024

Time

2 – 2:45 p.m. PT

Abstract

Prompt injection attacks are a significant threat to the security of LLM-integrated applications. These attacks exploit the lack of a clear separation between instructions/prompts and user data. I will introduce the notion of structured queries, a general approach to tackle this problem by explicitly separating prompt and data and training LLMs to respect this separation. I will describe how to adjust standard instruction tuning to respect this separation, and show the resulting models provide significant improvements in robustness against prompt injection.