Abstract
As artificial intelligence systems become increasingly integrated into our daily lives, the question of alignment—ensuring AI behaves in accordance with human values and norms—has never been more critical. But what exactly should we align these systems with? This talk offers a historical overview of how public preferences, norms, and values have been measured over the past several decades, revealing common errors and pitfalls in capturing true human intentions.
We will delve into the mistakes people often make when asked about their preferences, drawing on 70 years of public opinion research that documents how measurement can go wrong due to biases, misinterpretations, and flawed instrument design. These insights are directly relevant to building effective human feedback loops for AI, as they underscore the importance of reducing systematic measurement error when soliciting input from users.
The presentation will also introduce databases that could serve as valuable benchmarks for aligning AI systems with human values. Attendees will be invited to propose fruitful benchmark challenges, encouraging collaboration between computer science and social science researchers to improve how we measure and integrate human norms into AI models.
Finally, we will explore the dynamic nature of societal norms and values, discussing strategies that allow AI systems to adapt as norms shift over time and vary across cultural contexts. By examining these issues, the talk aims to provide a deeper understanding of what it truly means to align AI with human values and how we can achieve this alignment responsibly.