Generalization in the representations and computations of frontier language models.

Workshop

Unknown Futures of Generalization

Speaker(s)

Joshua Batson (Anthropic)

Location

Calvin Lab Auditorium

Date

Tuesday, Dec. 3, 2024

Time

11:45 a.m. – 12:30 p.m. PT

Abstract

Frontier language models exhibit striking generalization across languages (human languages, programming languages, and encoding schemes). They also excel at analogies ("give an X in the style of Y"). At the same time, as other speakers will discuss, they can be quite brittle on 'reasoning'-like tasks and highly sensitive to changes in question formulation or variable names. In this talk, I will discuss the geometry of LLM representations from the perspective of 'superposition', a strategy for finding features (sparse autoencoders), and surprising qualitative examples of features from a frontier model. I will also share open mathematical and empirical questions on LLM representations, with a focus on implications for computation, training, failure, and reliability.
