Abstract
Large language models based on transformers have achieved great empirical successes. However, as they are deployed more widely, there is a growing need to better understand their internal mechanisms in order to make them more reliable. These models appear to store vast amounts of knowledge from their training data, and to adapt quickly to new information provided in their context or prompt.
Through toy tasks for reasoning and factual recall, we highlight the role of weight matrices as associative memories, and provide theoretical results on how gradient dynamics enable such memories to be learned during training and how over-parameterization affects their storage capacity.
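As a minimal illustration of this viewpoint (the notation here is ours and not necessarily the paper's), consider nearly-orthonormal input embeddings $e_x$ and output embeddings $u_y$. A weight matrix can store a set $\mathcal{M}$ of input-output pairs as a sum of outer products,
\[
  W \;=\; \sum_{(x,y) \in \mathcal{M}} u_{y}\, e_{x}^{\top},
\]
and retrieve a stored association by an inner product, since near-orthonormality gives
\[
  u_{y}^{\top} W e_{x_0} \;\approx\; \mathbf{1}\{(x_0, y) \in \mathcal{M}\},
\]
so that $\arg\max_{y} u_{y}^{\top} W e_{x_0}$ recovers the output paired with a query $x_0$.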