Abstract

In this talk, we will discuss the mechanisms that enable retrieval, copying, and length generalization in language models, and how the choice of network architecture influences a model's success or failure on basic tasks. First, we will present theoretical and empirical evidence demonstrating that Transformers, the dominant architecture for sequence modeling, excel at copying and retrieval tasks, whereas LSTMs and state-space models (e.g., Mamba) perform poorly on these same tasks. Next, we will show how the ability of Transformers to copy long sequences can be leveraged to achieve length generalization across various algorithmic and arithmetic tasks.
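To make the copying task concrete, below is a minimal sketch of how such a probe might be set up: generate a random string and prompt the model to reproduce it, increasing the length to test length generalization. The prompt format, vocabulary, and lengths here are illustrative assumptions, not the evaluation protocol used in the work presented in the talk.

```python
import random
import string


def make_copy_example(length: int, vocab: str = string.ascii_lowercase) -> tuple[str, str]:
    """Build one copy-task example: a prompt asking the model to repeat a random string."""
    s = "".join(random.choice(vocab) for _ in range(length))
    prompt = f"Copy the following string exactly: {s}\nCopy: "
    return prompt, s


if __name__ == "__main__":
    random.seed(0)
    # Increasing lengths probe whether copying ability extends beyond training lengths.
    for n in (10, 50, 200):
        prompt, target = make_copy_example(n)
        print(f"length={n:4d}  target starts with: {target[:20]}...")
```

A retrieval probe can be built analogously by embedding a key-value pair in a long context and asking the model to return the value for a queried key.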
