Abstract
We study the reasoning capabilities of Transformers by formulating sequential reasoning tasks with finite-state automata. Leveraging tools from Krohn-Rhodes theory and circuit complexity, we show that o(T)-layer Transformers can simulate T steps of sequential reasoning. However, our empirical findings reveal that models trained in practice often fail to discover such optimal constructions, owing to optimization challenges and the richness of the representation space. Notably, Transformers struggle with out-of-distribution (OOD) generalization on a simple task that RNNs solve easily. These two types of OOD failures highlight two inherent limitations of the Transformer architecture.