This workshop will consider the role of transformers as a central building block in the development of large language models, as well as their inherent limitations. The main questions posed will concern their effectiveness (What core properties make transformers work so well?), necessity (Are there models that will do even better?), future ability (Can we extrapolate the future capabilities of LLMs as they scale with data and compute?), and computational properties (What TCS models can be used to understand the emergence of complex skills in LLMs?). We will also consider other tools that may illuminate the properties of transformers, such as methods from computational physics. These questions will address both transformers as a model class and the abilities learned by trained models. While the workshop focuses on transformers, we will also explore alternatives that are gaining popularity, such as state-space models (SSMs).
Yeganeh Ali Mohammadi (UC Berkeley), Jacob Andreas (Massachusetts Institute of Technology), Simran Arora (Stanford University), Stella Biderman (EleutherAI), David Chiang (University of Notre Dame), Evelina Fedorenko (Massachusetts Institute of Technology), Mor Geva (Tel Aviv University), Tom Goldstein (University of Maryland), Albert Gu (Carnegie Mellon University), Daniel Hsu (Columbia University), Jon Kleinberg (Cornell University), Bingbin Liu (Carnegie Mellon University), Eran Malach (The Hebrew University of Jerusalem), Will Merrill (New York University), Sewon Min (UC Berkeley & AI2), Christos Papadimitriou (Columbia University), Philippe Rigollet (Massachusetts Institute of Technology), Naomi Saphra (Kempner Institute at Harvard University), Nati Srebro (Toyota Technological Institute at Chicago), Andrew Wilson (New York University)