Abstract
The quadratic time required to compute attention layers in transformers is a major bottleneck for long context lengths. I will survey recent approximation algorithms based on dimensionality reduction that, under certain assumptions, achieve linear time. I will focus on HyperAttention and PolySketchFormer, discussing their theory and practice, and will also mention recent follow-up work.
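The quadratic cost comes from materializing the n-by-n score matrix in standard softmax attention. A minimal NumPy sketch of this baseline (illustrative only; the names and shapes are my own, not taken from the works surveyed):

```python
import numpy as np

def naive_attention(Q, K, V):
    # Q, K, V have shape (n, d). Forming the (n, n) score matrix S
    # costs O(n^2 d) time and O(n^2) memory -- the bottleneck that
    # linear-time approximations aim to avoid.
    S = Q @ K.T / np.sqrt(Q.shape[1])
    P = np.exp(S - S.max(axis=1, keepdims=True))   # numerically stable
    P /= P.sum(axis=1, keepdims=True)              # row-wise softmax
    return P @ V                                   # output shape (n, d)

rng = np.random.default_rng(0)
n, d = 512, 64
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = naive_attention(Q, K, V)
```

Doubling the context length n quadruples the work here, which is why approximation methods that sketch or subsample the score matrix can matter so much at long context lengths.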