Abstract
Many tasks that are complex or time-consuming for humans involve processing huge quantities of data; however, pretrained models cannot perform many of these tasks because they fail to effectively use inputs that exceed their pretraining context length. This talk discusses two perspectives on long context and generalization. First, we consider long context itself as a form of generalization from short-input to long-input data, and review strategies the community has proposed for this problem. Second, we discuss long-context in-context learning (ICL). We show that, for many datasets with large label spaces, performance continues to increase with hundreds or thousands of demonstrations. We contrast this with example retrieval and finetuning: example retrieval shows excellent performance at low context lengths, but its gains diminish with more demonstrations; finetuning is more data-hungry than ICL but can sometimes exceed long-context ICL performance with additional data. We use this ICL setting as a testbed to study several properties of both in-context learning and long-context models.