Abstract

Retrieval-based LMs, which combine learned parameters with a datastore—a large collection of text documents—offer a compelling alternative to dense models, removing the need to memorize every detail of the training data and allowing for seamless updating. In this talk, I present two recent works on improving retrieval-based models in the context of LLMs. In the first work, we pre-train an LM to condition on retrieved documents, unlike previous approaches that use an off-the-shelf LM trained with a standard objective. Our model outperforms previous approaches, especially when the retrieved context is irrelevant or distracting. In the second work, we study the scaling properties of retrieval-based LMs. With a new datastore of 1.4 trillion tokens, we show that the compute-optimal setup almost always includes retrieval across a range of downstream tasks. I will conclude by discussing open-ended questions—whether retrieval can provide the same effect as training on the data, how retrieval can handle data restrictions, and the potential for modular LMs to generalize this approach.

Video Recording