Abstract

“Hallucinations” are a major problem for language models. We shed light on this phenomenon by showing that calibration, which is naturally encouraged during the pre-training of language models, leads to hallucinations. Moreover, the rate of hallucination depends on the domain via the classic Good-Turing estimator, roughly the fraction of facts in that domain that appear only once in the training data. Notably, this estimate is sizable for facts like paper titles, which have been a notorious source of hallucinations. The analysis also suggests methods for mitigating hallucinations. This is joint work with Santosh Vempala and was done while the speaker was at Microsoft Research New England.
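
As a rough illustration of the Good-Turing quantity mentioned above (a sketch for intuition, not material from the talk; the function name and toy data are invented here), the estimate is simply the fraction of observed facts that occur exactly once:

from collections import Counter

def good_turing_monofact_rate(observed_facts):
    # Good-Turing estimate of the probability mass on unseen facts:
    # the fraction of observations occurring exactly once ("monofacts").
    counts = Counter(observed_facts)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(observed_facts)

# Toy example: paper-title-like facts mostly appear once in a corpus,
# so the estimate is large; heavily repeated facts drive it down.
facts = ["paper_%d" % i for i in range(90)] + ["famous_paper"] * 10
print(good_turing_monofact_rate(facts))  # 0.9

Under this reading, a domain whose facts are mostly one-off (such as bibliographic references) yields a large estimate, matching the observation that such facts are frequently hallucinated.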
