Abstract

Mode connectivity (Garipov et al., 2018; Draxler et al., 2018) is a surprising phenomenon in the loss landscape of deep nets. Optima, at least those discovered by gradient-based optimization, turn out to be connected by simple paths along which the loss function is almost constant. Often, these paths can be chosen to be piecewise linear, with as few as two segments. In this talk we will give mathematical explanations for this phenomenon. In particular, we show that although in many settings not all optima are connected, typical optima that satisfy nice properties (dropout stability and noise stability) are connected. These properties have previously been identified as part of understanding the generalization properties of deep nets.

Based on joint work with Rohith Kuditipudi, Xiang Wang, Holden Lee, Yi Zhang, Zhiyuan Li, Wei Hu and Sanjeev Arora.
