Abstract
Recent empirical advances show that training neural networks with large learning rates often improves generalization. Large learning rates also give rise to various phenomena that classical optimization theory cannot explain. In this talk, I will theoretically demonstrate several effects of large learning rates, including the edge of stability, balancing, and catapult phenomena. These results are based on a new convergence analysis of gradient descent with large learning rates, carried out from a dynamical-systems perspective and applied to a family of nonconvex functions of varying regularity whose gradients need not be Lipschitz.
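
To make the large-learning-rate effects mentioned above concrete, here is a minimal sketch, not taken from the talk, that runs gradient descent on the toy factorization loss f(u, v) = (1/2)(uv - 1)^2 with an assumed initialization and two assumed step sizes. At a minimizer uv = 1 the sharpness (largest Hessian eigenvalue) equals u^2 + v^2, so classical theory only guarantees descent when the step size is below 2/(u^2 + v^2); the sketch contrasts a small step size with one above that threshold.

# Minimal sketch (illustrative only): gradient descent on f(u, v) = 0.5*(u*v - 1)**2.
# Near a minimizer u*v = 1 the sharpness is u**2 + v**2, so classical analysis
# requires a step size below 2 / (u**2 + v**2).

def run_gd(eta, u=4.0, v=0.1, steps=60):
    """Run gradient descent, recording loss, a sharpness proxy, and imbalance."""
    history = []
    for _ in range(steps):
        r = u * v - 1.0                              # residual
        loss = 0.5 * r ** 2
        history.append((loss, u ** 2 + v ** 2, abs(u ** 2 - v ** 2)))
        u, v = u - eta * r * v, v - eta * r * u      # simultaneous GD update
    return history

for eta in (0.01, 0.15):
    hist = run_gd(eta)
    losses = [h[0] for h in hist]
    print(f"eta = {eta}: max loss seen = {max(losses):.3f}, "
          f"final loss = {losses[-1]:.2e}, "
          f"final sharpness ~ {hist[-1][1]:.2f} (2/eta = {2 / eta:.1f}), "
          f"final imbalance |u^2 - v^2| ~ {hist[-1][2]:.2f}")

# Expected behavior under these assumed settings: the small step size decreases
# the loss monotonically and converges to a sharp, unbalanced minimum, while the
# large step size lets the loss spike first (a catapult) before settling at a
# flatter, more balanced minimum whose sharpness lies below 2/eta.

The talk's analysis covers a broader family of nonconvex objectives; this two-parameter example is only meant to show, numerically, why loss trajectories under large learning rates fall outside the classical descent picture.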