Abstract

I will describe our recent work on a deep energy model that brings together kernel density estimators and empirical Bayes least squares estimators. The energy model is the first of its kind in that learning involves no inference (no negative samples), and density estimation is formulated purely as an optimization problem, scalable to high dimensions and trained efficiently with double backpropagation and SGD. An elegant physical picture emerges: an interacting system of high-dimensional spheres, one around each data point, together with a globally defined probability flow field. This picture leads to a novel sampling algorithm (walk-jump sampling) and a new notion of associative memory (Robbins associative memory), and it is instrumental in designing experiments. I will finish the talk by showcasing the remarkably rich creative modes that emerge when the model is trained on MNIST.

Reference: https://arxiv.org/abs/1903.02334
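
To make the "optimization without negative samples" claim concrete, here is a minimal PyTorch sketch of the kind of objective the abstract describes: noise is added to the data, an energy network is fit so that the empirical Bayes least squares estimator (which involves the gradient of the energy, hence the double backpropagation) denoises well. The architecture, hyperparameters, and function names below are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class EnergyNet(nn.Module):
    """Scalar energy f(y); -grad_y f(y) plays the role of the score of the smoothed density."""
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.Softplus(),
            nn.Linear(hidden, hidden), nn.Softplus(),
            nn.Linear(hidden, 1),
        )

    def forward(self, y):
        return self.net(y).squeeze(-1)

def denoising_loss(energy, x, sigma):
    """Empirical Bayes least squares: for y = x + sigma * noise, the Bayes
    estimator takes the form xhat(y) = y - sigma^2 * grad_y f(y).
    Minimizing ||x - xhat(y)||^2 differentiates through grad_y f,
    which is the double backpropagation mentioned in the abstract."""
    y = x + sigma * torch.randn_like(x)
    y.requires_grad_(True)
    f = energy(y).sum()
    grad_y = torch.autograd.grad(f, y, create_graph=True)[0]
    xhat = y - sigma ** 2 * grad_y
    return ((x - xhat) ** 2).sum(dim=-1).mean()

# Toy training loop (a Gaussian batch stands in for MNIST here):
dim, sigma = 2, 0.5
energy = EnergyNet(dim)
opt = torch.optim.Adam(energy.parameters(), lr=1e-3)
for _ in range(100):
    x = torch.randn(256, dim)  # placeholder data batch
    loss = denoising_loss(energy, x, sigma)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Walk-jump sampling can then be sketched on top of the same energy: a Langevin "walk" in the noisy y-space driven by the learned gradient, followed by a single "jump" through the least squares estimator back to clean samples. Again, step sizes and initialization below are assumptions for illustration.

```python
def walk_jump_sample(energy, sigma, dim, n_steps=1000, step=1e-3):
    """Walk: Langevin MCMC in y-space using -grad_y f as the score.
    Jump: one application of xhat(y) = y - sigma^2 * grad_y f(y)."""
    y = torch.randn(1, dim)  # arbitrary starting point
    for _ in range(n_steps):
        y = y.detach().requires_grad_(True)
        grad = torch.autograd.grad(energy(y).sum(), y)[0]
        y = y - step * grad + (2 * step) ** 0.5 * torch.randn_like(y)
    y = y.detach().requires_grad_(True)
    grad = torch.autograd.grad(energy(y).sum(), y)[0]
    return (y - sigma ** 2 * grad).detach()
```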