On the Global Convergence and Approximation Benefits of Policy Gradient Methods

Abstract

Policy gradient methods apply to complex, poorly understood control problems by performing stochastic gradient descent over a parameterized class of policies. Unfortunately, due to the multi-period nature of the objective, policy gradient algorithms face non-convex optimization problems and can get stuck in suboptimal local minima even for extremely simple problems. This talk will discuss structural properties, shared by several canonical control problems, that guarantee the policy gradient objective function has no suboptimal stationary points despite being non-convex. Time permitting, I'll then zoom in on the special case of state-aggregated policies and a proof showing that policy gradient converges to better policies than its relative, approximate policy iteration.
