
Abstract
In this talk I will present work showing that agents using simple policy gradient algorithms have no guarantees of asymptotic convergence even in arguably the simplest class of continuous action- and state-space multi-agent control problems, general-sum linear quadratic (LQ) games, and that proximal point and extra-gradient methods do not resolve these issues. I will then focus on zero-sum LQ games, in which stronger convergence guarantees are possible when agents use independent policy gradients with a finite timescale separation.
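
To make the final point concrete, the following is a minimal sketch, not taken from the talk, of independent policy gradients with timescale separation in a scalar zero-sum LQ game. Each player follows a finite-difference estimate of the gradient of the shared cost with respect to its own gain only, and the maximizing player updates on the faster timescale (one common convention for timescale-separated gradient descent-ascent). All system parameters, cost weights, and step sizes below are illustrative assumptions.

    # Scalar zero-sum LQ game (all values are illustrative assumptions):
    #   dynamics:  x_{t+1} = a*x_t + b1*u_t + b2*v_t
    #   policies:  u_t = -k1*x_t (minimizer),  v_t = -k2*x_t (maximizer)
    #   cost:      J(k1, k2) = sum_t q*x_t^2 + r1*u_t^2 - r2*v_t^2
    a, b1, b2 = 0.9, 0.8, 0.2       # hypothetical system parameters
    q, r1, r2 = 1.0, 1.0, 10.0      # hypothetical cost weights
    T, x0 = 50, 1.0                 # rollout horizon and initial state

    def cost(k1, k2):
        """Finite-horizon cost of the linear policies u = -k1*x, v = -k2*x."""
        x, J = x0, 0.0
        for _ in range(T):
            u, v = -k1 * x, -k2 * x
            J += q * x * x + r1 * u * u - r2 * v * v
            x = a * x + b1 * u + b2 * v
        return J

    def fd_grad(f, z, eps=1e-5):
        """Central finite-difference estimate of df/dz."""
        return (f(z + eps) - f(z - eps)) / (2 * eps)

    # Independent policy gradients with timescale separation tau:
    # the minimizer's step size is tau times smaller than the maximizer's.
    eta, tau = 0.02, 10.0
    k1, k2 = 0.5, 0.0               # initial gains; closed loop starts stable
    for _ in range(3000):
        g1 = fd_grad(lambda k: cost(k, k2), k1)  # player 1: own-gain gradient
        g2 = fd_grad(lambda k: cost(k1, k), k2)  # player 2: same cost, ascent
        k1 -= (eta / tau) * g1      # slow, minimizing player
        k2 += eta * g2              # fast, maximizing player

    print(f"k1 = {k1:.4f}, k2 = {k2:.4f}, cost = {cost(k1, k2):.4f}")

Each update is "independent" in that a player differentiates only through its own gain while treating the opponent's gain as fixed; the ratio tau controls the timescale separation between the two players.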