Abstract
(title and abstract are tentative and subject to minor changes)
A unique challenge in Multi-Agent Reinforcement Learning (MARL) is the curse of multiagency: the description length of the game, as well as the complexity of many existing learning algorithms, scales exponentially with the number of agents. While recent works successfully address this challenge under the model of tabular Markov Games, their mechanisms critically rely on the number of states being finite and small, and do not extend to practical scenarios with enormous state spaces, where function approximation must be used to represent value functions or policies.
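As a rough illustration (the notation here is ours, not from the abstract: m agents, S states, horizon H, and A_i actions per agent), the tabular description of a general-sum Markov Game scales with the joint action space,

    S \cdot H \cdot \prod_{i=1}^{m} A_i \;=\; S H A^{m} \qquad \text{(when each } A_i = A\text{)},

which is exponential in m. In this literature, breaking the curse of multiagency is typically taken to mean algorithms whose sample complexity scales polynomially in \max_i A_i (and in m, S, H) rather than in \prod_{i=1}^{m} A_i.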
In this talk, we present the first line of MARL algorithms that provably resolve the curse of multiagency under function approximation. Our first algorithm, V-Learning with Policy Replay, obtains the first result for learning approximate Coarse Correlated Equilibria (CCEs) of Markov Games under decentralized linear function approximation, as well as improved decentralized results for finding Markov CCEs in tabular Markov Games. Our second algorithm, Decentralized Optimistic Policy Mirror Descent, works for a wider range of problems under generic function approximation, in exchange for learning a weaker notion of equilibrium called “policy-class-restricted CCEs”. Finally, beyond function approximation, we will also present results on faster policy optimization algorithms for tabular Markov Games, obtained via new optimistic variants of V-Learning.
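For context, here is the standard (informal) notion of an approximate CCE, stated in our own notation rather than the abstract’s: a correlated policy \pi is an \epsilon-approximate CCE if no agent i can gain more than \epsilon by unilaterally deviating to its own best response,

    \max_{i} \Big( \max_{\mu_i} V_i^{\mu_i \times \pi_{-i}} \;-\; V_i^{\pi} \Big) \;\le\; \epsilon,

where V_i^{\pi} denotes agent i’s value under \pi and \mu_i ranges over agent i’s deviation policies. Roughly speaking, the “policy-class-restricted” variant mentioned above considers only deviations \mu_i within a prescribed policy class; the precise definition is given in the work presented.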