Abstract

Markov Decision Processes (MDPs) are a fundamental mathematical model for reasoning about decision-making under uncertainty and play a key role in reinforcement learning theory. In this talk, I will discuss recent advances in optimizing MDPs given by a generative model. Though near-optimal sample complexities are known for approximately optimizing MDPs with discounted reward functions, a similar characterization for average-reward MDPs has remained elusive. I will discuss how to improve the sample complexity for this problem using convex optimization tools, and how to obtain near-optimal sample complexities in certain regimes by reducing to the solution of discounted MDPs. This talk features joint work with Yujia Jin (arXiv:2008.12776 and arXiv:2106.07046).
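As a rough illustration of the generative-model setting mentioned above (not the method from the talk or the cited papers), the sketch below builds an empirical transition model from i.i.d. next-state samples and then runs value iteration on that empirical MDP with a discount factor. The toy MDP, all function names, and the sample counts are hypothetical.

```python
import numpy as np

def empirical_model(sample_next_state, n_states, n_actions, n_samples):
    """Estimate transitions by querying a generative model, which
    returns an i.i.d. next state for any queried (state, action)."""
    P_hat = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(n_samples):
                P_hat[s, a, sample_next_state(s, a)] += 1.0
    return P_hat / n_samples  # empirical transition probabilities

def value_iteration(P_hat, R, gamma, n_iters):
    """Approximate the optimal discounted values of the empirical MDP
    (P_hat, R) by iterating the Bellman optimality operator."""
    V = np.zeros(P_hat.shape[0])
    for _ in range(n_iters):
        # Q[s, a] = R[s, a] + gamma * sum_s' P_hat[s, a, s'] * V[s']
        V = np.max(R + gamma * P_hat @ V, axis=1)
    return V

# Toy usage: a random 5-state, 2-action MDP plays the "true" environment.
rng = np.random.default_rng(0)
P_true = rng.dirichlet(np.ones(5), size=(5, 2))   # true transitions
R = rng.uniform(size=(5, 2))                      # known reward matrix

def sample_next_state(s, a):
    return rng.choice(5, p=P_true[s, a])          # one generative-model query

P_hat = empirical_model(sample_next_state, 5, 2, n_samples=200)
print(value_iteration(P_hat, R, gamma=0.9, n_iters=500))
```

The sample complexity question the talk addresses is how many such generative-model queries suffice, in total, to guarantee a near-optimal policy; this sketch only shows the interface being assumed.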
