Abstract
We review foundational aspects of planning in Markov decision processes (MDPs). The goal is to build up the knowledge needed to discuss planning with generative models in large-scale MDPs. For this, we focus on finite (but possibly large-scale) MDPs and leave extensions beyond the finite case to brief discussions. We start with basic definitions (what an MDP is, the various criteria of optimality) and review the fundamentals of dynamic programming (value functions, optimal value functions, Bellman's equations). After reviewing basic properties of the fundamental approaches to finding an optimal policy (value iteration, policy iteration, linear programming, and policy gradient methods), we wrap up the treatment of exact planning by discussing known computational complexity and query complexity results. We next consider approximate planning with generative models and how one can save computation and improve generalization by moving away from exact global planning. We discuss the differences between local and global planning in MDPs, and finish with a discussion of what is known about using (mainly linear) function approximation in approximate planning.