Posterior sampling for multi-agent reinforcement learning: solving extensive games with imperfect information

2020-05-01 · ICLR 2020

Yichi Zhou, Jialian Li, Jun Zhu

Abstract

Posterior sampling for reinforcement learning (PSRL) is a useful framework for making decisions in an unknown environment. PSRL maintains a posterior distribution over the environment and then plans on an environment sampled from that posterior. Although PSRL works well on single-agent reinforcement learning problems, how to apply it to multi-agent reinforcement learning problems is relatively unexplored. In this work, we extend PSRL to two-player zero-sum extensive games with imperfect information (TEGI), a class of multi-agent systems. More specifically, we combine PSRL with counterfactual regret minimization (CFR), the leading algorithm for TEGI with a known environment. Our main contribution is a novel design of interaction strategies. With these interaction strategies, our algorithm provably converges to a Nash equilibrium at a rate of $O(\sqrt{\log T / T})$. Empirical results show that our algorithm works well.
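To make the "maintain a posterior, sample an environment, plan on the sample" loop concrete, here is a minimal single-agent sketch in Python. It is not the algorithm of this paper (there is no CFR and no second player). It assumes a small tabular MDP with known rewards, a Dirichlet prior over transitions, and discounted value iteration as the planner. All function names and parameters (`psrl`, `value_iteration`, `sample_dirichlet`, `horizon`, `gamma`) are hypothetical and chosen only for illustration.

```python
import random

def sample_dirichlet(alpha, rng):
    # Draw from Dirichlet(alpha) by normalizing independent Gamma draws.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def q_value(P, R, V, s, a, gamma):
    # One-step lookahead value of taking action a in state s.
    return R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))

def value_iteration(P, R, gamma=0.9, iters=200):
    # Planner (an assumption of this sketch): discounted value iteration.
    n_states, n_actions = len(P), len(P[0])
    V = [0.0] * n_states
    for _ in range(iters):
        V = [max(q_value(P, R, V, s, a, gamma) for a in range(n_actions))
             for s in range(n_states)]
    # Greedy policy with respect to the final value estimate.
    return [max(range(n_actions), key=lambda a: q_value(P, R, V, s, a, gamma))
            for s in range(n_states)]

def psrl(true_P, R, episodes=200, horizon=10, seed=0):
    rng = random.Random(seed)
    n_states, n_actions = len(true_P), len(true_P[0])
    # Posterior over transitions: Dirichlet counts, starting from a uniform prior.
    counts = [[[1.0] * n_states for _ in range(n_actions)]
              for _ in range(n_states)]
    policy = [0] * n_states
    for _ in range(episodes):
        # 1. Sample one environment from the current posterior.
        sampled_P = [[sample_dirichlet(counts[s][a], rng)
                      for a in range(n_actions)] for s in range(n_states)]
        # 2. Plan on the sampled environment.
        policy = value_iteration(sampled_P, R)
        # 3. Act in the true environment and update the posterior.
        s = 0
        for _ in range(horizon):
            a = policy[s]
            s_next = rng.choices(range(n_states), weights=true_P[s][a])[0]
            counts[s][a][s_next] += 1.0
            s = s_next
    return policy, counts
```

For example, on a two-state MDP where only state 1 pays a reward and only action 1 can reach it from state 0, `psrl(true_P, R)` returns the last sampled-model policy together with the posterior counts. Early episodes explore because the posterior samples disagree, and later episodes settle on the action that reaches state 1.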

Tasks

counterfactual, Multi-agent Reinforcement Learning, reinforcement-learning, Reinforcement Learning, Reinforcement Learning (RL)
