Deep Reinforcement Learning from Self-Play in Imperfect-Information Games
Johannes Heinrich, David Silver
Code
- github.com/deepmind/open_spiel (framework: none, ★ 5,095)
- github.com/EricSteinberger/DREAM (framework: none, ★ 120)
- github.com/quantumiracle/mars (framework: pytorch, ★ 49)
- github.com/jsanderink/tue (framework: pytorch, ★ 0)
- github.com/heidekrueger/bnelearn (framework: pytorch, ★ 0)
- github.com/TinkeringCode/Neural-Fictitous-Self-Play (framework: pytorch, ★ 0)
- github.com/IAARhub/TrucoAnalytics (framework: none, ★ 0)
Abstract
Many real-world applications can be described as large-scale games of imperfect information. To deal with these challenging domains, prior work has focused on computing Nash equilibria in a handcrafted abstraction of the domain. In this paper we introduce the first scalable end-to-end approach to learning approximate Nash equilibria without prior domain knowledge. Our method combines fictitious self-play with deep reinforcement learning. When applied to Leduc poker, Neural Fictitious Self-Play (NFSP) approached a Nash equilibrium, whereas common reinforcement learning methods diverged. In Limit Texas Hold'em, a poker game of real-world scale, NFSP learnt a strategy that approached the performance of state-of-the-art, superhuman algorithms based on significant domain expertise.
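For readers unfamiliar with the fictitious-play idea that NFSP builds on, the following is a minimal sketch of classical, tabular fictitious play on rock-paper-scissors. It is not the NFSP algorithm from the paper: it uses no neural networks, no reinforcement learning and no imperfect information. The game, the names `PAYOFF`, `best_response`, `fictitious_play` and `exploitability`, and the iteration count are all assumptions chosen for illustration. Each player repeatedly best-responds to the opponent's empirical action frequencies, and in this zero-sum game the average strategies should drift toward the uniform Nash equilibrium.

```python
# Classical fictitious play on rock-paper-scissors (illustrative only; not NFSP).
# PAYOFF[a][b] is the payoff to a player choosing action a against action b.
# The game is symmetric and zero-sum, so both players can share this matrix.
PAYOFF = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper    vs rock, paper, scissors
    [-1, 1, 0],   # scissors vs rock, paper, scissors
]


def best_response(opponent_counts):
    """Action with the highest total payoff against the opponent's action counts."""
    values = [
        sum(PAYOFF[a][b] * opponent_counts[b] for b in range(3))
        for a in range(3)
    ]
    # Ties are broken in favour of the lowest action index.
    return max(range(3), key=lambda a: values[a])


def fictitious_play(iterations=20000):
    """Run simultaneous fictitious play; return both players' average strategies."""
    # Hypothetical initial history: player 0 played rock once, player 1 paper once.
    counts = [[1, 0, 0], [0, 1, 0]]
    for _ in range(iterations):
        a0 = best_response(counts[1])  # player 0 responds to player 1's history
        a1 = best_response(counts[0])  # player 1 responds to player 0's history
        counts[0][a0] += 1
        counts[1][a1] += 1
    return [[c / sum(row) for c in row] for row in counts]


def exploitability(strategy):
    """Payoff a best-responding opponent earns against `strategy` (0 at equilibrium)."""
    return max(
        sum(PAYOFF[a][b] * strategy[b] for b in range(3)) for a in range(3)
    )


avg = fictitious_play()
for player, strategy in enumerate(avg):
    print(player, [round(p, 3) for p in strategy], round(exploitability(strategy), 4))
```

The quantity to watch is the average strategy rather than the most recent action: the individual best responses keep cycling, while the averages are what approach equilibrium. According to the abstract, NFSP carries this averaging idea over to large games by combining it with deep reinforcement learning; the sketch above makes no attempt to reproduce that.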