Truthful Self-Play

2021-06-06

Shohei Ohsawa

Abstract

We present a general framework for evolutionary learning of emergent, unbiased state representations without any supervision. Evolutionary frameworks such as self-play converge to bad local optima in multi-agent reinforcement learning in non-cooperative, partially observable environments with communication, owing to information asymmetry. Our proposed framework is a simple modification of self-play inspired by mechanism design, also known as reverse game theory, that elicits truthful signals and makes the agents cooperative. The key idea is to add imaginary rewards using the peer prediction method, i.e., a mechanism for evaluating the validity of information exchanged between agents in a decentralized environment. Numerical experiments on predator-prey, traffic junction, and StarCraft tasks demonstrate the state-of-the-art performance of our framework.
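
To make the peer prediction idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes a logarithmic scoring rule, a hypothetical two-signal world with a known table `POSTERIOR`, and a hypothetical weight `beta` that mixes the imaginary reward into the environment reward; none of these names or values come from the abstract.

```python
import math

# Hypothetical table (an assumption for illustration):
# POSTERIOR[s][t] = P(peer observes signal t | this agent observes signal s).
POSTERIOR = {
    0: {0: 0.8, 1: 0.2},
    1: {0: 0.3, 1: 0.7},
}


def peer_prediction_reward(report, peer_report, posterior):
    """Log-scoring-rule reward: log P(peer_report | report)."""
    return math.log(posterior[report][peer_report])


def shaped_reward(env_reward, report, peer_report, posterior, beta=1.0):
    """Environment reward plus a weighted imaginary (peer prediction) reward.

    `beta` is a hypothetical mixing weight, not a parameter named in the source.
    """
    return env_reward + beta * peer_prediction_reward(report, peer_report, posterior)


def expected_score(true_signal, report, posterior):
    """Expected imaginary reward for sending `report` after observing
    `true_signal`, assuming the peer reports its own signal truthfully."""
    return sum(
        prob * peer_prediction_reward(report, peer_signal, posterior)
        for peer_signal, prob in posterior[true_signal].items()
    )


if __name__ == "__main__":
    for signal in (0, 1):
        honest = expected_score(signal, signal, POSTERIOR)
        lying = expected_score(signal, 1 - signal, POSTERIOR)
        print(f"signal={signal}: honest={honest:.4f}, lying={lying:.4f}")
```

Because the logarithmic score is a proper scoring rule, an agent whose peer reports truthfully maximizes its expected imaginary reward by reporting truthfully as well, which is the incentive property that peer prediction mechanisms rely on.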

Tasks

Multi-agent Reinforcement Learning, StarCraft
