SOTAVerified

Enforcing Almost-Sure Reachability in POMDPs

2020-06-30 · Code Available

Sebastian Junges, Nils Jansen, Sanjit A. Seshia

Abstract

Partially Observable Markov Decision Processes (POMDPs) are a well-known stochastic model for sequential decision making under limited information. We consider the EXPTIME-hard problem of synthesising policies that almost-surely reach some goal state without ever visiting a bad state. In particular, we are interested in computing the winning region, that is, the set of system configurations from which a policy exists that satisfies the reachability specification. A direct application of such a winning region is the safe exploration of POMDPs by, for instance, restricting the behavior of a reinforcement learning agent to the region. We present two algorithms: a novel SAT-based iterative approach and a decision-diagram-based alternative. The empirical evaluation demonstrates the feasibility and efficacy of the approaches.
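To make the notion of a winning region concrete, the following is a minimal sketch (not the paper's algorithm) of a sound under-approximation over *belief supports*: starting from supports contained in the goal, a support is added whenever some action sends every observation-successor into the current winning set, so the goal is surely reached in finitely many steps and hence almost surely. The POMDP below, its state names, and the helper functions are all hypothetical illustrations; the paper's SAT-based and decision-diagram-based algorithms compute the region differently and more completely.

```python
from itertools import chain, combinations

# Hypothetical 4-state POMDP: only transition *supports* matter for
# almost-sure reachability, so T[s][a] is the set of possible successors.
# Z[s] is the observation emitted in state s; goal = {3}, bad = {2}.
T = {
    0: {"a": {0, 1}, "b": {1}},
    1: {"a": {3},    "b": {2, 3}},
    2: {"a": {2},    "b": {2}},
    3: {"a": {3},    "b": {3}},
}
Z = {0: "x", 1: "x", 2: "y", 3: "y"}
GOAL, BAD = {3}, {2}

def successors(support, action):
    """Belief-support update: all states reachable in one step,
    split into one successor support per possible observation."""
    reach = set(chain.from_iterable(T[s][action] for s in support))
    return [frozenset(s for s in reach if Z[s] == z)
            for z in {Z[s] for s in reach}]

def winning_supports(candidates):
    """Sound (but incomplete) under-approximation of the winning region:
    iterate to a fixpoint, adding a bad-free support if some action
    moves *every* observation-successor into the current winning set."""
    win = {b for b in candidates if b <= frozenset(GOAL)}
    changed = True
    while changed:
        changed = False
        for b in candidates - win:  # snapshot, safe to grow `win` inside
            if b & BAD:
                continue  # supports touching a bad state never win
            for a in T[next(iter(b))]:  # assumes a shared action set
                if all(s in win for s in successors(b, a)):
                    win.add(b)
                    changed = True
                    break
    return win

# Enumerate all nonempty belief supports of this tiny model and solve.
states = set(T)
candidates = {frozenset(c) for r in range(1, len(states) + 1)
              for c in combinations(states, r)}
win = winning_supports(candidates)
```

Because the iteration only certifies bounded-step reachability, it can miss supports that win by looping with probability-1 escape (here, e.g., the support {0, 1}); the paper's approaches target the full winning region.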
