
Planning with General Objective Functions: Going Beyond Total Rewards

2020-12-01 · NeurIPS 2020

Ruosong Wang, Peilin Zhong, Simon S. Du, Russ R. Salakhutdinov, Lin Yang


Abstract

Standard sequential decision-making paradigms aim to maximize the cumulative reward when interacting with the unknown environment, i.e., maximize $\sum_{h=1}^{H} r_h$, where $H$ is the planning horizon. However, this paradigm fails to model important practical applications, e.g., safe control that aims to maximize the lowest reward, i.e., maximize $\min_{h=1}^{H} r_h$. In this paper, based on techniques in sketching algorithms, we propose a novel planning algorithm in deterministic systems which deals with a large class of objective functions of the form $f(r_1, r_2, \ldots, r_H)$ that are of interest to practical applications. We show that efficient planning is possible if $f$ is symmetric under permutation of coordinates and satisfies certain technical conditions. Complementing our algorithm, we further prove that removing any of the conditions will make the problem intractable in the worst case, and thus demonstrate the necessity of our conditions.
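
To make the contrast between the two objectives concrete, here is a minimal Python sketch. It is not the paper's sketching-based algorithm: it is an exponential-time brute-force baseline that enumerates every action sequence in a small deterministic system and scores the resulting reward sequence with an arbitrary objective $f$. The transition table, state and action names, and the helpers `rollout` and `plan_brute_force` are all hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical toy deterministic system (an assumption, not from the paper):
# (state, action) -> (next_state, reward).
TRANSITIONS = {
    ("s0", "a"): ("s1", 5.0),
    ("s0", "b"): ("s2", 2.0),
    ("s1", "a"): ("s1", 0.0),
    ("s1", "b"): ("s2", 1.0),
    ("s2", "a"): ("s2", 2.0),
    ("s2", "b"): ("s1", 3.0),
}
ACTIONS = ("a", "b")


def rollout(start, actions):
    """Follow the deterministic transitions and return the rewards r_1..r_H."""
    state, rewards = start, []
    for action in actions:
        state, reward = TRANSITIONS[(state, action)]
        rewards.append(reward)
    return rewards


def plan_brute_force(start, horizon, objective):
    """Try all |A|^H action sequences and keep one maximizing f(r_1..r_H).

    This is exponential in the horizon; it only illustrates the problem
    statement, not an efficient planner.
    """
    best = max(
        product(ACTIONS, repeat=horizon),
        key=lambda seq: objective(rollout(start, seq)),
    )
    return best, rollout(start, best)


# Total-reward objective: f = sum.
sum_plan, sum_rewards = plan_brute_force("s0", 3, sum)
# "Safe control" objective: f = min (maximize the lowest reward).
min_plan, min_rewards = plan_brute_force("s0", 3, min)

print("sum objective:", sum_plan, sum_rewards)
print("min objective:", min_plan, min_rewards)
```

In this toy system the two objectives pick different plans: the sum-maximizing plan collects a large reward early but passes through a low-reward step, while the min-maximizing plan keeps every reward at the same moderate level. Both `sum` and `min` are symmetric under permutation of their arguments, which is the structural property the abstract highlights.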
