
Estimating Q(s,s') with Deterministic Dynamics Gradients

2020-01-01 · ICML 2020

Ashley Edwards, Himanshu Sahni, Rosanne Liu, Jane Hung, Ankit Jain, Rui Wang, Adrien Ecoffet, Thomas Miconi, Charles Isbell, Jason Yosinski


Abstract

In this paper, we introduce a novel form of a value function, Q(s, s'), that expresses the utility of transitioning from a state s to a neighboring state s' and then acting optimally thereafter. In order to derive an optimal policy, we develop a novel forward dynamics model that learns to make next-state predictions that maximize Q(s,s'). This formulation decouples actions from values while still learning off-policy. We highlight the benefits of this approach in terms of value function transfer, learning within redundant action spaces, and learning off-policy from state observations generated by sub-optimal or completely random policies.
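To make the formulation concrete, here is a minimal tabular sketch (our illustration, not the paper's code) of the Q(s, s') backup on a toy 5-state deterministic chain, where the Bellman update ranges over reachable next states rather than actions: Q(s, s') = r(s, s') + γ max over s'' of Q(s', s''). The chain environment, reward, and neighbor function are all assumptions made for the example.

```python
import numpy as np

# Toy 5-state deterministic chain: from each state the agent may stay or move
# to an adjacent state; arriving at the rightmost state yields reward 1.
n_states = 5
gamma = 0.9

def neighbors(s):
    """States reachable from s in one step (left, stay, right)."""
    return {max(s - 1, 0), s, min(s + 1, n_states - 1)}

def reward(s, s_next):
    """Reward 1 for arriving at the goal (rightmost) state."""
    return 1.0 if s_next == n_states - 1 else 0.0

# Value iteration over state *pairs*:
#   Q(s, s') = r(s, s') + gamma * max_{s''} Q(s', s'')
Q = np.zeros((n_states, n_states))
for _ in range(100):
    Q_new = np.zeros_like(Q)
    for s in range(n_states):
        for s_next in neighbors(s):
            Q_new[s, s_next] = reward(s, s_next) + gamma * max(
                Q[s_next, s2] for s2 in neighbors(s_next)
            )
    Q = Q_new

# The greedy policy over *next states*: from each state, pick the neighbor
# maximizing Q(s, s'). The action realizing that transition would be recovered
# separately (e.g. by an inverse dynamics model), which is what decouples
# actions from values.
policy = {s: max(neighbors(s), key=lambda s2: Q[s, s2]) for s in range(n_states)}
print(policy)  # every state steps toward the goal; the goal state stays put
```

In the full method, the argmax over a discrete neighbor set is replaced by a learned forward dynamics model that proposes next states maximizing Q(s, s') in continuous state spaces.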
