
Bi-linear Value Networks for Multi-goal Reinforcement Learning

2021-09-29 · ICLR 2022

Ge Yang, Zhang-Wei Hong, Pulkit Agrawal


Abstract

Universal value functions score the long-term utility of actions for achieving a goal from the current state. In contrast to prior methods that learn a monolithic function to approximate the value, we propose a bi-linear decomposition of the value function. The first component, akin to a global plan, models how the state should be changed to reach the goal. The second component, akin to a local controller, selects the optimal action to actualize the desired change in state. We learn both components simultaneously. This decomposition enables the global and local components to make efficient use of interaction data and to generalize independently. The consequence is superior overall generalization and performance of our system on a wide range of challenging goal-conditioned tasks in comparison to the current state of the art.
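The abstract does not give the exact parameterization, but the decomposition it describes can be sketched as an inner product of two learned embeddings: one over (state, goal) for the global plan and one over (state, action) for the local controller. The sketch below uses random linear feature maps as stand-ins for the learned networks; all names and dimensions are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, GOAL_DIM, ACTION_DIM, EMBED_DIM = 4, 4, 2, 8

# Random linear maps stand in for learned networks (hypothetical).
W_f = rng.normal(size=(STATE_DIM + GOAL_DIM, EMBED_DIM))
W_phi = rng.normal(size=(STATE_DIM + ACTION_DIM, EMBED_DIM))


def f(state, goal):
    """Global component: embeds (state, goal) into a shared latent space."""
    return np.concatenate([state, goal]) @ W_f


def phi(state, action):
    """Local component: embeds (state, action) into the same latent space."""
    return np.concatenate([state, action]) @ W_phi


def q_value(state, action, goal):
    """Bi-linear value: inner product of the global and local embeddings."""
    return float(f(state, goal) @ phi(state, action))


s = rng.normal(size=STATE_DIM)
g = rng.normal(size=GOAL_DIM)
a = rng.normal(size=ACTION_DIM)
print(q_value(s, a, g))
```

Because each component only sees part of the input, data that varies the goal (with a fixed state-action pair) updates the global embedding without disturbing the local one, and vice versa, which is the generalization argument made above.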
