Hypothesis Driven Coordinate Ascent for Reinforcement Learning

2021-09-29

John Kenton Moore, Junier Oliva

Abstract

This work develops a novel black-box optimization technique for learning robust policies in stochastic environments. By combining coordinate ascent with hypothesis testing, Hypothesis Driven Coordinate Ascent (HDCA) optimizes without computing or estimating gradients. The simplicity of this approach allows it to excel in a distributed setting, and its implementation provides an interesting alternative to many state-of-the-art methods for common reinforcement learning environments. HDCA was evaluated on various problems from the MuJoCo physics simulator and the OpenAI Gym framework, achieving results equivalent or superior to standard RL benchmarks.
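The abstract does not spell out the update rule, so the following is only a minimal sketch of the general idea it names: perturb one parameter at a time and keep the change only when a hypothesis test says the noisy return improved. It is not the authors' implementation. The function names (`hdca_sketch`, `welch_t`, `toy_return`), the fixed step size, the sample count, the one-sided Welch test with a normal-approximation critical value, and the toy objective are all assumptions made for illustration.

```python
import random
import statistics


def welch_t(a, b):
    """Welch t-statistic for mean(a) - mean(b) (unequal variances allowed)."""
    se_sq = statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    diff = statistics.mean(a) - statistics.mean(b)
    if se_sq == 0.0:
        # Degenerate case with no noise: treat any strict improvement as significant.
        return float("inf") if diff > 0 else 0.0
    return diff / se_sq ** 0.5


def hdca_sketch(noisy_return, theta, step=0.5, n_samples=30,
                alpha=0.05, sweeps=20, seed=0):
    """Gradient-free coordinate ascent with a hypothesis test as the accept rule.

    All hyperparameters here are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    # Assumption: normal approximation to the one-sided critical value.
    critical = statistics.NormalDist().inv_cdf(1.0 - alpha)
    theta = list(theta)
    for _ in range(sweeps):
        for i in range(len(theta)):
            # Sample returns of the current parameters as the null baseline.
            current = [noisy_return(theta, rng) for _ in range(n_samples)]
            for delta in (step, -step):
                candidate = theta.copy()
                candidate[i] += delta
                trial = [noisy_return(candidate, rng) for _ in range(n_samples)]
                # Accept only if the improvement is statistically significant.
                if welch_t(trial, current) > critical:
                    theta = candidate
                    break
    return theta


def toy_return(theta, rng):
    """Hypothetical stochastic objective: peak at (1, -2) plus Gaussian noise."""
    return -((theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2) + rng.gauss(0.0, 0.5)


result = hdca_sketch(toy_return, [0.0, 0.0])
```

In a reinforcement learning setting, `noisy_return` would instead run one episode with policy parameters `theta` and return the episode reward. Because each batch of samples is independent, the rollouts could in principle be farmed out to separate workers, which is consistent with the abstract's remark about distributed settings.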

Tasks

MuJoCo, OpenAI Gym, Reinforcement Learning (RL)
