Learning to Control on the Fly

2021-01-01

Zhanzhan Zhao

Abstract

This paper proposes an algorithm that learns to control on the fly. The algorithm has no access to the transition law of the environment, which is in fact linear with bounded random noise, and it learns to make decisions directly online, with no training phase and no sub-optimal policy as initial input. Rather than estimating the system parameters or the value functions online, it adapts the ellipsoid method to the online decision-making setting. Whenever the feasibility of the decision variable is violated, a linear constraint is added, which shrinks the volume of the decision variable's domain. We upper bound the number of online linear constraints needed for the state to converge to a neighborhood of the desired state under the bounded random state noise. The algorithm is also proved to have constant-bounded online regret when the bound on the random noise lies within a certain range.
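
For readers unfamiliar with the ellipsoid method mentioned in the abstract, the following is a minimal sketch of the classical central-cut version applied to a static linear feasibility problem. It is not the paper's online control algorithm, whose details the abstract does not give. The function names (ellipsoid_cut, find_feasible_point), the starting ball radius, and the box-shaped test constraints are all illustrative assumptions. The sketch only shows the mechanism the abstract refers to: each violated linear constraint cuts the current ellipsoid in half, and the smallest ellipsoid containing the kept half has strictly smaller volume.

```python
import numpy as np


def ellipsoid_cut(center, shape, a):
    """One central-cut update (hypothetical helper, dimension n >= 2).

    The ellipsoid is {x : (x - center)^T shape^{-1} (x - center) <= 1}.
    The half kept is {x : a^T (x - center) <= 0}; the result is the
    minimum-volume ellipsoid containing that half.
    """
    n = center.size
    Pa = shape @ a
    g = Pa / np.sqrt(a @ Pa)  # scaled step direction, equals P a / sqrt(a^T P a)
    new_center = center - g / (n + 1)
    new_shape = (n**2 / (n**2 - 1.0)) * (shape - (2.0 / (n + 1)) * np.outer(g, g))
    return new_center, new_shape


def find_feasible_point(A, b, radius=10.0, max_cuts=200):
    """Search for x with A x <= b inside a ball of the given radius.

    Returns (x, number_of_cuts), or (None, max_cuts) if no point is found.
    The starting radius is an assumption: the feasible set must lie in the ball.
    """
    n = A.shape[1]
    center = np.zeros(n)
    shape = radius**2 * np.eye(n)
    for cuts in range(max_cuts):
        violation = A @ center - b
        i = int(np.argmax(violation))
        if violation[i] <= 0.0:
            return center, cuts  # every constraint holds at the center
        # Constraint i is violated at the center, so the feasible set lies
        # entirely in the half-space a_i^T (x - center) <= 0; cut with a_i.
        center, shape = ellipsoid_cut(center, shape, A[i])
    return None, max_cuts


# Illustrative constraints: the unit box 2 <= x1 <= 3, 2 <= x2 <= 3.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.array([3.0, -2.0, 3.0, -2.0])
x, cuts = find_feasible_point(A, b)
print("feasible point:", x, "found after", cuts, "cuts")
```

Because the ellipsoid always contains the feasible set while its volume shrinks by a fixed factor per cut, the number of cuts before the center becomes feasible is bounded. The abstract's bound on the number of online linear constraints is in the same spirit, though the paper's argument must additionally account for the state noise.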
