SOTAVerified

Demystifying Linear MDPs and Novel Dynamics Aggregation Framework

2024-10-31Unverified0· sign in to hype

Joongkyu Lee, Min-hwan Oh

Unverified — Be the first to reproduce this paper.

Reproduce

Abstract

In this work, we prove that, in linear MDPs, the feature dimension d is lower bounded by S/U in order to aptly represent transition probabilities, where S is the size of the state space and U is the maximum size of directly reachable states. Hence, d can still scale with S depending on the direct reachability of the environment. To address this limitation of linear MDPs, we propose a novel structural aggregation framework based on dynamics, named as the "dynamics aggregation". For this newly proposed framework, we design a provably efficient hierarchical reinforcement learning algorithm in linear function approximation that leverages aggregated sub-structures. Our proposed algorithm exhibits statistical efficiency, achieving a regret of O ( d_^3/2 H^3/2 N T ), where d_ represents the feature dimension of aggregated subMDPs and N signifies the number of aggregated subMDPs. We establish that the condition d_^3 N d^3 is readily met in most real-world environments with hierarchical structures, enabling a substantial improvement in the regret bound compared to LSVI-UCB, which enjoys a regret of O (d^3/2 H^3/2 T). To the best of our knowledge, this work presents the first HRL algorithm with linear function approximation that offers provable guarantees.

Tasks

Reproductions