Neural Value Iteration

2026-03-16

Yang You, Ufuk Çakır, Alex Schutz, Nick Hawes

Abstract

The value function of a POMDP exhibits the piecewise-linear-convex (PWLC) property and can be represented as a finite set of hyperplanes, known as α-vectors. Most state-of-the-art POMDP solvers (offline planners) follow the point-based value iteration scheme, which performs Bellman backups on α-vectors at reachable belief points until convergence. However, since each α-vector is |S|-dimensional, these methods quickly become intractable for large-scale problems due to the prohibitive computational cost of Bellman backups. In this work, we demonstrate that the PWLC property allows a POMDP's value function to be alternatively represented as a finite set of neural networks. This insight enables a novel POMDP planning algorithm called Neural Value Iteration, which combines the generalization capability of neural networks with the classical value iteration framework. Our approach achieves near-optimal solutions even in extremely large POMDPs that are intractable for existing offline solvers.
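To make the α-vector representation concrete, the following is a minimal sketch of the classical setting the abstract describes, not the paper's Neural Value Iteration algorithm. It evaluates a PWLC value function as a maximum over dot products and performs one point-based Bellman backup at a single belief. The array names and layouts (`T[a, s, s']` for transitions, `Z[a, s', o]` for observations, `R[a, s]` for rewards) and both function names are assumptions chosen for illustration.

```python
import numpy as np


def value(belief, alphas):
    """PWLC value: V(b) = max_i alpha_i . b, with alphas of shape (k, |S|)."""
    return float(np.max(alphas @ belief))


def point_based_backup(belief, alphas, T, Z, R, gamma):
    """One point-based Bellman backup at `belief`; returns a new alpha-vector.

    Assumed (hypothetical) layouts: T[a, s, s'], Z[a, s', o], R[a, s].
    """
    n_actions = T.shape[0]
    n_obs = Z.shape[2]
    best_alpha, best_val = None, -np.inf
    for a in range(n_actions):
        g_a = R[a].astype(float)  # astype returns a copy, so R is not mutated
        for o in range(n_obs):
            # candidates[i, s] = sum_{s'} T[a, s, s'] * Z[a, s', o] * alphas[i, s']
            candidates = (alphas * Z[a, :, o]) @ T[a].T
            # Keep the candidate that is best at this particular belief.
            g_a += gamma * candidates[np.argmax(candidates @ belief)]
        val = float(g_a @ belief)
        if val > best_val:
            best_alpha, best_val = g_a, val
    return best_alpha
```

Each backup touches every state for every action, observation, and existing α-vector, which illustrates why the cost grows quickly with |S|, the scaling issue the abstract points to.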
