Reinforcement Learning for Predict+Optimize

2020-12-14 · CUHK Course IERG5350 2020

Xinyi Hu, Yuansen Cheng

Abstract

Predict+Optimize (P+O) is a machine learning framework for optimization problems with unknown parameters. This paper presents a framework for tackling P+O problems using neural networks and reinforcement learning. We focus on the traveling salesman problem and train a recurrent neural network that, given a directed graph, predicts a distribution over edge permutations. Using the negative tour length as the reward signal, we optimize the parameters of the recurrent neural network with a policy gradient method.
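The policy gradient step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the recurrent network is replaced here by a small table of logits `theta[i][j]` (a stand-in chosen to keep the example dependency-free), and the names `tour_length`, `sample_tour`, and `train`, the learning rate, and the moving-average baseline are all assumptions made for illustration. The sketch samples a tour from a masked softmax policy, uses the negative tour length as the reward, and applies a REINFORCE update.

```python
import math
import random


def tour_length(tour, dist):
    """Length of the closed tour visiting cities in the given order."""
    n = len(tour)
    return sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))


def sample_tour(theta, rng):
    """Sample a tour starting at city 0 from a masked softmax policy.

    Returns the tour and the gradient of its log-probability w.r.t. theta.
    The logit table theta is an assumed stand-in for the paper's RNN.
    """
    n = len(theta)
    tour = [0]
    grad = [[0.0] * n for _ in range(n)]
    while len(tour) < n:
        cur = tour[-1]
        visited = set(tour)
        cand = [j for j in range(n) if j not in visited]
        # Numerically stable softmax over the unvisited cities only.
        m = max(theta[cur][j] for j in cand)
        weights = [math.exp(theta[cur][j] - m) for j in cand]
        z = sum(weights)
        probs = [w / z for w in weights]
        nxt = rng.choices(cand, weights=probs)[0]
        # d log pi / d theta[cur][j] = 1[j == nxt] - p_j
        for j, p in zip(cand, probs):
            grad[cur][j] += (1.0 if j == nxt else 0.0) - p
        tour.append(nxt)
    return tour, grad


def train(dist, steps=2000, lr=0.1, seed=0):
    """REINFORCE with reward = negative tour length (hyperparameters assumed)."""
    rng = random.Random(seed)
    n = len(dist)
    theta = [[0.0] * n for _ in range(n)]
    baseline = None
    for _ in range(steps):
        tour, grad = sample_tour(theta, rng)
        reward = -tour_length(tour, dist)
        # Exponential moving average of rewards, used to reduce variance.
        baseline = reward if baseline is None else 0.9 * baseline + 0.1 * reward
        advantage = reward - baseline
        for i in range(n):
            for j in range(n):
                theta[i][j] += lr * advantage * grad[i][j]
    return theta
```

Because `dist[i][j]` is indexed by ordered pairs, the same sketch applies to a directed graph with asymmetric edge costs. In a P+O setting the entries of `dist` would themselves come from a prediction model; that step is outside this sketch.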

Tasks

Reinforcement Learning (RL), Traveling Salesman Problem
