Reinforcement with Fading Memories
Kuang Xu, Se-Young Yun
Abstract
We study the effect of imperfect memory on decision making in the context of a stochastic sequential action-reward problem. An agent chooses a sequence of actions, which generate discrete rewards at different rates. She is allowed to make new choices at rate $\beta$, while past rewards disappear from her memory at rate $\mu$. We focus on a family of decision rules where the agent makes a new choice by randomly selecting an action with a probability approximately proportional to the amount of past rewards associated with each action in her memory. We provide closed-form formulae for the agent's steady-state choice distribution in the regime where the memory span is large ($\mu \to 0$), and show that the agent's success critically depends on how quickly she updates her choices relative to the speed of memory decay. If $\beta \gg \mu$, the agent almost always chooses the best action, i.e., the one with the highest reward rate. Conversely, if $\beta \ll \mu$, the agent chooses an action with a probability roughly proportional to its reward rate.
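
The dynamics described in the abstract can be explored numerically. The following is a minimal simulation sketch, not the authors' code. It assumes that rewards, forgetting events, and choice updates all occur as independent Poisson processes, and that "approximately proportional" means adding a small constant weight `alpha` to each action's remembered reward count before normalizing. The function name `simulate`, the parameter `alpha`, and all default values are hypothetical choices made for illustration.

```python
import random


def simulate(reward_rates, beta, mu, alpha=1.0, horizon=5000.0, seed=0):
    """Return the fraction of time spent on each action.

    Assumptions (not taken from the paper): exponential inter-event times,
    and a choice probability proportional to (remembered rewards + alpha).
    """
    rng = random.Random(seed)
    k = len(reward_rates)
    memory = [0] * k              # remembered reward units per action
    action = rng.randrange(k)     # initial action chosen uniformly
    time_on = [0.0] * k
    t = 0.0
    while True:
        reward_rate = reward_rates[action]   # only the current action pays
        forget_rate = mu * sum(memory)       # each unit fades at rate mu
        total = reward_rate + forget_rate + beta
        dt = rng.expovariate(total)
        if t + dt >= horizon:
            time_on[action] += horizon - t
            break
        time_on[action] += dt
        t += dt
        u = rng.random() * total
        if u < reward_rate:
            # A new reward is stored under the current action.
            memory[action] += 1
        elif u < reward_rate + forget_rate:
            # One stored unit, picked uniformly at random, is forgotten.
            j = rng.choices(range(k), weights=memory)[0]
            memory[j] -= 1
        else:
            # Choice update: pick an action with probability
            # proportional to its remembered rewards plus alpha.
            weights = [m + alpha for m in memory]
            action = rng.choices(range(k), weights=weights)[0]
    return [x / horizon for x in time_on]


if __name__ == "__main__":
    # Fast updating relative to memory decay (beta much larger than mu).
    print(simulate([1.0, 3.0], beta=1.0, mu=0.01))
```

Varying `beta` relative to `mu` in this sketch gives a rough feel for the two regimes in the abstract. Observing the slow-update regime requires a much longer `horizon`, since choice updates become rare.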