SOTAVerified

Optimizing Attention and Cognitive Control Costs Using Temporally-Layered Architectures

2023-05-30 · Code Available

Devdhar Patel, Terrence Sejnowski, Hava Siegelmann


Abstract

The current reinforcement learning framework focuses exclusively on performance, often at the expense of efficiency. In contrast, biological control achieves remarkable performance while also optimizing computational energy expenditure and decision frequency. We propose a Decision Bounded Markov Decision Process (DB-MDP) that constrains the number of decisions and the computational energy available to agents in reinforcement learning environments. Our experiments demonstrate that existing reinforcement learning algorithms struggle within this framework, leading to either failure or suboptimal performance. To address this, we introduce a biologically inspired Temporally Layered Architecture (TLA), enabling agents to manage computational costs through two layers with distinct time scales and energy requirements. TLA achieves optimal performance in decision-bounded environments; in continuous control environments, it matches state-of-the-art performance while using a fraction of the compute. Compared to current reinforcement learning algorithms that prioritize performance alone, our approach significantly lowers computational energy expenditure while maintaining performance. These findings establish a benchmark and pave the way for future research on energy- and time-aware control.
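The core DB-MDP idea from the abstract (terminating an episode once the agent exhausts its decision budget) can be illustrated with a small environment wrapper. This is a minimal sketch under assumed names and interfaces, not the paper's implementation; `DecisionBoundedEnv`, `CountingEnv`, and the `(obs, reward, done)` step signature are illustrative assumptions.

```python
class DecisionBoundedEnv:
    """Hedged sketch of a DB-MDP-style constraint: wraps an environment
    and forces episode termination once the agent has spent its budget
    of decisions. Names and interface are assumptions for illustration."""

    def __init__(self, env, decision_budget):
        self.env = env
        self.decision_budget = decision_budget
        self.decisions_used = 0

    def reset(self):
        self.decisions_used = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done = self.env.step(action)
        self.decisions_used += 1
        if self.decisions_used >= self.decision_budget:
            done = True  # decision budget exhausted: episode ends
        return obs, reward, done


class CountingEnv:
    """Toy environment: +1 reward per step, never terminates on its own."""

    def reset(self):
        return 0

    def step(self, action):
        return 0, 1.0, False


# Usage: with a budget of 5 decisions, the episode ends after 5 steps.
env = DecisionBoundedEnv(CountingEnv(), decision_budget=5)
env.reset()
total, done = 0.0, False
while not done:
    _, r, done = env.step(0)
    total += r
print(total)  # 5.0
```

A temporally layered agent would then spend this budget unevenly: a cheap slow-timescale policy acts by default, invoking the expensive fast policy only when needed, so fewer budgeted decisions are consumed by costly computation.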

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Ant-v2 | TLA | Mean Reward | 5,163.54 | | Unverified |
| HalfCheetah-v2 | TLA | Mean Reward | 9,571.99 | | Unverified |
| Hopper-v2 | TLA | Mean Reward | 3,458.22 | | Unverified |
| InvertedDoublePendulum-v2 | TLA | Mean Reward | 9,356.67 | | Unverified |
| InvertedPendulum-v2 | TLA | Mean Reward | 1,000 | | Unverified |
| MountainCarContinuous-v0 | TLA | Mean Reward | 93.88 | | Unverified |
| Pendulum-v1 | TLA | Mean Reward | -154.92 | | Unverified |
| Walker2d-v2 | TLA | Mean Reward | 3,878.41 | | Unverified |

Reproductions