RL + Transformer = A General-Purpose Problem Solver

2025-01-24

Micah Rentschler, Jesse Roberts

Abstract

What if artificial intelligence could not only solve problems for which it was trained but also learn to teach itself to solve new problems (i.e., meta-learn)? In this study, we demonstrate that a pre-trained transformer fine-tuned with reinforcement learning over multiple episodes develops the ability to solve problems that it has never encountered before, an emergent ability called In-Context Reinforcement Learning (ICRL). This powerful meta-learner not only excels in solving unseen in-distribution environments with remarkable sample efficiency, but also shows strong performance in out-of-distribution environments. In addition, we show that it exhibits robustness to the quality of its training data, seamlessly stitches together behaviors from its context, and adapts to non-stationary environments. These behaviors demonstrate that an RL-trained transformer can iteratively improve upon its own solutions, making it an excellent general-purpose problem solver.
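The multi-episode interaction pattern behind ICRL can be illustrated with a minimal sketch. This is not the paper's model or training procedure: the transformer is replaced by a hand-written stand-in (`act_from_context`) that estimates action values from the context alone, and the chain environment, constants, and function names are all hypothetical. The only point it shows is that behavior can improve across episodes with no parameter update, because the context of past transitions keeps growing.

```python
# Hypothetical toy setup: a 5-state chain, start at state 0, goal at state 4.
N_STATES = 5
ACTIONS = (-1, +1)   # step left or step right
MAX_STEPS = 20


def step(state, action):
    """Deterministic toy environment: reward 1.0 only when the goal is reached."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def act_from_context(context, state, gamma=0.9, sweeps=30):
    """Stand-in for the frozen policy (NOT a transformer).

    It reads only the transitions stored in the context, backs up values
    over them, and treats untried (state, action) pairs optimistically.
    """
    q = {(s, a): 0.0 for s, a, _, _, _ in context}

    def value(s, a):
        return q.get((s, a), 1.0)  # untried pairs get an optimistic value

    for _ in range(sweeps):
        for s, a, r, s2, done in context:
            best_next = max(value(s2, b) for b in ACTIONS)
            q[(s, a)] = r if done else r + gamma * best_next
    return max(ACTIONS, key=lambda a: value(state, a))


def run_episodes(n_episodes):
    """Run several episodes while keeping one context across all of them."""
    context = []   # persists across episodes; nothing else is ever updated
    lengths = []
    for _ in range(n_episodes):
        state, steps, done = 0, 0, False
        while not done and steps < MAX_STEPS:
            action = act_from_context(context, state)
            nxt, reward, done = step(state, action)
            context.append((state, action, reward, nxt, done))
            state, steps = nxt, steps + 1
        lengths.append(steps)
    return lengths


if __name__ == "__main__":
    print(run_episodes(3))
```

In this toy, the first episode spends steps exploring, while later episodes follow the shortest four-step path to the goal using only what the context already contains.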

Tasks

In-Context Reinforcement Learning, Reinforcement Learning
