Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback on its actions, given as rewards or penalties. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
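
The interaction loop described above can be illustrated with a minimal sketch of tabular Q-learning, one common RL algorithm chosen here purely for illustration. The five-state chain environment, the function names (step, train, greedy_policy), and all hyperparameter values are assumptions made for this example and do not come from any paper listed on this page.

```python
import random

N_STATES = 5      # hypothetical toy chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment dynamics: return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0  # reward only on reaching the goal state
    return next_state, reward, done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a Q-table by interacting with the toy environment."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: pick a random action.
                action = rng.choice(ACTIONS)
            else:
                # Exploit: pick a best-valued action, breaking ties at random.
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q-learning target: immediate reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Policy that picks the highest-valued action in each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    q_table = train()
    print(greedy_policy(q_table))
```

In this toy setting the learned greedy policy should move right in every state, since that is the only way to reach the rewarded terminal state. Real benchmarks use far larger state spaces and usually replace the table with a function approximator.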

Papers

Showing 1711–1720 of 15113 papers

Title	Date	Tasks	Status	Hype
Combinatorial Optimization by Graph Pointer Networks and Hierarchical Reinforcement Learning	Nov 12, 2019	Combinatorial Optimization, Graph Embedding	Code Available	1
On Effective Scheduling of Model-based Reinforcement Learning	Nov 16, 2021	Continuous Control	Code Available	1
A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks	Oct 29, 2024	Mamba, Reinforcement Learning (RL)	Code Available	1
One Policy to Control Them All: Shared Modular Policies for Agent-Agnostic Control	Jul 9, 2020	All, reinforcement-learning	Code Available	1
Combining Deep Reinforcement Learning and Search for Imperfect-Information Games	Jul 27, 2020	Deep Reinforcement Learning, reinforcement-learning	Code Available	1
On Joint Learning for Solving Placement and Routing in Chip Design	Oct 30, 2021	GPU, reinforcement-learning	Code Available	1
Online 3D Bin Packing with Constrained Deep Reinforcement Learning	Jun 26, 2020	3D Bin Packing, Collision Avoidance	Code Available	1
Online and Offline Reinforcement Learning by Planning with a Learned Model	Apr 13, 2021	Atari Games, Continuous Control	Code Available	1
AlberDICE: Addressing Out-Of-Distribution Joint Actions in Offline Multi-Agent RL via Alternating Stationary Distribution Correction Estimation	Nov 3, 2023	Reinforcement Learning (RL)	Code Available	1
Collective eXplainable AI: Explaining Cooperative Strategies and Agent Contribution in Multiagent Reinforcement Learning with Shapley Values	Oct 4, 2021	Decision Making, Deep Reinforcement Learning	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	PPG	Mean Normalized Performance	0.76	—	Unverified
2	PPO	Mean Normalized Performance	0.58	—	Unverified