SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment to maximize a cumulative reward signal. The agent interacts with the environment and learns by receiving feedback in the form of rewards or punishments for its actions. The goal of reinforcement learning is to find the optimal policy or decision-making strategy that maximizes the long-term reward.
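As a rough illustration of the loop described above (act, receive a reward, update the strategy), here is a minimal sketch of an epsilon-greedy agent. It is not taken from any paper listed on this page. The two-action environment, its reward probabilities, and all names (`run_bandit`, `reward_prob`, `epsilon`) are assumptions made for the example. A bandit has no state, so it is the simplest case of the agent-environment interaction.

```python
import random


def run_bandit(episodes=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a hypothetical two-action environment."""
    rng = random.Random(seed)
    reward_prob = [0.2, 0.8]  # assumed: chance that each action pays a reward of 1
    value = [0.0, 0.0]        # agent's running estimate of each action's mean reward
    count = [0, 0]            # how often each action has been taken
    total_reward = 0.0

    for _ in range(episodes):
        # Policy: explore with probability epsilon, otherwise pick the
        # action with the highest estimated reward.
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = 0 if value[0] >= value[1] else 1

        # Environment feedback: a reward of 1 or 0.
        reward = 1.0 if rng.random() < reward_prob[action] else 0.0

        # Learning step: incremental update of the sample mean.
        count[action] += 1
        value[action] += (reward - value[action]) / count[action]
        total_reward += reward

    return value, total_reward


if __name__ == "__main__":
    estimates, cumulative = run_bandit()
    print("estimated action values:", estimates)
    print("cumulative reward:", cumulative)
```

With enough episodes the estimate for the better action should overtake the other, and the greedy choice then collects most of the cumulative reward. Full RL problems add states and delayed rewards on top of this loop.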

Papers

Showing 3826–3850 of 15113 papers

Title | Status | Hype
B2RL: An open-source Dataset for Building Batch Reinforcement Learning | Code | 0
Adjust Planning Strategies to Accommodate Reinforcement Learning Agents | Code | 0
From Gameplay to Symbolic Reasoning: Learning SAT Solver Heuristics in the Style of Alpha(Go) Zero | Code | 0
Action Priors for Large Action Spaces in Robotics | Code | 0
Constrained Policy Optimization with Explicit Behavior Density for Offline Reinforcement Learning | Code | 0
From Credit Assignment to Entropy Regularization: Two New Algorithms for Neural Sequence Prediction | Code | 0
From Images to Connections: Can DQN with GNNs learn the Strategic Game of Hex? | Code | 0
Hierarchical Potential-based Reward Shaping from Task Specifications | Code | 0
Characterizing Attacks on Deep Reinforcement Learning | Code | 0
FREED++: Improving RL Agents for Fragment-Based Molecule Generation by Thorough Reproduction | Code | 0
Fourier Features in Reinforcement Learning with Neural Networks | Code | 0
Risk-sensitive control as inference with Rényi divergence | Code | 0
Free energy-based reinforcement learning using a quantum processor | Code | 0
FORLORN: A Framework for Comparing Offline Methods and Reinforcement Learning for Optimization of RAN Parameters | Code | 0
Foresee then Evaluate: Decomposing Value Estimation with Latent Future Prediction | Code | 0
Free-Lunch Saliency via Attention in Atari Agents | Code | 0
Challenging common bolus advisor for self-monitoring type-I diabetes patients using Reinforcement Learning | Code | 0
Challenges of Context and Time in Reinforcement Learning: Introducing Space Fortress as a Benchmark | Code | 0
Flight Controller Synthesis Via Deep Reinforcement Learning | Code | 0
Challenges in High-dimensional Reinforcement Learning with Evolution Strategies | Code | 0
Backpropagation through the Void: Optimizing control variates for black-box gradient estimation | Code | 0
Flexible Option Learning | Code | 0
Flappy Hummingbird: An Open Source Dynamic Simulation of Flapping Wing Robots and Animals | Code | 0
Fleet Control using Coregionalized Gaussian Process Policy Iteration | Code | 0
Frequentist Regret Bounds for Randomized Least-Squares Value Iteration | Code | 0
Page 154 of 605

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 |  | Unverified
2 | PPO | Mean Normalized Performance | 0.58 |  | Unverified