SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback: each action yields a reward or a penalty. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term reward.
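The interaction loop described above can be illustrated with a minimal sketch. It uses tabular Q-learning, which is only one of many RL algorithms and is chosen here for brevity. The five-state chain environment, the hyperparameters, and the names `step`, `train`, and `greedy_policy` are all hypothetical and do not come from any paper listed on this page.

```python
import random

N_STATES = 5      # hypothetical chain: states 0..4, state 4 is the goal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment: reward 1.0 only when the goal state is reached."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (assumed settings)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, or when both values are tied.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Bootstrapped target: reward plus discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best action per non-terminal state under the learned Q-values."""
    return [0 if q[s][0] > q[s][1] else 1 for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this toy setting the reward arrives only at the goal, so the learned values have to propagate backward through the chain before the greedy policy prefers moving right. This is a small-scale instance of maximizing long-term rather than immediate reward.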

Papers

Showing 6076-6100 of 15113 papers

Title | Status | Hype
SoloParkour: Constrained Reinforcement Learning for Visual Locomotion from Privileged Experience | | 0
SOLO: Search Online, Learn Offline for Combinatorial Optimization Problems | | 0
Solver-Informed RL: Grounding Large Language Models for Authentic Optimization Modeling | | 0
Solve Traveling Salesman Problem by Monte Carlo Tree Search and Deep Neural Network | | 0
Solving a New 3D Bin Packing Problem with Deep Reinforcement Learning Method | | 0
Solving Bayesian inverse problems with diffusion priors and off-policy RL | | 0
Solving Richly Constrained Reinforcement Learning through State Augmentation and Reward Penalties | | 0
Solving Continual Combinatorial Selection via Deep Reinforcement Learning | | 0
Solving Finite-Horizon MDPs via Low-Rank Tensors | | 0
Solving Heterogeneous General Equilibrium Economic Models with Deep Reinforcement Learning | | 0
Solving Math Word Problems with Double-Decoder Transformer | | 0
Solving Multi-Goal Robotic Tasks with Decision Transformer | | 0
Normalized Cut with Reinforcement Learning in Constrained Action Space | | 0
Solving Online Threat Screening Games using Constrained Action Space Reinforcement Learning | | 0
Solving optimal stopping problems with Deep Q-Learning | | 0
Solving Reach-Avoid-Stay Problems Using Deep Deterministic Policy Gradients | | 0
Solving robust MDPs as a sequence of static RL problems | | 0
Solving Rubik's Cube Without Tricky Sampling | | 0
Solving single-objective tasks by preference multi-objective reinforcement learning | | 0
Solving Sokoban with forward-backward reinforcement learning | | 0
Solving Stochastic Games | | 0
Solving the capacitated vehicle routing problem with timing windows using rollouts and MAX-SAT | | 0
Solving the Order Batching and Sequencing Problem using Deep Reinforcement Learning | | 0
Solving the single-track train scheduling problem via Deep Reinforcement Learning | | 0
Solving the Spike Feature Information Vanishing Problem in Spiking Deep Q Network with Potential Based Normalization | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified