SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
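As a rough illustration of the agent-environment loop described above, the following minimal sketch runs tabular Q-learning on a toy five-state corridor. Q-learning is just one of many RL algorithms and is chosen here only for brevity; the environment, the function names (`step`, `train`, `greedy_policy`), and all hyperparameters are hypothetical and do not come from any paper listed on this page.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the terminal goal state
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    # Reward of 1 only when the goal is reached, 0 otherwise.
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table of action values Q[state][action] by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore at random sometimes (and on ties),
            # otherwise exploit the action with the higher estimated value.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Q-learning update: move the estimate toward
            # reward + discounted value of the best next action.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best action in each non-terminal state under the learned values."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

In this toy setting, `greedy_policy(train())` should select "move right" (action 1) in every non-terminal state, since that is the only way to reach the rewarded goal; the discount factor `gamma` is what makes shorter paths to the reward preferable.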

Papers

Showing 11951-12000 of 15113 papers

| Title | Status | Hype |
| --- | --- | --- |
| Robust Opponent Modeling via Adversarial Ensemble Reinforcement Learning in Asymmetric Imperfect-Information Games | | 0 |
| ModelicaGym: Applying Reinforcement Learning to Modelica Models | Code | 1 |
| Segregation Dynamics with Reinforcement Learning and Agent Based Modeling | | 0 |
| Sample Efficient Policy Gradient Methods with Recursive Variance Reduction | Code | 0 |
| Fine-Tuning Language Models from Human Preferences | Code | 3 |
| A Hierarchical Two-tier Approach to Hyper-parameter Optimization in Reinforcement Learning | | 0 |
| Stock market microstructure inference via multi-agent reinforcement learning | | 0 |
| Controllable Length Control Neural Encoder-Decoder via Reinforcement Learning | | 0 |
| Attraction-Repulsion Actor-Critic for Continuous Control Reinforcement Learning | | 0 |
| Emergent Tool Use From Multi-Agent Autocurricula | Code | 2 |
| Hierarchical Reinforcement Learning for Open-Domain Dialog | | 0 |
| Generating Black-Box Adversarial Examples for Text Classifiers Using a Deep Reinforced Model | | 0 |
| A Review of Tracking, Prediction and Decision Making Methods for Autonomous Driving | | 0 |
| Adversarial Feature Training for Generalizable Robotic Visuomotor Control | | 0 |
| MDP Playground: An Analysis and Debug Testbed for Reinforcement Learning | Code | 0 |
| Meta Reinforcement Learning for Sim-to-real Domain Adaptation | | 0 |
| Off-road Autonomous Vehicles Traversability Analysis and Trajectory Planning Based on Deep Inverse Reinforcement Learning | | 0 |
| Selective Network Discovery via Deep Reinforcement Learning on Embedded Spaces | | 0 |
| Leveraging human Domain Knowledge to model an empirical Reward function for a Reinforcement Learning problem | | 0 |
| Control Synthesis from Linear Temporal Logic Specifications using Model-Free Reinforcement Learning | Code | 0 |
| Data Centers Job Scheduling with Deep Reinforcement Learning | | 0 |
| State Representation Learning from Demonstration | | 0 |
| Wield: Systematic Reinforcement Learning With Progressive Randomization | | 0 |
| Policy Prediction Network: Model-Free Behavior Policy with Model-Based Learning in Continuous Action Space | | 0 |
| Model Based Planning with Energy Based Models | | 0 |
| Driving in Dense Traffic with Model-Free Reinforcement Learning | Code | 0 |
| Learning to Recover Sparse Signals | | 0 |
| Active Learning for Risk-Sensitive Inverse Reinforcement Learning | | 0 |
| Flight Controller Synthesis Via Deep Reinforcement Learning | Code | 0 |
| Node Injection Attacks on Graphs via Reinforcement Learning | | 0 |
| Towards an Adaptive Robot for Sports and Rehabilitation Coaching | | 0 |
| Petri Net Machines for Human-Agent Interaction | | 0 |
| Say What I Want: Towards the Dark Side of Neural Dialogue Models | | 0 |
| Reinforcement Learning: a Comparison of UCB Versus Alternative Adaptive Policies | | 0 |
| DL2: A Deep Learning-driven Scheduler for Deep Learning Clusters | Code | 0 |
| AITuning: Machine Learning-based Tuning Tool for Run-Time Communication Libraries | | 0 |
| HJB Optimal Feedback Control with Deep Differential Value Functions and Action Constraints | | 0 |
| Reinforcement Learning for Portfolio Management | Code | 0 |
| Joint Inference of Reward Machines and Policies for Reinforcement Learning | | 0 |
| Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning | | 0 |
| Maximum Likelihood Constraint Inference for Inverse Reinforcement Learning | | 0 |
| Modeling Sensorimotor Coordination as Multi-Agent Reinforcement Learning with Differentiable Communication | | 0 |
| Modelling Working Memory using Deep Recurrent Reinforcement Learning | | 0 |
| Reinforcement Learning Models of Human Behavior: Reward Processing in Mental Disorders | | 0 |
| Mutual-Information Regularization in Markov Decision Processes and Actor-Critic Learning | | 0 |
| Predicting optimal value functions by interpolating reward functions in scalarized multi-objective reinforcement learning | Code | 0 |
| RecSim: A Configurable Simulation Platform for Recommender Systems | Code | 0 |
| Reinforcement Learning for Temporal Logic Control Synthesis with Probabilistic Satisfaction Guarantees | Code | 1 |
| On Memory Mechanism in Multi-Agent Reinforcement Learning | | 0 |
| Correlation Priors for Reinforcement Learning | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |