SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term reward.
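
The interaction loop described above can be sketched with tabular Q-learning, one common RL algorithm among many. Everything in this sketch is an illustrative assumption and is not taken from any paper listed on this page: the five-cell corridor environment, the `step`, `train`, and `greedy_policy` helpers, and the hyperparameter values are all hypothetical.

```python
import random

# Hypothetical toy environment: a corridor of N_STATES cells.
# The agent starts in cell 0 and the episode ends when it reaches the last cell.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
GOAL = N_STATES - 1


def step(state, action):
    """Apply one action and return (next_state, reward, done)."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(GOAL, state + 1)
    done = next_state == GOAL
    reward = 1.0 if done else 0.0  # reward only on reaching the goal
    return next_state, reward, done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a table of action values with epsilon-greedy Q-learning."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon (or on ties), else act greedily.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Q-learning target: immediate reward plus discounted best next value.
            future = 0.0 if done else gamma * max(q[next_state])
            q[state][action] += alpha * (reward + future - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]


q_table = train()
policy = greedy_policy(q_table)
print(policy)
```

In this toy setting the learned greedy policy should choose "move right" in every cell, since that is the shortest path to the only reward. The discount factor `gamma` is what makes the agent value long-term reward over immediate reward.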

Papers

Showing 11001–11050 of 15113 papers

| Title | Status | Hype |
|---|---|---|
| Adaptive Dialog Policy Learning with Hindsight and User Modeling | | 0 |
| Reinforcement Learning with Feedback Graphs | | 0 |
| Safe Reinforcement Learning through Meta-learned Instincts | | 0 |
| Robotic Arm Control and Task Training through Deep Reinforcement Learning | | 0 |
| Reinforcement Learning for UAV Autonomous Navigation, Mapping and Target Detection | | 0 |
| Gifting in multi-agent reinforcement learning | Code | 0 |
| A Survey on Dialog Management: Recent Advances and Challenges | | 0 |
| Generalized Planning With Deep Reinforcement Learning | | 0 |
| Discrete-to-Deep Supervised Policy Learning | Code | 0 |
| Formal Policy Synthesis for Continuous-Space Systems via Reinforcement Learning | | 0 |
| Generalized Reinforcement Meta Learning for Few-Shot Optimization | | 0 |
| Hierarchical Decomposition of Nonlinear Dynamics and Control for System Identification and Policy Distillation | | 0 |
| Reward Constrained Interactive Recommendation with Natural Language Feedback | | 0 |
| Multiagent Value Iteration Algorithms in Dynamic Programming and Reinforcement Learning | | 0 |
| Setting up experimental Bell test with reinforcement learning | | 0 |
| Noise Pollution in Hospital Readmission Prediction: Long Document Classification with Reinforcement Learning | | 0 |
| Multi-agent Reinforcement Learning for Decentralized Stable Matching | | 0 |
| Optimal Beam Association for High Mobility mmWave Vehicular Networks: Lightweight Parallel Reinforcement Learning Approach | | 0 |
| Deep Reinforcement Learning for Intelligent Transportation Systems: A Survey | | 0 |
| Enhancing Text-based Reinforcement Learning Agents with Commonsense Knowledge | | 0 |
| Learning the Arrow of Time for Problems in Reinforcement Learning | | 0 |
| AMRL: Aggregated Memory For Reinforcement Learning | | 0 |
| Learning Heuristics for Quantified Boolean Formulas through Reinforcement Learning | | 0 |
| Keep Doing What Worked: Behavior Modelling Priors for Offline Reinforcement Learning | | 0 |
| Explain Your Move: Understanding Agent Actions Using Focused Feature Saliency | Code | 0 |
| Is Long Horizon Reinforcement Learning More Difficult Than Short Horizon Reinforcement Learning? | | 0 |
| Improving Robustness via Risk Averse Distributional Reinforcement Learning | | 0 |
| Exploration in Reinforcement Learning with Deep Covering Options | | 0 |
| Episodic Reinforcement Learning with Associative Memory | | 0 |
| Learning Efficient Parameter Server Synchronization Policies for Distributed SGD | | 0 |
| Synthesizing Programmatic Policies that Inductively Generalize | | 0 |
| Model Based Reinforcement Learning for Atari | | 0 |
| Model-based reinforcement learning for biological sequence design | | 0 |
| Toward Evaluating Robustness of Deep Reinforcement Learning with Continuous Control | | 0 |
| Posterior sampling for multi-agent reinforcement learning: solving extensive games with imperfect information | | 0 |
| The Ingredients of Real World Robotic Reinforcement Learning | | 0 |
| Reinforcement learning of minimalist grammars | | 0 |
| Unsupervised Learning of KB Queries in Task-Oriented Dialogs | | 0 |
| Towards Embodied Scene Description | | 0 |
| Out-of-the-box channel pruned networks | | 0 |
| Plan-Space State Embeddings for Improved Reinforcement Learning | | 0 |
| DSAC: Distributional Soft Actor Critic for Risk-Sensitive Reinforcement Learning | | 0 |
| GCN-RL Circuit Designer: Transferable Transistor Sizing with Graph Neural Networks and Reinforcement Learning | | 0 |
| Delay-aware Resource Allocation in Fog-assisted IoT Networks Through Reinforcement Learning | | 0 |
| Improving Factual Consistency Between a Response and Persona Facts | | 0 |
| Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging | | 0 |
| Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning | | 0 |
| Graph-based State Representation for Deep Reinforcement Learning | Code | 0 |
| Reduced-Dimensional Reinforcement Learning Control using Singular Perturbation Approximations | | 0 |
| Whittle index based Q-learning for restless bandits with average reward | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |