SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to choose actions in an environment so as to maximize a cumulative reward signal. The agent learns by interacting with the environment and receiving feedback, in the form of rewards or penalties, for its actions. The goal is to find an optimal policy, i.e. a decision-making strategy that maximizes the expected long-term reward.
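The agent-environment loop described above can be illustrated with tabular Q-learning. Q-learning is one standard RL algorithm, chosen here only as an example. This is a minimal sketch, and several of its pieces are assumptions rather than anything the page defines: the five-state "chain" environment, the reward of 1 for reaching its right end, the hyperparameters `alpha`, `gamma`, and `epsilon`, and the helper names. The "long-term reward" is modeled as a discounted return with discount factor `gamma`, which is a common choice but not the only one.

```python
import random

# Hypothetical toy environment (an assumption for illustration, not from any benchmark):
# a 1-D chain of states 0..N_STATES-1. The agent starts at state 0; action 0 moves
# left, action 1 moves right. Entering the last state yields reward 1 and ends the episode.
N_STATES = 5
ACTIONS = (0, 1)


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: q[s][a] estimates the discounted long-term reward of action a in state s."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy exploration; ties between equal estimates are broken at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Bootstrap from the best next action unless the episode has ended.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Extract the learned policy: the highest-valued action in each state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES)]


q = q_learning()
print(greedy_policy(q)[:-1])  # policy for the non-terminal states: [1, 1, 1, 1]
```

The learned policy moves right in every non-terminal state. That is the optimal policy here because the discount makes the single reward worth more the sooner it is reached. Estimates start at zero and are updated only from experience, so the agent discovers this strategy purely through the rewards it receives.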

Papers

Showing 14101–14150 of 15113 papers

Title | Status | Hype
MAgent: A Many-Agent Reinforcement Learning Platform for Artificial Collective Intelligence | Code | 0
Online Reinforcement Learning in Stochastic Games |  | 0
Progressive Neural Architecture Search | Code | 0
Natural Value Approximators: Learning when to Trust Past Estimates |  | 0
Log-normality and Skewness of Estimated State/Action Values in Reinforcement Learning |  | 0
Optimistic posterior sampling for reinforcement learning: worst-case regret bounds |  | 0
Q-LDA: Uncovering Latent Patterns in Text-based Sequential Decision Processes |  | 0
Adaptive Batch Size for Safe Policy Gradients |  | 0
Data-Efficient Reinforcement Learning in Continuous State-Action Gaussian-POMDPs |  | 0
Dynamic-Depth Context Tree Weighting |  | 0
Compatible Reward Inverse Reinforcement Learning |  | 0
Time Limits in Reinforcement Learning | Code | 1
Safe Exploration for Identifying Linear Systems via Robust Optimization |  | 0
Transferring Autonomous Driving Knowledge on Simulated and Real Intersections |  | 0
Embodied Question Answering | Code | 0
Improved Learning in Evolution Strategies via Sparser Inter-Agent Network Topologies |  | 0
Comparing Deep Reinforcement Learning and Evolutionary Methods in Continuous Control |  | 0
Can Complex Collective Behaviour Be Generated Through Randomness, Memory and a Pinch of Luck? |  | 0
HoME: a Household Multimodal Environment |  | 0
End-to-End Optimization of Task-Oriented Dialogue Model with Deep Reinforcement Learning |  | 0
Automating Vehicles by Deep Reinforcement Learning using Task Separation with Hill Climbing |  | 0
Reinforcement Learning To Adapt Speech Enhancement to Instantaneous Input Signal Quality |  | 0
Video Captioning via Hierarchical Reinforcement Learning |  | 0
A Benchmarking Environment for Reinforcement Learning Based Task Oriented Dialogue Management |  | 0
Deep Reinforcement Learning for De-Novo Drug Design | Code | 0
Hierarchical Policy Search via Return-Weighted Density Estimation |  | 0
One-Shot Reinforcement Learning for Robot Navigation with Interactive Replay | Code | 1
Plan, Attend, Generate: Planning for Sequence-to-Sequence Models | Code | 1
Learning from Longitudinal Face Demonstration - Where Tractable Deep Modeling Meets Inverse Reinforcement Learning |  | 0
Risk-sensitive Inverse Reinforcement Learning via Semi- and Non-Parametric Methods | Code | 0
A reinforcement learning algorithm for building collaboration in multi-agent systems |  | 0
Crossmodal Attentive Skill Learner | Code | 0
Deep Reinforcement Learning for Sepsis Treatment | Code | 0
AI Safety Gridworlds | Code | 0
Divide-and-Conquer Reinforcement Learning | Code | 0
Generative Adversarial Network for Abstractive Text Summarization | Code | 0
Malaria Likelihood Prediction By Effectively Surveying Households Using Deep Reinforcement Learning |  | 0
Ethical Challenges in Data-Driven Dialogue Systems | Code | 0
Cascade Attribute Learning Network |  | 0
Action Branching Architectures for Deep Reinforcement Learning | Code | 1
Asking the Difficult Questions: Goal-Oriented Visual Question Generation via Intermediate Rewards |  | 0
Transferring Agent Behaviors from Videos via Motion GANs |  | 0
Posterior Sampling for Large Scale Reinforcement Learning |  | 0
Teaching a Machine to Read Maps with Deep Reinforcement Learning | Code | 0
Classification with Costly Features using Deep Reinforcement Learning | Code | 0
Deep Reinforcement Learning for Multi-Resource Multi-Machine Job Scheduling |  | 0
Leave no Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning | Code | 0
Run, skeleton, run: skeletal model in a physics-based simulation | Code | 0
Neural Network Based Reinforcement Learning for Audio-Visual Gaze Control in Human-Robot Interaction |  | 0
Hindsight policy gradients | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 |  | Unverified
2 | PPO | Mean Normalized Performance | 0.58 |  | Unverified
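The page does not define how "Mean Normalized Performance" is computed. This is a minimal sketch that assumes one common convention. Under it, each environment's raw return R is rescaled as (R - R_min) / (R_max - R_min) using fixed per-environment reference scores, and the rescaled values are then averaged across environments. The function names, the dictionary layout, and the example numbers are all hypothetical. The benchmark behind this leaderboard may use different reference scores, environments, or normalization.

```python
def normalized_score(raw, r_min, r_max):
    """Min-max normalize one environment's raw return against reference bounds."""
    if r_max == r_min:
        raise ValueError("r_max and r_min must differ")
    return (raw - r_min) / (r_max - r_min)


def mean_normalized_performance(raw_scores, bounds):
    """Average the normalized scores over all environments.

    raw_scores: dict mapping environment name -> raw return
    bounds:     dict mapping environment name -> (r_min, r_max) reference scores
    """
    normalized = [normalized_score(raw_scores[env], *bounds[env]) for env in raw_scores]
    return sum(normalized) / len(normalized)


# Hypothetical numbers, not taken from the leaderboard above.
scores = {"env_a": 5.0, "env_b": 8.0}
ref = {"env_a": (0.0, 10.0), "env_b": (2.0, 10.0)}
print(mean_normalized_performance(scores, ref))  # 0.625
```

Under this assumed convention, a value of 1.0 would mean matching the reference maximum on every environment. Normalizing before averaging keeps environments with large raw reward scales from dominating the mean.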