SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment to maximize a cumulative reward signal. The agent interacts with the environment and learns by receiving feedback in the form of rewards or punishments for its actions. The goal of reinforcement learning is to find the optimal policy or decision-making strategy that maximizes the long-term reward.
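The interaction loop described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a method from any paper listed on this page: the five-state corridor environment, the function names (step, train, greedy_policy), and the hyperparameter values are all assumptions made for the example, and tabular Q-learning is used simply as one well-known way to learn a policy from rewards.

```python
import random

# Hypothetical toy environment (an assumption for illustration): a 5-state
# corridor. The agent starts in state 0 and receives reward 1.0 only when it
# reaches state 4, which ends the episode.
N_STATES = 5
ACTIONS = (-1, +1)  # index 0 moves left, index 1 moves right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, and break ties at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Bootstrapped target: immediate reward plus discounted value
            # of the best action in the next state (zero after termination).
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best action (as -1 or +1) in each non-terminal state."""
    return [ACTIONS[0 if row[0] > row[1] else 1] for row in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

The discount factor gamma is what makes the agent prefer reward that arrives sooner, which is one common way to formalize "long-term reward"; other formulations (for example, average reward) exist and are not shown here.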

Papers

Showing 13901–13950 of 15113 papers

Title | Status | Hype
Reinforcement Learning to Rank in E-Commerce Search Engine: Formalization, Analysis, and Application | Code | 0
Towards Cooperation in Sequential Prisoner's Dilemmas: a Deep Multiagent Reinforcement Learning Approach | | 0
Inverse Reinforcement Learning via Nonparametric Spatio-Temporal Subgoal Modeling | | 0
Deep Reinforcement Learning for Sponsored Search Real-time Bidding | | 0
On Oracle-Efficient PAC RL with Rich Observations | | 0
Hierarchical Imitation and Reinforcement Learning | | 0
Learning by Playing - Solving Sparse Reward Tasks from Scratch | Code | 0
Model-Ensemble Trust-Region Policy Optimization | Code | 0
Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning | | 0
Deep Reinforcement Learning for Join Order Enumeration | | 0
Deep Reinforcement Learning for Vision-Based Robotic Grasping: A Simulated Comparative Evaluation of Off-Policy Methods | Code | 0
DiGrad: Multi-Task Reinforcement Learning with Shared Actions | | 0
Real-Time Bidding with Multi-Agent Reinforcement Learning in Display Advertising | | 0
The Mirage of Action-Dependent Baselines in Reinforcement Learning | Code | 0
Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling | Code | 0
Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research | Code | 0
Modeling Others using Oneself in Multi-Agent Reinforcement Learning | | 0
Variance Reduction Methods for Sublinear Reinforcement Learning | | 0
Reinforcement and Imitation Learning for Diverse Visuomotor Skills | Code | 0
Addressing Function Approximation Error in Actor-Critic Methods | Code | 1
Temporal Difference Models: Model-Free Deep RL for Model-Based Control | | 0
Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration | Code | 1
Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari | Code | 0
Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents | Code | 1
Budget Constrained Bidding by Model-free Reinforcement Learning in Display Advertising | | 0
Verifying Controllers Against Adversarial Examples with Bayesian Optimization | Code | 0
Weighted Double Deep Multiagent Reinforcement Learning in Stochastic Cooperative Environments | | 0
Ranking Sentences for Extractive Summarization with Reinforcement Learning | Code | 0
Structured Control Nets for Deep Reinforcement Learning | Code | 0
An Analysis of Categorical Distributional Reinforcement Learning | | 0
Diverse Exploration for Fast and Safe Policy Improvement | | 0
Variational Inference for Policy Gradient | | 0
Meta-Reinforcement Learning of Structured Exploration Strategies | Code | 1
Continual Reinforcement Learning with Complex Synapses | | 0
Accelerated Primal-Dual Policy Optimization for Safe Reinforcement Learning | | 0
Fourier Policy Gradients | | 0
Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning | | 0
Improving Mild Cognitive Impairment Prediction via Reinforcement Learning and Dialogue Simulation | | 0
Estimating scale-invariant future in continuous time | | 0
Efficient Collaborative Multi-Agent Deep Reinforcement Learning for Large-Scale Fleet Management | Code | 0
Bridging Cognitive Programs and Machine Learning | | 0
Modeling the Formation of Social Conventions from Embodied Real-Time Interactions | | 0
Reactive Reinforcement Learning in Asynchronous Environments | | 0
Monte Carlo Q-learning for General Game Playing | Code | 0
Diversity is All You Need: Learning Skills without a Reward Function | Code | 1
Prioritized Sweeping Neural DynaQ with Multiple Predecessors, and Hippocampal Replays | | 0
Mean Field Multi-Agent Reinforcement Learning | Code | 1
Reinforcement Learning from Imperfect Demonstrations | | 0
From Gameplay to Symbolic Reasoning: Learning SAT Solver Heuristics in the Style of Alpha(Go) Zero | Code | 0
GEP-PG: Decoupling Exploration and Exploitation in Deep Reinforcement Learning Algorithms | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified