SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment to maximize a cumulative reward signal. The agent interacts with the environment and learns by receiving feedback in the form of rewards or punishments for its actions. The goal of reinforcement learning is to find the optimal policy or decision-making strategy that maximizes the long-term reward.
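The interaction loop described above can be illustrated with a minimal sketch. It is not taken from any paper listed on this page: the two-armed bandit environment, the epsilon-greedy rule, and every name in it (`run_bandit`, `reward_probs`, `epsilon`) are assumptions chosen for illustration. The agent picks an action, receives a reward from the environment, updates its estimate of that action's value, and accumulates reward over time.

```python
import random


def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a toy multi-armed bandit (hypothetical setup).

    reward_probs[a] is the assumed probability that action `a` pays reward 1.
    Returns (greedy_action, value_estimates, total_reward).
    """
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    value_estimates = [0.0] * n_actions  # running mean reward per action
    counts = [0] * n_actions             # how often each action was taken
    total_reward = 0.0                   # cumulative reward signal

    for _ in range(steps):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: value_estimates[a])

        # Environment feedback: reward 1 with the action's probability, else 0.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0

        # Incremental mean update of the chosen action's value estimate.
        counts[action] += 1
        value_estimates[action] += (reward - value_estimates[action]) / counts[action]
        total_reward += reward

    # The learned policy here is simply "take the action with the best estimate".
    greedy_action = max(range(n_actions), key=lambda a: value_estimates[a])
    return greedy_action, value_estimates, total_reward


if __name__ == "__main__":
    policy, estimates, total = run_bandit([0.1, 0.9])
    print("greedy action:", policy)
    print("value estimates:", estimates)
    print("cumulative reward:", total)
```

A bandit has no state, so this sketch covers only the reward-feedback part of the definition; the methods in the papers below generally deal with sequential environments where actions also change future states.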

Papers

Showing 13301-13350 of 15113 papers

Title | Status | Hype
Actor-Attention-Critic for Multi-Agent Reinforcement Learning | Code | 1
MyCaffe: A Complete C# Re-Write of Caffe with Reinforcement Learning | Code | 0
Deep Reinforcement Learning for Time Scheduling in RF-Powered Backscatter Cognitive Radio Networks | - | 0
Learning Scheduling Algorithms for Data Processing Clusters | Code | 0
Comparison of Reinforcement Learning algorithms applied to the Cart Pole problem | Code | 0
Efficient Dialog Policy Learning via Positive Memory Retention | Code | 0
Energy-Based Hindsight Experience Prioritization | Code | 0
EMI: Exploration with Mutual Information | Code | 0
Near-Optimal Representation Learning for Hierarchical Reinforcement Learning | Code | 0
The Dreaming Variational Autoencoder for Reinforcement Learning Environments | Code | 0
Reinforcement Learning with Perturbed Rewards | Code | 0
Autonomous Sub-domain Modeling for Dialogue Policy with Hierarchical Deep Reinforcement Learning | - | 0
Curriculum Learning Based on Reward Sparseness for Deep Reinforcement Learning of Task Completion Dialogue Management | - | 0
Automatic Poetry Generation with Mutual Reinforcement Learning | - | 0
Automatic Essay Scoring Incorporating Rating Schema via Reinforcement Learning | - | 0
A Teacher-Student Framework for Maintainable Dialog Manager | - | 0
Adaptive Multi-pass Decoder for Neural Machine Translation | - | 0
Logician and Orator: Learning from the Duality between Language and Knowledge in Open Domain | - | 0
Prediction Improves Simultaneous Neural Machine Translation | - | 0
SmartChoices: Hybridizing Programming and Machine Learning | - | 0
Using State Predictions for Value Regularization in Curiosity Driven Deep Reinforcement Learning | Code | 0
Learning to Perform Local Rewriting for Combinatorial Optimization | Code | 0
Deep Quality-Value (DQV) Learning | Code | 0
Bayesian Transfer Reinforcement Learning with Prior Knowledge Rules | - | 0
Few-Shot Goal Inference for Visuomotor Learning and Planning | - | 0
Generalization and Regularization in DQN | Code | 0
Reinforcement Learning in R | - | 0
M^3RL: Mind-aware Multi-agent Management Reinforcement Learning | Code | 0
Direct optimization of F-measure for retrieval-based personal question answering | - | 0
Robot Representation and Reasoning with Knowledge from Reinforcement Learning | - | 0
Policy Generalization In Capacity-Limited Reinforcement Learning | - | 0
Where Off-Policy Deep Reinforcement Learning Fails | - | 0
Successor Options : An Option Discovery Algorithm for Reinforcement Learning | - | 0
What Would pi* Do?: Imitation Learning via Off-Policy Reinforcement Learning | - | 0
Shrinkage-based Bias-Variance Trade-off for Deep Reinforcement Learning | - | 0
Transfer Value or Policy? A Value-centric Framework Towards Transferrable Continuous Reinforcement Learning | - | 0
Mimicking actions is a good strategy for beginners: Fast Reinforcement Learning with Expert Action Sequences | - | 0
Learning Physics Priors for Deep Reinforcement Learing | - | 0
Exploration by Uncertainty in Reward Space | - | 0
COLLABORATIVE MULTIAGENT REINFORCEMENT LEARNING IN HOMOGENEOUS SWARMS | - | 0
A Better Baseline for Second Order Gradient Estimation in Stochastic Computation Graphs | - | 0
Countering Language Drift via Grounding | - | 0
Deep Reinforcement Learning of Universal Policies with Diverse Environment Summaries | - | 0
Guided Exploration in Deep Reinforcement Learning | - | 0
DEEP ADVERSARIAL FORWARD MODEL | - | 0
Distilled Agent DQN for Provable Adversarial Robustness | - | 0
Constraining Action Sequences with Formal Languages for Deep Reinforcement Learning | - | 0
Hybrid Policies Using Inverse Rewards for Reinforcement Learning | - | 0
Accelerated Value Iteration via Anderson Mixing | - | 0
Convergent Reinforcement Learning with Function Approximation: A Bilevel Optimization Perspective | - | 0
Page 267 of 303

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | - | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | - | Unverified