SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from the feedback it receives in the form of rewards or penalties (negative rewards) for its actions. The goal of reinforcement learning is to find an optimal policy, i.e. a decision-making strategy that maps states to actions, that maximizes the expected long-term (often discounted) cumulative reward.
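The interaction loop described above (act, observe a reward, update, repeat) can be made concrete with tabular Q-learning, which is one standard RL algorithm among many. The sketch below assumes a hypothetical five-state chain environment (`step`, `N_STATES`) and illustrative hyperparameters (`alpha`, `gamma`, `epsilon`); none of these come from the papers listed on this page. The discount factor `gamma` is what turns "long-term reward" into a discounted sum of future rewards.

```python
import random

# Hypothetical toy environment (an assumption for illustration): a 1-D chain
# of states 0..N_STATES-1. The agent starts in state 0; reaching the last
# state ends the episode with reward +1, every other step gives reward 0.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Return (next_state, reward, done) for the toy chain."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon (or when the values are tied),
            # otherwise act greedily with respect to the current estimates.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # TD target: a terminal transition has no future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = q_learning()
# Greedy policy for the non-terminal states.
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

With these settings the greedy action in every non-terminal state is expected to be 1 (move right), since that is the shortest path to the only reward and so yields the largest discounted return.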

Papers

Showing 14051-14100 of 15113 papers

Title | Status | Hype
Avoiding Catastrophic States with Intrinsic Fear | - | 0
Deep Reinforcement Learning for List-wise Recommendations | Code | 1
Learning Structural Weight Uncertainty for Sequential Decision-Making | Code | 0
Deep Reinforcement Learning for Unsupervised Video Summarization with Diversity-Representativeness Reward | Code | 0
SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation | - | 0
Reinforcement Learning with Analogical Similarity to Guide Schema Induction and Attention | - | 0
Multi-timescale memory dynamics in a reinforcement learning network with attention-gated memory | Code | 0
Consensus-based Sequence Training for Video Captioning | - | 0
RLlib: Abstractions for Distributed Reinforcement Learning | Code | 4
Whatever Does Not Kill Deep Reinforcement Learning, Makes It Stronger | Code | 1
A short variational proof of equivalence between policy gradients and soft Q learning | - | 0
Federated Control with Hierarchical Multi-Agent Deep Reinforcement Learning | Code | 0
Least-Squares Temporal Difference Learning for the Linear Quadratic Regulator | - | 0
Multiagent-based Participatory Urban Simulation through Inverse Reinforcement Learning | - | 0
Revisiting the Master-Slave Architecture in Multi-Agent Deep Reinforcement Learning | - | 0
Recurrent Attentional Reinforcement Learning for Multi-label Image Recognition | - | 0
Hierarchical and Interpretable Skill Acquisition in Multi-task Reinforcement Learning | - | 0
Pseudorehearsal in actor-critic agents with neural network function approximation | - | 0
Two-dimensional Anti-jamming Mobile Communication Based on Reinforcement Learning | - | 0
On Wasserstein Reinforcement Learning and the Fokker-Planck equation | - | 0
On the Relationship Between the OpenAI Evolution Strategy and Stochastic Gradient Descent | - | 0
Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning | Code | 0
ES Is More Than Just a Traditional Finite-Difference Approximator | - | 0
Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents | Code | 0
Integral Equations and Machine Learning | - | 0
Towards a Deep Reinforcement Learning Approach for Tower Line Wars | - | 0
Ray: A Distributed Framework for Emerging AI Applications | Code | 4
Occam's razor is insufficient to infer the preferences of irrational agents | - | 0
Hierarchical Text Generation and Planning for Strategic Dialogue | Code | 0
AI2-THOR: An Interactive 3D Environment for Visual AI | Code | 1
Differentiable lower bound for expected BLEU score | Code | 0
Inverse Reinforcement Learning for Marketing | - | 0
Multi-focus Attention Network for Efficient Deep Reinforcement Learning | - | 0
QLBS: Q-Learner in the Black-Scholes(-Merton) Worlds | Code | 0
Deep Reinforcement Learning Boosted by External Knowledge | - | 0
A Low-Cost Ethics Shaping Approach for Designing Reinforcement Learning Agents | Code | 0
Interpretable Policies for Reinforcement Learning by Genetic Programming | - | 0
Simulated Autonomous Driving on Realistic Road Networks using Deep Reinforcement Learning | - | 0
Robust Deep Reinforcement Learning with Adversarial Attacks | - | 0
MINOS: Multimodal Indoor Simulator for Navigation in Complex Environments | Code | 0
The Eigenoption-Critic Framework | - | 0
Reinforced dynamics for enhanced sampling in large atomic and molecular systems | - | 0
Stochastic Answer Networks for Machine Reading Comprehension | Code | 0
Deep Primal-Dual Reinforcement Learning: Accelerating Actor-Critic using Bellman Duality | - | 0
End-to-End Offline Goal-Oriented Dialog Policy Learning via Policy Gradient | - | 0
Noisy Natural Gradient as Variational Inference | Code | 0
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm | Code | 1
A Deeper Look at Experience Replay | Code | 0
Interactive Reinforcement Learning for Object Grounding via Self-Talking | - | 0
Representation and Reinforcement Learning for Personalized Glycemic Control in Septic Patients | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | - | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | - | Unverified