SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment to maximize a cumulative reward signal. The agent interacts with the environment and learns by receiving feedback in the form of rewards or punishments for its actions. The goal of reinforcement learning is to find the optimal policy or decision-making strategy that maximizes the long-term reward.
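The interaction loop described above can be sketched in a few lines. This is a minimal illustration, not taken from any paper listed here: the `ChainEnv` corridor environment, the function names, and all hyperparameters are hypothetical, and tabular Q-learning is used only as one simple way to learn a policy from rewards.

```python
import random


class ChainEnv:
    """Hypothetical 5-state corridor: start at state 0, reward +1 at the last state."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 0 moves left, action 1 moves right; the walls clamp the position.
        delta = 1 if action == 1 else -1
        self.state = min(max(self.state + delta, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def train_q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1,
                     max_steps=100, seed=0):
    """Learn action values by interacting with the environment (assumed settings)."""
    rng = random.Random(seed)
    env = ChainEnv()
    q = [[0.0, 0.0] for _ in range(env.n_states)]  # q[state][action]
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            # Epsilon-greedy: explore at random with probability epsilon
            # (or when the two values tie), otherwise take the greedy action.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = env.step(action)
            # Bootstrapped target: immediate reward plus discounted future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Pick the higher-valued action in each state (ties go to action 1)."""
    return [0 if left > right else 1 for left, right in q]


if __name__ == "__main__":
    q_values = train_q_learning()
    print(greedy_policy(q_values))
```

The discount factor `gamma` is what turns "long-term reward" into a concrete quantity: rewards that arrive later count for less, so the learned values favor the shortest route to the goal.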

Papers

Showing 14501–14550 of 15113 papers

| Title | Status | Hype |
| --- | --- | --- |
| Stochastic Constraint Programming as Reinforcement Learning | | 0 |
| Modular Multi-Objective Deep Reinforcement Learning with Decision Values | Code | 0 |
| Equivalence Between Policy Gradients and Soft Q-Learning | | 0 |
| A Reinforcement Learning Approach to Weaning of Mechanical Ventilation in Intensive Care Units | | 0 |
| Reinforcement Learning with External Knowledge and Two-Stage Q-functions for Predicting Popular Reddit Threads | | 0 |
| Investigating Recurrence and Eligibility Traces in Deep Q-Networks | | 0 |
| Beating Atari with Natural Language Guided Reinforcement Learning | Code | 0 |
| Effective Warm Start for the Online Actor-Critic Reinforcement Learning based mHealth Intervention | | 0 |
| Pseudorehearsal in actor-critic agents | | 0 |
| Task-Oriented Query Reformulation with Reinforcement Learning | Code | 0 |
| The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning | | 0 |
| MUSE: Modularizing Unsupervised Sense Embeddings | Code | 0 |
| Ultrafast photonic reinforcement learning based on laser chaos | | 0 |
| Optimizing Differentiable Relaxations of Coreference Evaluation Metrics | Code | 0 |
| Environment-Independent Task Specifications via GLTL | | 0 |
| Virtual to Real Reinforcement Learning for Autonomous Driving | Code | 0 |
| Deep Reinforcement Learning-based Image Captioning with Embedding Reward | | 0 |
| Deep Q-learning from Demonstrations | Code | 0 |
| Dynamic Safe Interruptibility for Decentralized Multi-Agent Reinforcement Learning | | 0 |
| Composite Task-Completion Dialogue Policy Learning via Hierarchical Deep Reinforcement Learning | | 0 |
| Data-efficient Deep Reinforcement Learning for Dexterous Manipulation | | 0 |
| Stochastic Neural Networks for Hierarchical Reinforcement Learning | Code | 0 |
| Deep Reinforcement Learning framework for Autonomous Driving | Code | 0 |
| Stein Variational Policy Gradient | | 0 |
| Finite Sample Analyses for TD(0) with Function Approximation | | 0 |
| Multi-Advisor Reinforcement Learning | | 0 |
| On the Properties of the Softmax Function with Application in Game Theory and Reinforcement Learning | | 0 |
| Evaluating Persuasion Strategies and Deep Reinforcement Learning methods for Negotiation Dialogue agents | | 0 |
| Integrated Learning of Dialog Strategies and Semantic Parsing | | 0 |
| Learning Visual Servoing with Deep Features and Fitted Q-Iteration | Code | 0 |
| Sentence Simplification with Deep Reinforcement Learning | Code | 0 |
| Enter the Matrix: Safely Interruptible Autonomous Systems via Virtualization | | 0 |
| Dynamic Computational Time for Visual Attention | Code | 0 |
| Inverse Risk-Sensitive Reinforcement Learning | | 0 |
| Inverse Reinforcement Learning from Summary Data | | 0 |
| Socially Aware Motion Planning with Deep Reinforcement Learning | Code | 0 |
| Exploration–Exploitation in MDPs with Options | | 0 |
| Cohesion-based Online Actor-Critic Reinforcement Learning for mHealth Intervention | | 0 |
| Unsupervised Basis Function Adaptation for Reinforcement Learning | | 0 |
| Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning | Code | 0 |
| Fake News Mitigation via Point Process Based Intervention | | 0 |
| Deep Exploration via Randomized Value Functions | | 0 |
| Faster Reinforcement Learning Using Active Simulators | Code | 0 |
| Investigation of Language Understanding Impact for Reinforcement Learning Based Dialogue Systems | | 0 |
| Black-Box Data-efficient Policy Search for Robotics | Code | 0 |
| Pseudorehearsal in value function approximation | | 0 |
| Multi-Timescale, Gradient Descent, Temporal Difference Learning with Linear Options | | 0 |
| Online Learning for Offloading and Autoscaling in Energy Harvesting Mobile Edge Computing | | 0 |
| Deep Decentralized Multi-task Multi-Agent Reinforcement Learning under Partial Observability | | 0 |
| Minimax Regret Bounds for Reinforcement Learning | Code | 0 |
Page 291 of 303

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |