SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment to maximize a cumulative reward signal. The agent interacts with the environment and learns by receiving feedback in the form of rewards or punishments for its actions. The goal of reinforcement learning is to find the optimal policy or decision-making strategy that maximizes the long-term reward.
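The interaction loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and does not come from any paper listed on this page: the five-state corridor `ChainEnv`, the hyperparameters, and the choice of tabular Q-learning as the learning rule. The agent acts, receives a reward, updates its value estimates, and the greedy policy read off those estimates ends up approximating the policy that maximizes discounted long-term reward in this toy setting.

```python
import random


class ChainEnv:
    """Hypothetical 5-state corridor: start in state 0, reward 1.0 on reaching the last state."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 0 moves left, action 1 moves right; the walls clamp the position.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def q_learning(env, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2,
               seed=0, max_steps=100):
    """Tabular Q-learning with an epsilon-greedy behaviour policy (illustrative settings)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(env.n_states)]
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            # Explore with probability epsilon (or on ties), otherwise act greedily.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = env.step(action)
            # Bootstrapped target: immediate reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Pick the higher-valued action in each state (ties resolve to 'right')."""
    return [0 if left > right else 1 for left, right in q]


if __name__ == "__main__":
    q_table = q_learning(ChainEnv())
    print(greedy_policy(q_table))
```

In this sketch the reward is only given at the goal, so the discount factor `gamma` is what makes shorter paths more valuable than longer ones; other environments and algorithms would need different choices.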

Papers

Showing 9501–9550 of 15113 papers

Title | Status | Hype
On the role of planning in model-based deep reinforcement learning | | 0
Reliable Off-policy Evaluation for Reinforcement Learning | | 0
Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient | | 0
Exploring market power using deep reinforcement learning for intelligent bidding strategies | | 0
Drafting in Collectible Card Games via Reinforcement Learning | Code | 1
A Reinforcement Learning Approach to the Orienteering Problem with Time Windows | Code | 1
Universal Activation Function For Machine Learning | | 0
Single and Multi-Agent Deep Reinforcement Learning for AI-Enabled Wireless Networks: A Tutorial | | 0
Sample-efficient Reinforcement Learning in Robotic Table Tennis | | 0
Motion Prediction on Self-driving Cars: A Review | | 0
The Value Equivalence Principle for Model-Based Reinforcement Learning | | 0
Playing optical tweezers with deep reinforcement learning: in virtual, physical and augmented environments | | 0
RealAnt: An Open-Source Low-Cost Quadruped for Education and Research in Real-World Reinforcement Learning | Code | 1
A Hysteretic Q-learning Coordination Framework for Emerging Mobility Systems in Smart Cities | | 0
LBGP: Learning Based Goal Planning for Autonomous Following in Front | | 0
Learning a Decentralized Multi-arm Motion Planner | Code | 1
Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping | | 0
XCSF for Automatic Test Case Prioritization | Code | 0
Optimal Control-Based Baseline for Guided Exploration in Policy Gradient Methods | | 0
Learning Trajectories for Visual-Inertial System Calibration via Model-based Heuristic Deep Reinforcement Learning | Code | 1
Generative Inverse Deep Reinforcement Learning for Online Recommendation | | 0
Offline Reinforcement Learning from Human Feedback in Real-World Sequence-to-Sequence Tasks | | 0
Differentiable Physics Models for Real-world Offline Model-based Reinforcement Learning | | 0
Deep Reinforcement Learning Based Dynamic Route Planning for Minimizing Travel Time | | 0
Control with adaptive Q-learning | Code | 0
Distributional Reinforcement Learning for mmWave Communications with Intelligent Reflectors on a UAV | | 0
Online Observer-Based Inverse Reinforcement Learning | | 0
Generalization to New Actions in Reinforcement Learning | Code | 1
Self-Driving Network and Service Coordination Using Deep Reinforcement Learning | Code | 1
Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations using Generative Models | | 0
Sample-efficient reinforcement learning using deep Gaussian processes | | 0
Exact Asymptotics for Linear Quadratic Adaptive Control | Code | 0
Incorporating Rivalry in Reinforcement Learning for a Competitive Game | | 0
Depth Self-Optimized Learning Toward Data Science | Code | 0
Information-theoretic Task Selection for Meta-Reinforcement Learning | | 0
Fast Reinforcement Learning with Incremental Gaussian Mixture Models | | 0
Cooperative Heterogeneous Deep Reinforcement Learning | | 0
Instance based Generalization in Reinforcement Learning | Code | 0
Causal Campbell-Goodhart's law and Reinforcement Learning | Code | 0
Interpreting Graph Drawing with Multi-Agent Reinforcement Learning | | 0
A Variant of the Wang-Foster-Kakade Lower Bound for the Discounted Setting | | 0
NEARL: Non-Explicit Action Reinforcement Learning for Robotic Control | | 0
Reinforcement Learning of Structured Control for Linear Systems with Unknown State Matrix | | 0
Multi-Agent Reinforcement Learning for Visibility-based Persistent Monitoring | Code | 0
Reinforcement Learning with Efficient Active Feature Acquisition | | 0
Production-based Cognitive Models as a Test Suite for Reinforcement Learning Algorithms | | 0
Reinforcement Learning with Imbalanced Dataset for Data-to-Text Medical Report Generation | | 0
Guided Dialogue Policy Learning without Adversarial Learning in the Loop | Code | 0
Few-Shot Multi-Hop Relation Reasoning over Knowledge Bases | | 0
Task-Completion Dialogue Policy Learning via Monte Carlo Tree Search with Dueling Network | | 0
Page 191 of 303

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified