SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment to maximize a cumulative reward signal. The agent interacts with the environment and learns by receiving feedback in the form of rewards or punishments for its actions. The goal of reinforcement learning is to find the optimal policy or decision-making strategy that maximizes the long-term reward.
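The interaction loop described above can be sketched with tabular Q-learning on a toy problem. This is a minimal illustration, not a method taken from any paper listed here. The five-state chain environment, the `step` function, the function names, and every hyperparameter value (`alpha`, `gamma`, `epsilon`, episode count) are assumptions chosen for the example.

```python
import random

N_STATES = 5        # hypothetical chain of states 0..4; state 4 is the goal
ACTIONS = (0, 1)    # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment (an assumption for this sketch).

    Returns (next_state, reward, done). The only reward is 1.0 for
    reaching the goal state at the right end of the chain.
    """
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train_q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    """Learn action values from interaction using epsilon-greedy Q-learning."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            # Explore with probability epsilon, or when both actions look equal.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Target = immediate reward plus discounted best future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each non-terminal state."""
    return [0 if q[s][0] > q[s][1] else 1 for s in range(N_STATES - 1)]


q_table = train_q_learning()
print(greedy_policy(q_table))
```

Under these assumptions the learned greedy policy should settle on moving right in every non-terminal state, since that is the only route to the reward. The discount factor `gamma` is what turns a single delayed reward into a long-term objective: states farther from the goal receive proportionally smaller values.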

Papers

Showing 8201-8225 of 15113 papers

| Title | Status | Hype |
| --- | --- | --- |
| Non-Cooperative Inverse Reinforcement Learning | | 0 |
| Non-Crossing Quantile Regression for Distributional Reinforcement Learning | | 0 |
| Non-decreasing Quantile Function Network with Efficient Exploration for Distributional Reinforcement Learning | | 0 |
| Non Deterministic Logic Programs | | 0 |
| Non-Deterministic Policies in Markovian Decision Processes | | 0 |
| Non-Deterministic Policy Improvement Stabilizes Approximated Reinforcement Learning | | 0 |
| Non-Linear Reinforcement Learning in Large Action Spaces: Structural Conditions and Sample-efficiency of Posterior Sampling | | 0 |
| Non-local Optimization: Imposing Structure on Optimization Problems by Relaxation | | 0 |
| Non-local Policy Optimization via Diversity-regularized Collaborative Exploration | | 0 |
| Non-Markovian policies occupancy measures | | 0 |
| Non-Markovian Reinforcement Learning using Fractional Dynamics | | 0 |
| NQMIX: Non-monotonic Value Function Factorization for Deep Multi-Agent Reinforcement Learning | | 0 |
| Nonparametric Bayesian Inverse Reinforcement Learning for Multiple Reward Functions | | 0 |
| Nonparametric Bayesian Policy Priors for Reinforcement Learning | | 0 |
| Nonparametric Bellman Mappings for Reinforcement Learning: Application to Robust Adaptive Filtering | | 0 |
| Nonparametric General Reinforcement Learning | | 0 |
| Nonprehensile Planar Manipulation through Reinforcement Learning with Multimodal Categorical Exploration | | 0 |
| Non-Robust Feature Mapping in Deep Reinforcement Learning | | 0 |
| Non-stationary Reinforcement Learning under General Function Approximation | | 0 |
| Nonstationary Reinforcement Learning with Linear Function Approximation | | 0 |
| Non-stationary Reinforcement Learning without Prior Knowledge: An Optimal Black-box Approach | | 0 |
| Non-stationary Risk-sensitive Reinforcement Learning: Near-optimal Dynamic Regret, Adaptive Detection, and Separation Design | | 0 |
| Nonuniqueness and Convergence to Equivalent Solutions in Observer-based Inverse Reinforcement Learning | | 0 |
| No-Press Diplomacy: Modeling Multi-Agent Gameplay | | 0 |
| No-Regret Exploration in Goal-Oriented Reinforcement Learning | | 0 |
Page 329 of 605

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |