SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term reward.
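The agent-environment loop described above can be sketched with tabular Q-learning on a tiny chain world. This is only an illustrative sketch: the environment (`step`, `N_STATES`), the hyperparameters, and the choice of Q-learning are all assumptions made for the example and are not taken from any paper listed on this page.

```python
import random

N_STATES = 5      # hypothetical chain of states 0..4; state 4 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment (assumed): reward 1.0 only when the last state is reached."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                # exploit, breaking ties at random so early episodes still move
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # bootstrap from the best next action unless the episode has ended
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best action per non-terminal state according to the learned Q-table."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


q_table = train()
policy = greedy_policy(q_table)
```

In this toy setting the learned greedy policy moves right in every non-terminal state, since that is the only way to collect the reward; the discount factor `gamma` is what makes shorter paths to the reward preferable.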

Papers

Showing 11351–11400 of 15113 papers

Title | Status | Hype
Fast Reinforcement Learning for Anti-jamming Communications | | 0
MODRL/D-AM: Multiobjective Deep Reinforcement Learning Algorithm Using Decomposition and Attention Model for Multiobjective Optimization | | 0
Multi-Vehicle Routing Problems with Soft Time Windows: A Multi-Agent Reinforcement Learning Approach | | 0
Regret Bounds for Discounted MDPs | | 0
On the Convergence Theory of Debiased Model-Agnostic Meta-Reinforcement Learning | Code | 0
A Tensor Network Approach to Finite Markov Decision Processes | | 0
Data Efficient Training for Reinforcement Learning with Adaptive Behavior Policy Sharing | | 0
HMRL: Hyper-Meta Learning for Sparse Reward Reinforcement Learning Problem | | 0
Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning | | 0
Learning Structured Communication for Multi-agent Reinforcement Learning | | 0
Learning to Switch Among Agents in a Team via 2-Layer Markov Decision Processes | | 0
Machine Learning Approaches For Motor Learning: A Short Review | | 0
Towards Intelligent Pick and Place Assembly of Individualized Products Using Reinforcement Learning | | 0
Provable Self-Play Algorithms for Competitive Reinforcement Learning | | 0
On the Convergence of the Monte Carlo Exploring Starts Algorithm for Reinforcement Learning | | 0
On Reward Shaping for Mobile Robot Navigation: A Reinforcement Learning and SLAM Based Approach | | 0
Proficiency Constrained Multi-Agent Reinforcement Learning for Environment-Adaptive Multi UAV-UGV Teaming | | 0
Discrete Action On-Policy Learning with Action-Value Critic | Code | 0
Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions | | 0
Reward Tweaking: Maximizing the Total Reward While Planning for Short Horizons | | 0
RL-Duet: Online Music Accompaniment Generation Using Deep Reinforcement Learning | | 0
Multi-task Reinforcement Learning with a Planning Quasi-Metric | | 0
BRPO: Batch Residual Policy Optimization | | 0
Inferential Induction: A Novel Framework for Bayesian Reinforcement Learning | | 0
Conservative Exploration in Reinforcement Learning | | 0
Learning State Abstractions for Transfer in Continuous Control | Code | 0
A data-driven choice of misfit function for FWI using reinforcement learning | | 0
Analyzing Policy Distillation on Multi-Task Learning and Meta-Reinforcement Learning in Meta-World | | 0
Description Based Text Classification with Reinforcement Learning | | 0
Causally Correct Partial Models for Reinforcement Learning | | 0
Explicit Mean-Square Error Bounds for Monte-Carlo and Linear Stochastic Approximation | | 0
Accelerating Reinforcement Learning for Reaching using Continuous Curriculum Learning | | 0
Automated Lane Change Strategy using Proximal Policy Optimization-based Deep Reinforcement Learning | | 0
Bayesian Residual Policy Optimization: Scalable Bayesian Reinforcement Learning with Clairvoyant Experts | | 0
Representation of Reinforcement Learning Policies in Reproducing Kernel Hilbert Spaces | | 0
Manipulating Reinforcement Learning: Poisoning Attacks on Cost Signals | | 0
Reward-Free Exploration for Reinforcement Learning | | 0
Student/Teacher Advising through Reward Augmentation | | 0
Reinforcement Learning in Factored MDPs: Oracle-Efficient Algorithms and Tighter Regret Bounds for the Non-Episodic Setting | | 0
Temporal-adaptive Hierarchical Reinforcement Learning | | 0
Social diversity and social preferences in mixed-motive reinforcement learning | | 0
Mutual Information-based State-Control for Intrinsically Motivated Reinforcement Learning | | 0
Deep Radial-Basis Value Functions for Continuous Control | | 0
Learning Task-Driven Control Policies via Information Bottlenecks | | 0
Finite Time Analysis of Linear Two-timescale Stochastic Approximation with Markovian Noise | | 0
Bootstrapping a DQN Replay Memory with Synthetic Experiences | | 0
Policy Gradient based Quantum Approximate Optimization Algorithm | | 0
Finite-Sample Analysis of Stochastic Approximation Using Smooth Convex Envelopes | | 0
Evolutionary algorithms for constructing an ensemble of decision trees | | 0
Deep Reinforcement Learning for Autonomous Driving: A Survey | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified