SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term reward.
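The interaction loop described above can be sketched with a toy example. Everything in it is an assumption chosen for illustration and does not come from this page: a hypothetical five-state chain environment (`step`), tabular Q-learning with epsilon-greedy exploration as the learning rule, and the specific hyperparameter values. It is a minimal sketch of an agent acting, receiving rewards, and improving its policy, not a reference implementation of any paper listed here.

```python
import random

N_STATES = 5      # hypothetical chain: states 0..4, state 4 is the rewarding terminal state
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment (assumed for illustration): return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (illustrative settings)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            if rng.random() < epsilon:
                # Explore: pick a random action.
                action = rng.choice(ACTIONS)
            else:
                # Exploit: pick a best-valued action, breaking ties at random.
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Reward feedback: move the estimate toward the one-step bootstrapped target.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy policy for the non-terminal states: the action with the highest learned value.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

The discount factor `gamma` is what makes the agent value long-term reward: an action that only pays off several steps later still receives a positive, though smaller, value estimate.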

Papers

Showing 7026–7050 of 15113 papers

| Title | Status | Hype |
| --- | --- | --- |
| Beyond Exact Gradients: Convergence of Stochastic Soft-Max Policy Gradient Methods with Entropy Regularization | | 0 |
| Contrastive Active Inference | Code | 1 |
| CORA: Benchmarks, Baselines, and Metrics as a Platform for Continual Reinforcement Learning Agents | Code | 1 |
| Balancing Value Underestimation and Overestimation with Realistic Actor-Critic | Code | 0 |
| Continuous Control with Action Quantization from Demonstrations | | 0 |
| Aesthetic Photo Collage with Deep Reinforcement Learning | | 0 |
| Learning Robotic Manipulation Skills Using an Adaptive Force-Impedance Action Space | | 0 |
| Locally Differentially Private Reinforcement Learning for Linear Mixture Markov Decision Processes | | 0 |
| On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game | | 0 |
| Neural Network Compatible Off-Policy Natural Actor-Critic Algorithm | | 0 |
| Offline Reinforcement Learning with Value-based Episodic Memory | Code | 1 |
| State-based Episodic Memory for Multi-Agent Reinforcement Learning | | 0 |
| RL4RS: A Real-World Dataset for Reinforcement Learning based Recommender System | Code | 1 |
| Embracing advanced AI/ML to help investors achieve success: Vanguard Reinforcement Learning for Financial Goal Planning | | 0 |
| An actor-critic algorithm with policy gradients to solve the job shop scheduling problem using deep double recurrent agents | Code | 1 |
| Improving Robustness of Reinforcement Learning for Power System Control with Adversarial Training | | 0 |
| Edge Rewiring Goes Neural: Boosting Network Resilience without Rich Features | Code | 1 |
| Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs | | 0 |
| No RL, No Simulation: Learning to Navigate without Navigating | Code | 1 |
| Option Transfer and SMDP Abstraction with Successor Features | | 0 |
| Sim-to-Real Transfer in Multi-agent Reinforcement Networking for Federated Edge Computing | | 0 |
| Reinforcement Learning-Based Coverage Path Planning with Implicit Cellular Decomposition | | 0 |
| Provable Hierarchy-Based Meta-Reinforcement Learning | | 0 |
| Accelerating lifelong reinforcement learning via reshaping rewards | Code | 1 |
| Damped Anderson Mixing for Deep Reinforcement Learning: Acceleration, Convergence, and Stabilization | | 0 |
Page 282 of 605

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |