SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment to maximize a cumulative reward signal. The agent interacts with the environment and learns by receiving feedback in the form of rewards or punishments for its actions. The goal of reinforcement learning is to find the optimal policy or decision-making strategy that maximizes the long-term reward.
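The interaction loop described above can be sketched in a few lines. This is only an illustrative toy, not a method from any listed paper: the `BiasedCoinEnv` environment, the `train_epsilon_greedy` function, the `greedy_action` helper, and all parameter values are hypothetical. The learning rule is plain epsilon-greedy action-value averaging, chosen for brevity. The environment has a single state, so the "policy" reduces to picking the action with the higher estimated reward.

```python
import random


class BiasedCoinEnv:
    """Hypothetical toy environment: guess the outcome of a biased coin."""

    def __init__(self, p_heads=0.8, horizon=10, seed=0):
        self.p_heads = p_heads
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        # Start a new episode of `horizon` guesses.
        self.t = 0

    def step(self, action):
        # action: 0 = guess tails, 1 = guess heads
        outcome = 1 if self.rng.random() < self.p_heads else 0
        # Feedback: +1 reward for a correct guess, -1 punishment otherwise.
        reward = 1.0 if action == outcome else -1.0
        self.t += 1
        done = self.t >= self.horizon
        return reward, done


def greedy_action(q):
    """Pick the action with the higher estimated reward (ties go to 1)."""
    return 0 if q[0] > q[1] else 1


def train_epsilon_greedy(env, episodes=200, epsilon=0.1, seed=0):
    """Estimate the mean reward of each action from interaction alone."""
    rng = random.Random(seed)
    q = [0.0, 0.0]   # running mean reward per action
    counts = [0, 0]  # number of times each action was tried
    for _ in range(episodes):
        env.reset()
        done = False
        while not done:
            # Explore with probability epsilon, otherwise exploit.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = greedy_action(q)
            reward, done = env.step(action)
            # Incremental update of the running mean for this action.
            counts[action] += 1
            q[action] += (reward - q[action]) / counts[action]
    return q


if __name__ == "__main__":
    estimates = train_epsilon_greedy(BiasedCoinEnv(seed=1))
    print("estimated rewards:", estimates)
    print("learned action:", greedy_action(estimates))
```

With a coin that lands heads 80% of the time, the agent should come to prefer guessing heads, since that action yields the higher long-run reward. Real RL problems add multiple states and delayed rewards, which this sketch deliberately leaves out.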

Papers

Showing 3601-3625 of 15113 papers

| Title | Status | Hype |
| --- | --- | --- |
| User Retention-oriented Recommendation with Decision Transformer | Code | 1 |
| Provably Efficient Model-Free Algorithms for Non-stationary CMDPs | | 0 |
| Understanding the Synergies between Quality-Diversity and Deep Reinforcement Learning | | 0 |
| Optimal foraging strategies can be learned | Code | 0 |
| Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning | Code | 1 |
| Evolving Populations of Diverse RL Agents with MAP-Elites | | 0 |
| Exploiting Contextual Structure to Generate Useful Auxiliary Tasks | | 0 |
| Task Aware Dreamer for Task Generalization in Reinforcement Learning | | 0 |
| Power and Interference Control for VLC-Based UDN: A Reinforcement Learning Approach | | 0 |
| A Framework for History-Aware Hyperparameter Optimisation in Reinforcement Learning | | 0 |
| Variance-aware robust reinforcement learning with linear function approximation under heavy-tailed rewards | | 0 |
| Conceptual Reinforcement Learning for Language-Conditioned Tasks | | 0 |
| Computably Continuous Reinforcement-Learning Objectives are PAC-learnable | | 0 |
| GOATS: Goal Sampling Adaptation for Scooping with Curriculum Reinforcement Learning | | 0 |
| Real-time scheduling of renewable power systems through planning-based reinforcement learning | | 0 |
| Beware of Instantaneous Dependence in Reinforcement Learning | | 0 |
| Recent Advances of Deep Robotic Affordance Learning: A Reinforcement Learning Perspective | | 0 |
| MCTS-GEB: Monte Carlo Tree Search is a Good E-graph Builder | Code | 0 |
| Using Memory-Based Learning to Solve Tasks with State-Action Constraints | | 0 |
| RACCER: Towards Reachable and Certain Counterfactual Explanations for Reinforcement Learning | Code | 0 |
| Deep Occupancy-Predictive Representations for Autonomous Driving | | 0 |
| Learning Bipedal Walking for Humanoids with Current Feedback | Code | 3 |
| A Multiplicative Value Function for Safe and Efficient Reinforcement Learning | Code | 1 |
| Evolutionary Reinforcement Learning: A Survey | | 0 |
| Learning When to Treat Business Processes: Prescriptive Process Monitoring with Causal Inference and Reinforcement Learning | Code | 0 |
Page 145 of 605

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |