SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to act in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback, receiving rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term reward.
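The interaction loop described above can be illustrated with a small sketch. It uses tabular Q-learning, which is only one of many RL methods and is chosen here for brevity. The `Corridor` environment, its reward of 1 at the right end, and all hyperparameters are hypothetical choices made for this illustration and do not come from any paper listed on this page.

```python
import random


class Corridor:
    """Hypothetical toy environment: states 0..length-1, start at 0,
    reward 1.0 for reaching the rightmost state (which ends the episode)."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action 0 = move left, action 1 = move right; position is clipped to the corridor
        move = 1 if action == 1 else -1
        self.state = max(0, min(self.length - 1, self.state + move))
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (assumed hyperparameters)."""
    rng = random.Random(seed)
    env = Corridor()
    q = [[0.0, 0.0] for _ in range(env.length)]  # q[state][action]
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            # Explore with probability epsilon, or when both actions look equally good
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = env.step(action)
            # Bootstrapped target: immediate reward plus discounted best future value
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Pick the higher-valued action in each state (ties resolve to 'right')."""
    return [0 if left > right else 1 for left, right in q]
```

Calling `greedy_policy(train())` returns one action per state. In this toy setting the learned policy should move right from every non-terminal state, since that is the only way to collect the reward.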

Papers

Showing 3851-3875 of 15113 papers

| Title | Status | Hype |
| --- | --- | --- |
| ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning | | 0 |
| Accelerating Self-Imitation Learning from Demonstrations via Policy Constraints and Q-Ensemble | | 0 |
| Assume-Guarantee Reinforcement Learning | | 0 |
| Distributed Learning on Heterogeneous Resource-Constrained Devices | | 0 |
| Correct-by-synthesis reinforcement learning with temporal logic constraints | | 0 |
| Associative Memory Based Experience Replay for Deep Reinforcement Learning | | 0 |
| A Generative Framework for Simultaneous Machine Translation | | 0 |
| Distributed Multi-Agent Deep Reinforcement Learning Framework for Whole-building HVAC Control | | 0 |
| Deep RL-based Trajectory Planning for AoI Minimization in UAV-assisted IoT | | 0 |
| Deep RL for Blood Glucose Control: Lessons, Challenges, and Opportunities | | 0 |
| Deep RL with Hierarchical Action Exploration for Dialogue Generation | | 0 |
| Deep RL With Information Constrained Policies: Generalization in Continuous Control | | 0 |
| Distributed Multi-Agent Deep Reinforcement Learning for Robust Coordination against Noise | | 0 |
| A General Theory of Relativity in Reinforcement Learning | | 0 |
| DeepScalper: A Risk-Aware Reinforcement Learning Framework to Capture Fleeting Intraday Trading Opportunities | | 0 |
| CORE: Constraint-Aware One-Step Reinforcement Learning for Simulation-Guided Neural Network Accelerator Design | | 0 |
| CORAL: Contextual Response Retrievability Loss Function for Training Dialog Generation Models | | 0 |
| Deep Sets for Generalization in RL | | 0 |
| A Multi-Agent Deep Reinforcement Learning Approach for a Distributed Energy Marketplace in Smart Grids | | 0 |
| Deep SIMBAD: Active Landmark-based Self-localization Using Ranking -based Scene Descriptor | | 0 |
| ACTRCE: Augmenting Experience via Teacher’s Advice | | 0 |
| BADDr: Bayes-Adaptive Deep Dropout RL for POMDPs | | 0 |
| BadGPT: Exploring Security Vulnerabilities of ChatGPT via Backdoor Attacks to InstructGPT | | 0 |
| Deep Surrogate Assisted Generation of Environments | | 0 |
| CoRAL: Collaborative Retrieval-Augmented Large Language Models Improve Long-tail Recommendation | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |