SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
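As a rough illustration of the agent-environment loop described above, here is a minimal sketch of tabular Q-learning on a toy five-state corridor. The environment, the constants (`N_STATES`, `GAMMA`, `ALPHA`), and the helper names `step`, `train`, and `greedy_policy` are all assumptions made up for this example and are not taken from any paper listed on this page. The agent explores with uniformly random actions, which is a simplification; practical agents usually balance exploration against exploiting what they have learned.

```python
import random

# Hypothetical toy setup: a corridor of states 0..4, where state 4 is the goal.
N_STATES = 5
ACTIONS = (0, 1)   # 0 = move left, 1 = move right
GAMMA = 0.9        # discount factor for future rewards (assumed value)
ALPHA = 0.5        # learning rate (assumed value)


def step(state, action):
    """Deterministic toy environment: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0   # reward only on reaching the goal
    return next_state, reward, done


def train(episodes=500, seed=0):
    """Learn action values Q[state][action] from interaction alone."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)   # uniformly random exploration
            next_state, reward, done = step(state, action)
            # Q-learning target: immediate reward plus discounted best future value.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


q_table = train()
policy = greedy_policy(q_table)
```

In this sketch the learned `policy` should select action 1 (move right) in every non-terminal state, since moving toward the goal is the only way to collect reward. Discounting makes states closer to the goal more valuable, which is what steers the greedy policy.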

Papers

Showing 7201-7225 of 15113 papers

Title | Status | Hype
Stability Constrained Reinforcement Learning for Real-Time Voltage Control | - | 0
Modeling Interactions of Autonomous Vehicles and Pedestrians with Deep Multi-Agent Reinforcement Learning for Collision Avoidance | - | 0
Reinforcement Learning with Information-Theoretic Actuation | - | 0
Unified Data Collection for Visual-Inertial Calibration via Deep Reinforcement Learning | Code | 1
Is Policy Learning Overrated?: Width-Based Planning and Active Learning for Atari | Code | 0
Scalable Online Planning via Reinforcement Learning Fine-Tuning | Code | 1
Solving the Real Robot Challenge using Deep Reinforcement Learning | Code | 0
Reinforcement Learning for Classical Planning: Viewing Heuristics as Dense Reward Generators | - | 0
Surveillance Evasion Through Bayesian Reinforcement Learning | Code | 0
A Privacy-preserving Distributed Training Framework for Cooperative Multi-agent Deep Reinforcement Learning | - | 0
HLIC: Harmonizing Optimization Metrics in Learned Image Compression by Reinforcement Learning | - | 0
Bitcoin Transaction Strategy Construction Based on Deep Reinforcement Learning | - | 0
Coordinated Reinforcement Learning for Optimizing Mobile Networks | - | 0
Generalized Maximum Entropy Reinforcement Learning via Reward Shaping | - | 0
CubeTR: Learning to Solve the Rubik's Cube using Transformers | - | 0
Policy improvement by planning with Gumbel | Code | 2
Revisiting the Monotonicity Constraint in Cooperative Multi-Agent Reinforcement Learning | - | 0
- | - | 0
Variational oracle guiding for reinforcement learning | - | 0
Plan Your Target and Learn Your Skills: State-Only Imitation Learning via Decoupled Policy Optimization | - | 0
WaveCorr: Deep Reinforcement Learning with Permutation Invariant Policy Networks for Portfolio Management | - | 0
Polyphonic Music Composition: An Adversarial Inverse Reinforcement Learning Approach | - | 0
Particle Based Stochastic Policy Optimization | - | 0
Value Refinement Network (VRN) | - | 0
P4O: Efficient Deep Reinforcement Learning with Predictive Processing Proximal Policy Optimization | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | - | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | - | Unverified