SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes the long-term reward.
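As a rough illustration of the agent-environment loop described above, the sketch below runs tabular Q-learning on a toy corridor. The environment, the function names (`step`, `train`, `greedy_policy`) and the hyperparameters are all assumptions made for this example. Q-learning is only one of many RL algorithms and is not taken from any paper listed on this page.

```python
import random

# Hypothetical toy environment (an assumption for illustration): a corridor of
# N_STATES cells. The agent starts in cell 0 and receives reward 1.0 on reaching
# the last cell, which ends the episode; every other step gives reward 0.0.
N_STATES = 5
ACTIONS = (-1, +1)  # move left, move right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon; also pick at random when the
            # two action values are tied. Otherwise act greedily.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward + discounted best future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Return the greedy action for every non-terminal state."""
    return [ACTIONS[0] if row[0] > row[1] else ACTIONS[1] for row in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this toy setting the learned policy should prefer moving right in every cell, since that is the only way to reach the reward. Real benchmarks involve far larger state spaces and usually replace the table with a function approximator.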

Papers

Showing 6951–7000 of 15113 papers

| Title | Status | Hype |
| --- | --- | --- |
| Transfer learning with causal counterfactual reasoning in Decision Transformers | | 0 |
| Multi-Agent Reinforcement Learning for Active Voltage Control on Power Distribution Networks | Code | 1 |
| APPTeK: Agent-Based Predicate Prediction in Temporal Knowledge Graphs | | 0 |
| Reinforcement Learning in Factored Action Spaces using Tensor Decompositions | | 0 |
| DESTA: A Framework for Safe Reinforcement Learning with Markov Games of Intervention | | 0 |
| Learning Domain Invariant Representations in Goal-conditioned Block MDPs | Code | 1 |
| Finite Horizon Q-learning: Stability, Convergence, Simulations and an application on Smart Grids | | 0 |
| A Subgame Perfect Equilibrium Reinforcement Learning Approach to Time-inconsistent Problems | | 0 |
| Enhancing Reinforcement Learning with discrete interfaces to learn the Dyck Language | | 0 |
| DreamerPro: Reconstruction-Free Model-Based Reinforcement Learning with Prototypical Representations | Code | 1 |
| Comparing Heuristics, Constraint Optimization, and Reinforcement Learning for an Industrial 2D Packing Problem | | 0 |
| ABIDES-Gym: Gym Environments for Multi-Agent Discrete Event Simulation and Application to Financial Markets | Code | 1 |
| Fragment-based Sequential Translation for Molecular Optimization | | 0 |
| Multi-Agent Advisor Q-Learning | Code | 0 |
| Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning | Code | 0 |
| The Difficulty of Passive Learning in Deep Reinforcement Learning | | 0 |
| Fault-Tolerant Federated Reinforcement Learning with Theoretical Guarantee | Code | 1 |
| Landmark-Guided Subgoal Generation in Hierarchical Reinforcement Learning | Code | 1 |
| Distributed Multi-Agent Deep Reinforcement Learning Framework for Whole-building HVAC Control | | 0 |
| Applications of Multi-Agent Reinforcement Learning in Future Internet: A Comprehensive Survey | | 0 |
| Accelerating Distributed Deep Reinforcement Learning by In-Network Experience Sampling | | 0 |
| Learning Robust Controllers Via Probabilistic Model-Based Policy Search | | 0 |
| EnTRPO: Trust Region Policy Optimization Method with Entropy Regularization | | 0 |
| Learning to Simulate Self-Driven Particles System with Coordinated Policy Optimization | Code | 1 |
| Average-Reward Learning and Planning with Options | | 0 |
| Neural PPO-Clip Attains Global Optimality: A Hinge Loss Perspective | | 0 |
| Distributional Reinforcement Learning for Multi-Dimensional Reward Functions | Code | 0 |
| Automating Control of Overestimation Bias for Reinforcement Learning | | 0 |
| A Deep Reinforcement Learning Approach for Audio-based Navigation and Audio Source Localization in Multi-speaker Environments | | 0 |
| Common Information based Approximate State Representations in Multi-Agent Reinforcement Learning | | 0 |
| Operator Shifting for Model-based Policy Evaluation | | 0 |
| Which Model to Trust: Assessing the Influence of Models on the Performance of Reinforcement Learning Algorithms for Continuous Control Tasks | Code | 0 |
| Mixture-of-Variational-Experts for Continual Learning | Code | 0 |
| Unsupervised Domain Adaptation with Dynamics-Aware Rewards in Reinforcement Learning | | 0 |
| Recurrent Off-policy Baselines for Memory-based Continuous Control | Code | 1 |
| Uniformly Conservative Exploration in Reinforcement Learning | Code | 1 |
| Self-Consistent Models and Values | | 0 |
| Goal-Aware Cross-Entropy for Multi-Target Reinforcement Learning | Code | 1 |
| Learning What to Memorize: Using Intrinsic Motivation to Form Useful Memory in Partially Observable Reinforcement Learning | | 0 |
| Can Q-Learning be Improved with Advice? | | 0 |
| Deep Reinforcement Learning for Simultaneous Sensing and Channel Access in Cognitive Networks | | 0 |
| Understanding the World Through Action | Code | 1 |
| False Correlation Reduction for Offline Reinforcement Learning | Code | 1 |
| Analysis of Thompson Sampling for Partially Observable Contextual Multi-Armed Bandits | | 0 |
| Fully Distributed Actor-Critic Architecture for Multitask Deep Reinforcement Learning | | 0 |
| Foresight of Graph Reinforcement Learning Latent Permutations Learnt by Gumbel Sinkhorn Network | | 0 |
| Policy Search using Dynamic Mirror Descent MPC for Model Free Off Policy RL | | 0 |
| Off-policy Reinforcement Learning with Optimistic Exploration and Distribution Correction | | 0 |
| A Reinforcement Learning Approach to Parameter Selection for Distributed Optimal Power Flow | | 0 |
| Convergence Rates of Average-Reward Multi-agent Reinforcement Learning via Randomized Linear Programming | | 0 |
Page 140 of 303

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |