SOTAVerified

Distributional Reinforcement Learning

The value distribution is the distribution of the random return received by a reinforcement learning agent. It has been used for specific purposes such as implementing risk-aware behaviour.
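As a rough illustration of why a return distribution can support risk-aware behaviour, the sketch below compares choosing an action by its mean return with choosing by a lower-tail statistic (conditional value at risk, CVaR). The sample values, the action names, and the `cvar` helper are hypothetical and chosen only for illustration; they do not come from any paper listed on this page.

```python
import statistics


def cvar(returns, alpha):
    """Mean of the worst alpha-fraction of sampled returns (lower tail)."""
    ordered = sorted(returns)
    # Keep at least one sample so the tail is never empty.
    k = max(1, int(alpha * len(ordered)))
    return statistics.fmean(ordered[:k])


# Hypothetical sampled returns for two actions (illustrative numbers only).
samples = {
    "safe": [1.0, 1.0, 1.0, 1.0],
    "risky": [-4.0, 2.0, 3.0, 4.0],
}

# Risk-neutral choice: maximise the expected return (the usual value Q).
risk_neutral = max(samples, key=lambda a: statistics.fmean(samples[a]))

# Risk-averse choice: maximise the mean of the worst 25% of returns.
risk_averse = max(samples, key=lambda a: cvar(samples[a], alpha=0.25))

print(risk_neutral, risk_averse)
```

With these numbers the two criteria disagree: the mean favours the action with the higher average, while the tail criterion favours the action whose worst outcomes are milder. A criterion of the second kind needs the return distribution, not only its expectation.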

Let Z denote the random return, whose expectation is the value Q. Like Q, this random return satisfies a recursive equation, but one of a distributional nature.
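One common way to write this recursion is the distributional Bellman equation, given here as a sketch. The symbols x, a (state and action), R (random reward), γ (discount factor), and X', A' (random next state and action) are assumed notation that does not appear in the text above.

```latex
% Q is the expectation of the random return Z
Q(x, a) = \mathbb{E}\left[ Z(x, a) \right]

% Distributional recursion: equality holds in distribution, not pointwise
Z(x, a) \overset{D}{=} R(x, a) + \gamma \, Z(X', A')
```

Taking expectations on both sides of the second line recovers the usual Bellman equation for Q.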

Papers

Showing 51-75 of 137 papers

Title | Status | Hype
Pitfall of Optimism: Distributional Reinforcement Learning by Randomizing Risk Criterion | - | 0
Distributional Reinforcement Learning with Online Risk-awareness Adaption | - | 0
Estimation and Inference in Distributional Reinforcement Learning | Code | 0
Learning Risk-Aware Quadrupedal Locomotion using Distributional Reinforcement Learning | - | 0
Deep Reinforcement Learning for Artificial Upwelling Energy Management | - | 0
Value-Distributional Model-Based Reinforcement Learning | Code | 0
Variance Control for Distributional Reinforcement Learning | Code | 0
Cramer Type Distances for Learning Gaussian Mixture Models by Gradient Descent | - | 0
Distributional Model Equivalence for Risk-Sensitive Reinforcement Learning | Code | 0
Is Risk-Sensitive Reinforcement Learning Properly Resolved? | - | 0
Diverse Projection Ensembles for Distributional Reinforcement Learning | - | 0
PACER: A Fully Push-forward-based Distributional Reinforcement Learning Algorithm | - | 0
Improving the generalizability and robustness of large-scale traffic signal control | - | 0
The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation | - | 0
Distributional Reinforcement Learning with Dual Expectile-Quantile Regression | - | 0
The Benefits of Being Distributional: Small-Loss Bounds for Reinforcement Learning | Code | 0
One-Step Distributional Reinforcement Learning | - | 0
Policy Evaluation in Distributional LQR | - | 0
Toward Risk-based Optimistic Exploration for Cooperative Multi-Agent Reinforcement Learning | - | 0
Constrained Reinforcement Learning using Distributional Representation for Trustworthy Quadrotor UAV Tracking Control | Code | 0
Distributional constrained reinforcement learning for supply chain optimization | Code | 0
Multi-compartment Neuron and Population Encoding Powered Spiking Neural Network for Deep Distributional Reinforcement Learning | - | 0
An Analysis of Quantile Temporal-Difference Learning | - | 0
Invariance to Quantile Selection in Distributional Continuous Control | - | 0
Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds | - | 0

No leaderboard results yet.