SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term reward.
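The interaction loop and the notion of cumulative reward described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and does not come from any paper listed on this page: the `Corridor` toy environment, its reward values, the fixed `always_right` policy, and the discount factor `gamma`. No learning takes place here; the sketch only shows how an agent's actions produce a reward sequence and how that sequence is summarized as a discounted return.

```python
from typing import Callable, List, Tuple


def discounted_return(rewards: List[float], gamma: float = 0.99) -> float:
    """Sum of rewards where the reward at step t is weighted by gamma**t."""
    g = 0.0
    # Accumulate backwards so each step applies one more discount factor.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


class Corridor:
    """Hypothetical toy environment: cells 0..length-1, start at 0, goal at the end."""

    def __init__(self, length: int = 5) -> None:
        self.length = length
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool]:
        # action is +1 (move right) or -1 (move left); position is clipped to the corridor.
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        # Assumed reward scheme: small penalty per step, +1 on reaching the goal.
        reward = 1.0 if done else -0.1
        return self.state, reward, done


def run_episode(env: Corridor, policy: Callable[[int], int], max_steps: int = 100) -> List[float]:
    """Run one episode with a fixed policy and return the list of rewards received."""
    state = env.reset()
    rewards: List[float] = []
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env.step(action)
        rewards.append(reward)
        if done:
            break
    return rewards


def always_right(state: int) -> int:
    """Hypothetical fixed policy: always move toward the goal."""
    return 1


if __name__ == "__main__":
    episode_rewards = run_episode(Corridor(length=5), always_right)
    print(episode_rewards)
    print(discounted_return(episode_rewards, gamma=0.9))
```

An RL algorithm would replace the fixed policy with one that is adjusted over many episodes so that the expected value of this return increases.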

Papers

Showing 12051-12100 of 15113 papers

Title | Status | Hype
Multiagent Reinforcement Learning in Games with an Iterated Dominance Solution | | 0
QXplore: Q-Learning Exploration by Maximizing Temporal Difference Error | | 0
Policy Optimization by Local Improvement through Search | | 0
Reinforcement learning for suppression of collective activity in oscillatory ensembles | | 0
Training a Constrained Natural Media Painting Agent using Reinforcement Learning | | 0
Temporal Difference Weighted Ensemble For Reinforcement Learning | | 0
Model Ensemble-Based Intrinsic Reward for Sparse Reward Reinforcement Learning | | 0
Multi-step Greedy Policies in Model-Free Deep Reinforcement Learning | | 0
Model-free Learning Control of Nonlinear Stochastic Systems with Stability Guarantee | | 0
Self-Supervised State-Control through Intrinsic Mutual Information Rewards | Code | 0
Probabilistic View of Multi-agent Reinforcement Learning: A Unified Approach | | 0
Meta Learning via Learned Loss | | 0
Sequence-level Intrinsic Exploration Model for Partially Observable Domains | | 0
Model Imitation for Model-Based Reinforcement Learning | | 0
Modeling Fake News in Social Networks with Deep Multi-Agent Reinforcement Learning | | 0
Zero-Shot Policy Transfer with Disentangled Attention | | 0
REFINING MONTE CARLO TREE SEARCH AGENTS BY MONTE CARLO TREE SEARCH | | 0
Subjective Reinforcement Learning for Open Complex Environments | | 0
ROBEL: Robotics Benchmarks for Learning with Low-Cost Robots | Code | 0
Variational Constrained Reinforcement Learning with Application to Planning at Roundabout | | 0
S2VG: Soft Stochastic Value Gradient method | | 0
MoET: Interpretable and Verifiable Reinforcement Learning via Mixture of Expert Trees | | 0
Policy Tree Network | | 0
Solving single-objective tasks by preference multi-objective reinforcement learning | | 0
Partial Simulation for Imitation Learning | | 0
Multi-Agent Hierarchical Reinforcement Learning for Humanoid Navigation | | 0
Striving for Simplicity in Off-Policy Deep Reinforcement Learning | | 0
Reinforcement Learning with Chromatic Networks | | 0
Sparse Skill Coding: Learning Behavioral Hierarchies with Sparse Codes | | 0
Mint: Matrix-Interleaving for Multi-Task Learning | | 0
Stabilizing Off-Policy Reinforcement Learning with Conservative Policy Gradients | | 0
Robust Domain Randomization for Reinforcement Learning | | 0
Pre-training as Batch Meta Reinforcement Learning with tiMe | | 0
Paying Attention to Function Words | Code | 0
Power Allocation in Cache-Aided NOMA Systems: Optimization and Deep Reinforcement Learning Approaches | | 0
Controlling an Autonomous Vehicle with Deep Reinforcement Learning | | 0
Avoidance Learning Using Observational Reinforcement Learning | | 0
Accept Synthetic Objects as Real: End-to-End Training of Attentive Deep Visuomotor Policies for Manipulation in Clutter | Code | 0
Efficient Inference and Exploration for Reinforcement Learning | | 0
Active inference: demystified and compared | Code | 0
Invariant Transform Experience Replay: Data Augmentation for Deep Reinforcement Learning | Code | 0
Brain-Inspired Hardware for Artificial Intelligence: Accelerated Learning in a Physical-Model Spiking Neural Network | | 0
PAC Reinforcement Learning without Real-World Feedback | | 0
Constrained Attractor Selection Using Deep Reinforcement Learning | | 0
Integrating independent and centralized multi-agent reinforcement learning for traffic signal network optimization | | 0
Where to Look Next: Unsupervised Active Visual Exploration on 360° Input | | 0
Robot Navigation in Crowds by Graph Convolutional Networks with Attention Learned from Human Gaze | | 0
Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Estimators for Reinforcement Learning | Code | 0
Modular Deep Reinforcement Learning with Temporal Logic Specifications | Code | 0
Why Does Hierarchy (Sometimes) Work So Well in Reinforcement Learning? | | 0
Page 242 of 303

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified