SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from the feedback it receives, in the form of rewards or penalties, for its actions. The goal is to find an optimal policy: a decision-making strategy that maximizes the expected long-term reward.
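
The interaction loop described above can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from any paper listed on this page: the five-state chain environment, the `step`, `train`, and `greedy_policy` helpers, and the choice of tabular Q-learning with epsilon-greedy exploration as the learning rule.

```python
import random

N_STATES = 5          # hypothetical chain: states 0..4, state 4 is terminal
ACTIONS = (-1, +1)    # index 0 moves left, index 1 moves right


def step(state, action):
    """Toy environment (an assumption): returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0   # reward only when the goal is reached
    return next_state, reward, done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, or when both values are tied.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Bootstrapped target: immediate reward plus discounted best
            # value of the next state (zero if the episode has ended).
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best action (as -1 or +1) for each non-terminal state."""
    return [ACTIONS[0] if row[0] > row[1] else ACTIONS[1] for row in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this toy setting the only reward sits at the right end of the chain, so the learned greedy policy should move right from every state. The discount factor `gamma` is what makes the agent value a reward several steps away, which is the "long-term" part of the objective.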

Papers

Showing 13001-13050 of 15113 papers

Title | Status | Hype
POPO: Pessimistic Offline Policy Optimization | Code | 0
Off-Policy Deep Reinforcement Learning with Analogous Disentangled Exploration | Code | 0
The Dreaming Variational Autoencoder for Reinforcement Learning Environments | Code | 0
Modular Deep Reinforcement Learning with Temporal Logic Specifications | Code | 0
Off-Policy Correction For Multi-Agent Reinforcement Learning | Code | 0
Meta-Reinforcement Learning by Tracking Task Non-stationarity | Code | 0
MCTS-GEB: Monte Carlo Tree Search is a Good E-graph Builder | Code | 0
Pontryagin Optimal Control via Neural Networks | Code | 0
The Effects of Memory Replay in Reinforcement Learning | Code | 0
Meta reinforcement learning as task inference | Code | 0
Risk-sensitive control as inference with Rényi divergence | Code | 0
POMDP inference and robust solution via deep reinforcement learning: An application to railway optimal maintenance | Code | 0
Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning | Code | 0
Risk-sensitive Inverse Reinforcement Learning via Semi- and Non-Parametric Methods | Code | 0
MazeBase: A Sandbox for Learning from Games | Code | 0
Policy Search with Rare Significant Events: Choosing the Right Partner to Cooperate with | Code | 0
Reinforcement Learning in a Physics-Inspired Semi-Markov Environment | Code | 0
Mitigating Off-Policy Bias in Actor-Critic Methods with One-Step Q-learning: A Novel Correction Approach | Code | 0
Policy Poisoning in Batch Reinforcement Learning and Control | Code | 0
Learning Socially Appropriate Robot Approaching Behavior Toward Groups using Deep Reinforcement Learning | Code | 0
Tracking Object Positions in Reinforcement Learning: A Metric for Keypoint Detection (extended version) | Code | 0
Policy Mirror Descent with Lookahead | Code | 0
Markov Abstractions for PAC Reinforcement Learning in Non-Markov Decision Processes | Code | 0
Social learning spontaneously emerges by searching optimal heuristics with deep reinforcement learning | Code | 0
Socially Aware Motion Planning with Deep Reinforcement Learning | Code | 0
RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning | Code | 0
Universally Expressive Communication in Multi-Agent Reinforcement Learning | Code | 0
Socially Intelligent Genetic Agents for the Emergence of Explicit Norms | Code | 0
Reinforcement Learning Guided Multi-Objective Exam Paper Generation | Code | 0
Policy Learning Using Weak Supervision | Code | 0
Universal Policies to Learn Them All | Code | 0
Reinforcement Learning Guided by Provable Normative Compliance | Code | 0
Reinforcement Learning Generalization with Surprise Minimization | Code | 0
Tractable Reinforcement Learning of Signal Temporal Logic Objectives | Code | 0
Universal Reinforcement Learning Algorithms: Survey and Experiments | Code | 0
RL and Fingerprinting to Select Moving Target Defense Mechanisms for Zero-day Attacks in IoT | Code | 0
Muscle Excitation Estimation in Biomechanical Simulation Using NAF Reinforcement Learning | Code | 0
Policy Learning for Malaria Control | Code | 0
Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning | Code | 0
Soft Actor-Critic for Discrete Action Settings | Code | 0
Policy-GNN: Aggregation Optimization for Graph Neural Networks | Code | 0
Soft Actor-Critic with Cross-Entropy Policy Optimization | Code | 0
Combining Reconstruction and Contrastive Methods for Multimodal Representations in RL | Code | 0
Universal Successor Features Approximators | Code | 0
Market Making via Reinforcement Learning | Code | 0
RLCard: A Toolkit for Reinforcement Learning in Card Games | Code | 0
Lucid Dreaming for Experience Replay: Refreshing Past States with the Current Policy | Code | 0
Multi-View Reinforcement Learning | Code | 0
Policy Distillation | Code | 0
Policy Continuation with Hindsight Inverse Dynamics | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified