SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
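The agent, environment, reward, and policy described above can be made concrete with a small sketch. Everything in it is an illustrative assumption and is not taken from any listed paper: the five-state corridor environment, the names `step`, `train`, and `greedy_policy`, and the hyperparameter values are hypothetical. Tabular Q-learning is used here only as one well-known way to learn a policy from reward feedback.

```python
import random

N_STATES = 5          # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GAMMA = 0.9           # discount factor (assumed value)
ALPHA = 0.5           # learning rate (assumed value)


def step(state, action):
    """Toy environment: return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def discounted_return(rewards, gamma=GAMMA):
    """Cumulative reward: sum of gamma**t * r_t over one episode."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def train(episodes=500, max_steps=100, seed=0):
    """Tabular Q-learning with a uniformly random behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = rng.choice(ACTIONS)  # explore by acting at random
            next_state, reward, done = step(state, action)
            # Bootstrap from the best next-state value unless the episode ended.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


q_table = train()
policy = greedy_policy(q_table)
```

In this toy setting the learned greedy policy moves right in every state, since that is the shortest path to the only reward. Real benchmarks differ mainly in the size of the state space and in how the value function or policy is represented.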

Papers

Showing 6476-6500 of 15113 papers

Title | Status | Hype
A Deeper Understanding of State-Based Critics in Multi-Agent Reinforcement Learning | - | 0
Execute Order 66: Targeted Data Poisoning for Reinforcement Learning | - | 0
Actor-Critic Network for Q&A in an Adversarial Environment | - | 0
Hybrid intelligence for dynamic job-shop scheduling with deep reinforcement learning and attention mechanism | Code | 1
Robust Algorithmic Collusion | - | 0
Toward Causal-Aware RL: State-Wise Action-Refined Temporal Difference | Code | 0
Reinforcement Learning for Task Specifications with Action-Constraints | - | 0
Temporal Complementarity-Guided Reinforcement Learning for Image-to-Video Person Re-Identification | - | 0
Symmetry-Aware Neural Architecture for Embodied Visual Exploration | - | 0
Joint Learning-Based Stabilization of Multiple Unknown Linear Systems | - | 0
A Surrogate-Assisted Controller for Expensive Evolutionary Reinforcement Learning | - | 0
Toward Pareto Efficient Fairness-Utility Trade-off in Recommendation through Reinforcement Learning | - | 0
Operator Deep Q-Learning: Zero-Shot Reward Transferring in Reinforcement Learning | - | 0
Transfer RL across Observation Feature Spaces via Model-Based Regularization | - | 0
Stochastic convex optimization for provably efficient apprenticeship learning | - | 0
Using Graph-Aware Reinforcement Learning to Identify Winning Strategies in Diplomacy Games (Student Abstract) | - | 0
Single-Shot Pruning for Offline Reinforcement Learning | - | 0
Robust Entropy-regularized Markov Decision Processes | - | 0
SimSR: Simple Distance-based State Representation for Deep Reinforcement Learning | Code | 1
A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning | Code | 0
Importance of Empirical Sample Complexity Analysis for Offline Reinforcement Learning | - | 0
Stability-Preserving Automatic Tuning of PID Control with Reinforcement Learning | - | 0
Reversible Upper Confidence Bound Algorithm to Generate Diverse Optimized Candidates | - | 0
Multi-Agent Reinforcement Learning via Adaptive Kalman Temporal Difference and Successor Representation | - | 0
Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | - | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | - | Unverified