SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment to maximize a cumulative reward signal. The agent interacts with the environment and learns by receiving feedback in the form of rewards or punishments for its actions. The goal of reinforcement learning is to find the optimal policy or decision-making strategy that maximizes the long-term reward.
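As a rough illustration of the loop described above, the sketch below trains a tabular Q-learning agent on a toy five-state corridor. Everything in it is an assumption made for illustration and does not come from any paper listed on this page: the environment, the reward of 1 for reaching the last state, the names `step`, `train`, and `greedy_policy`, and the hyperparameter values. Q-learning is only one of many ways to learn a policy from reward feedback.

```python
import random

# Hypothetical toy environment: a corridor of 5 states, numbered 0..4.
# The agent starts in state 0 and the episode ends when it reaches state 4.
N_STATES = 5
GOAL = N_STATES - 1
LEFT, RIGHT = 0, 1


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == LEFT else state + 1
    done = next_state == GOAL
    reward = 1.0 if done else 0.0  # feedback arrives only at the goal
    return next_state, reward, done


def greedy_action(q_row, rng):
    """Pick the highest-valued action, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0, max_steps=100):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Explore with probability epsilon, otherwise act greedily.
            if rng.random() < epsilon:
                action = rng.choice([LEFT, RIGHT])
            else:
                action = greedy_action(q[state], rng)
            next_state, reward, done = step(state, action)
            # Move the estimate toward reward plus discounted future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Return the greedy action for every non-terminal state."""
    return [max((LEFT, RIGHT), key=lambda a: q[s][a]) for s in range(GOAL)]


if __name__ == "__main__":
    q_table = train()
    print("Greedy action per state (0 = left, 1 = right):", greedy_policy(q_table))
```

In this toy setting the discount factor `gamma` is what makes "long-term reward" concrete: actions that reach the goal sooner end up with higher estimated values, so the learned greedy policy should favor moving right.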

Papers

Showing 1901-1925 of 15113 papers

| Title | Status | Hype |
| --- | --- | --- |
| Robot Perception enables Complex Navigation Behavior via Self-Supervised Learning | Code | 1 |
| Model-based Adversarial Meta-Reinforcement Learning | Code | 1 |
| AWAC: Accelerating Online Reinforcement Learning with Offline Datasets | Code | 1 |
| Agent Modelling under Partial Observability for Deep Reinforcement Learning | Code | 1 |
| Analytic Manifold Learning: Unifying and Evaluating Representations for Continuous Control | Code | 1 |
| Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games | Code | 1 |
| MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration | Code | 1 |
| Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning | Code | 1 |
| Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks | Code | 1 |
| Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning | Code | 1 |
| TorsionNet: A Reinforcement Learning Approach to Sequential Conformer Search | Code | 1 |
| SAMBA: Safe Model-Based & Active Reinforcement Learning | Code | 1 |
| Modelling Hierarchical Structure between Dialogue Policy and Natural Language Generator with Option Framework for Task-oriented Dialogue System | Code | 1 |
| Closed Loop Neural-Symbolic Learning via Integrating Neural Perception, Grammar Parsing, and Symbolic Reasoning | Code | 1 |
| Robust Spammer Detection by Nash Reinforcement Learning | Code | 1 |
| Learning to Incentivize Other Learning Agents | Code | 1 |
| What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study | Code | 1 |
| Constrained episodic reinforcement learning in concave-convex and knapsack settings | Code | 1 |
| Conservative Q-Learning for Offline Reinforcement Learning | Code | 1 |
| Learning to Play No-Press Diplomacy with Best Response Policy Iteration | Code | 1 |
| Reinforcement Learning Under Moral Uncertainty | Code | 1 |
| Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning | Code | 1 |
| Reinforcement Learning for Multi-Product Multi-Node Inventory Management in Supply Chains | Code | 1 |
| Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity | Code | 1 |
| Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |