SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal of reinforcement learning is to find an optimal policy: a decision-making strategy that maximizes the expected long-term (cumulative) reward.
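As a minimal illustration of the agent-environment loop described above, the sketch below runs tabular Q-learning on a hypothetical five-state chain. Q-learning is one standard RL algorithm, chosen here only for brevity. The environment (`step`), the helper `q_learning`, and every hyperparameter (learning rate, discount factor, exploration rate) are illustrative assumptions and do not come from this page. The agent starts in state 0. It earns a reward of 1.0 only when it reaches state 4, so it should learn a policy that always moves right.

```python
import random

N_STATES = 5          # hypothetical chain: states 0..4, state 4 is the terminal goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GOAL = N_STATES - 1


def step(state: int, action: int) -> tuple[int, float, bool]:
    """Toy environment: returns (next_state, reward, done); reward 1.0 only at the goal."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def q_learning(episodes: int = 500, alpha: float = 0.5, gamma: float = 0.9,
               epsilon: float = 0.1, seed: int = 0) -> list[list[float]]:
    """Tabular Q-learning with epsilon-greedy exploration (assumed hyperparameters)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]

    def greedy(state: int) -> int:
        best = max(q[state])
        # break ties randomly so the untrained agent still wanders toward the goal
        return rng.choice([a for a in ACTIONS if q[state][a] == best])

    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 100:   # cap episode length
            action = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(state)
            next_state, reward, done = step(state, action)
            # bootstrap from the best next action unless the episode has ended
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state, steps = next_state, steps + 1
    return q


q = q_learning()
# greedy (learned) action in each non-terminal state
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]
print(policy)
```

The discount factor `gamma` is what turns "cumulative reward" into a concrete objective. A reward received k steps in the future is weighted by gamma^k, so the learned values decay with distance from the goal.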

Papers

Showing 4551-4575 of 15113 papers

Title | Status | Hype
Post Reinforcement Learning Inference | Code | 0
Posterior-regularized REINFORCE for Instance Selection in Distant Supervision | Code | 0
Learning to Score Behaviors for Guided Policy Optimization | Code | 0
Neural Modular Control for Embodied Question Answering | Code | 0
Self-Guided Evolution Strategies with Historical Estimated Gradients | Code | 0
WaveCorr: Correlation-savvy Deep Reinforcement Learning for Portfolio Management | Code | 0
Posterior Sampling for Reinforcement Learning Without Episodes | Code | 0
MASAI: Multi-agent Summative Assessment Improvement for Unsupervised Environment Design | Code | 0
Self-Imitation Learning for Robot Tasks with Sparse and Delayed Rewards | Code | 0
MAgent: A Many-Agent Reinforcement Learning Platform for Artificial Collective Intelligence | Code | 0
Post-processing Networks: Method for Optimizing Pipeline Task-oriented Dialogue Systems using Reinforcement Learning | Code | 0
Weakly Supervised Reinforcement Learning for Autonomous Highway Driving via Virtual Safety Cages | Code | 0
Weakly Supervised Scene Text Detection using Deep Reinforcement Learning | Code | 0
On the Generalization of Representations in Reinforcement Learning | Code | 0
Self-Learning Exploration and Mapping for Mobile Robots via Deep Reinforcement Learning | Code | 0
Weak Supervision for Fake News Detection via Reinforcement Learning | Code | 0
Learning to Play Text-based Adventure Games with Maximum Entropy Reinforcement Learning | Code | 0
Multiagent Inverse Reinforcement Learning via Theory of Mind Reasoning | Code | 0
MICo: Improved representations via sampling-based state similarity for Markov decision processes | Code | 0
MICRO: Model-Based Offline Reinforcement Learning with a Conservative Bellman Operator | Code | 0
Self-Paced Context Evaluation for Contextual Reinforcement Learning | Code | 0
Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks | Code | 0
Welfare and Fairness in Multi-objective Reinforcement Learning | Code | 0
Learning Progress Driven Multi-Agent Curriculum | Code | 0
Reinforcement Learning for Market Making in a Multi-agent Dealer Market | Code | 0
Page 183 of 605

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | - | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | - | Unverified
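The table does not define "Mean Normalized Performance" or name the benchmark behind it. The sketch below assumes one common convention. Each environment's raw return is min-max normalized against per-environment reference bounds, so the minimum maps to 0 and the maximum maps to 1, and the normalized scores are then averaged. The function names, environment names, returns, and bounds are all hypothetical and are not the data behind the 0.76 and 0.58 figures above.

```python
def normalized_score(raw: float, r_min: float, r_max: float) -> float:
    """Assumed min-max normalization: 0 at the reference minimum, 1 at the reference maximum."""
    return (raw - r_min) / (r_max - r_min)


def mean_normalized_performance(raw_scores: dict[str, float],
                                bounds: dict[str, tuple[float, float]]) -> float:
    """Average of per-environment normalized scores (hypothetical helper)."""
    scores = [normalized_score(raw_scores[env], *bounds[env]) for env in raw_scores]
    return sum(scores) / len(scores)


# Hypothetical returns and reference bounds, for illustration only.
raw = {"env_a": 8.0, "env_b": 30.0}
bounds = {"env_a": (0.0, 10.0), "env_b": (10.0, 50.0)}
print(mean_normalized_performance(raw, bounds))  # (0.8 + 0.5) / 2 = 0.65
```

Normalizing before averaging keeps environments with large raw reward scales from dominating the aggregate. Under this assumption, a score of 1.0 would mean matching the reference maximum on every environment.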