SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or punishments for its actions. The goal of reinforcement learning is to find an optimal policy, that is, a decision-making strategy that maximizes the long-term reward.
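As a rough illustration of the agent-environment loop described above, here is a minimal sketch in Python. Everything in it is an assumption made for illustration and does not come from any paper listed on this page: the `TwoArmedBandit` environment, the `run_agent` function, and the epsilon-greedy rule are hypothetical stand-ins for "an environment", "an agent", and "a way to learn from rewards".

```python
import random


class TwoArmedBandit:
    """Hypothetical toy environment: arm 1 always pays 1.0, arm 0 pays 0.0."""

    def step(self, action):
        return 1.0 if action == 1 else 0.0


def run_agent(env, steps=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent that keeps a running average reward per action."""
    rng = random.Random(seed)
    values = [0.0, 0.0]  # estimated reward for each action
    counts = [0, 0]      # how often each action has been tried
    total = 0.0          # cumulative reward collected so far
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(2)  # explore: pick a random action
        else:
            # exploit: pick the action with the highest estimate (ties go to arm 0)
            action = 0 if values[0] >= values[1] else 1
        reward = env.step(action)      # feedback from the environment
        counts[action] += 1
        # incremental update of the running mean for the chosen action
        values[action] += (reward - values[action]) / counts[action]
        total += reward
    return values, total


if __name__ == "__main__":
    values, total = run_agent(TwoArmedBandit())
    print("estimated action values:", values)
    print("cumulative reward:", total)
```

In this sketch the "policy" is simply the greedy choice over the learned estimates; once exploration has tried the better arm, the agent keeps choosing it, which is the toy analogue of maximizing long-term reward.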

Papers

Showing 14901–14950 of 15113 papers

| Title | Status | Hype |
| --- | --- | --- |
| Grounding Language for Transfer in Deep Reinforcement Learning | Code | 0 |
| Combining imitation and deep reinforcement learning to accomplish human-level performance on a virtual foraging task | Code | 0 |
| Learning by Competition of Self-Interested Reinforcement Learning Agents | Code | 0 |
| Bayesian Optimization for Iterative Learning | Code | 0 |
| Learning from Learners: Adapting Reinforcement Learning Agents to be Competitive in a Card Game | Code | 0 |
| Bayesian Nonparametrics for Offline Skill Discovery | Code | 0 |
| Look Before You Leap: Bridging Model-Free and Model-Based Reinforcement Learning for Planned-Ahead Vision-and-Language Navigation | Code | 0 |
| Learning from Multiple Independent Advisors in Multi-agent Reinforcement Learning | Code | 0 |
| Group-driven Reinforcement Learning for Personalized mHealth Intervention | Code | 0 |
| Evolutionary learning of interpretable decision trees | Code | 0 |
| Group Equivariant Deep Reinforcement Learning | Code | 0 |
| Controlling epidemics through optimal allocation of test kits and vaccine doses across networks | Code | 0 |
| Learning the Reward Function for a Misspecified Model | Code | 0 |
| Growing Action Spaces | Code | 0 |
| Controllable Neural Story Plot Generation via Reward Shaping | Code | 0 |
| Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models | Code | 0 |
| A reinforcement learning approach to rare trajectory sampling | Code | 0 |
| Incentivizing Reasoning from Weak Supervision | Code | 0 |
| Learning to Run challenge solutions: Adapting reinforcement learning methods for neuromusculoskeletal environments | Code | 0 |
| Bayesian Inverse Reinforcement Learning for Collective Animal Movement | Code | 0 |
| gTLO: A Generalized and Non-linear Multi-Objective Deep Reinforcement Learning Approach | Code | 0 |
| A Reinforcement Learning Approach to Domain-Knowledge Inclusion Using Grammar Guided Symbolic Regression | Code | 0 |
| Evolution-Guided Policy Gradient in Reinforcement Learning | Code | 0 |
| A Reinforcement Learning Approach to Interactive-Predictive Neural Machine Translation | Code | 0 |
| Adaptive Discretization for Model-Based Reinforcement Learning | Code | 0 |
| Control Frequency Adaptation via Action Persistence in Batch Reinforcement Learning | Code | 0 |
| A Reinforcement Learning Approach for Performance-aware Reduction in Power Consumption of Data Center Compute Nodes | Code | 0 |
| Evolved Policy Gradients | Code | 0 |
| Contrastive Explanations for Reinforcement Learning via Embedded Self Predictions | Code | 0 |
| Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning | Code | 0 |
| Learning from Sparse Offline Datasets via Conservative Density Estimation | Code | 0 |
| Guide Actor-Critic for Continuous Control | Code | 0 |
| Evolving Inborn Knowledge For Fast Adaptation in Dynamic POMDP Problems | Code | 0 |
| Jointly Learning to Construct and Control Agents using Deep Reinforcement Learning | Code | 0 |
| Jointly Pre-training with Supervised, Autoencoder, and Value Losses for Deep Reinforcement Learning | Code | 0 |
| Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning | Code | 0 |
| Guided Cooperation in Hierarchical Reinforcement Learning via Model-based Rollout | Code | 0 |
| Contrasting Exploration in Parameter and Action Space: A Zeroth-Order Optimization Perspective | Code | 0 |
| Leave no Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning | Code | 0 |
| Guided Deep Reinforcement Learning for Swarm Systems | Code | 0 |
| Guided Dialog Policy Learning: Reward Estimation for Multi-Domain Task-Oriented Dialog | Code | 0 |
| Continuous Value Iteration (CVI) Reinforcement Learning and Imaginary Experience Replay (IER) for learning multi-goal, continuous action and state space controllers | Code | 0 |
| Guided Dialog Policy Learning without Adversarial Learning in the Loop | Code | 0 |
| Continuous Transition: Improving Sample Efficiency for Continuous Control Problems via MixUp | Code | 0 |
| EX2: Exploration with Exemplar Models for Deep Reinforcement Learning | Code | 0 |
| Exact Asymptotics for Linear Quadratic Adaptive Control | Code | 0 |
| Exact-K Recommendation via Maximal Clique Optimization | Code | 0 |
| Guided Dialogue Policy Learning without Adversarial Learning in the Loop | Code | 0 |
| LIFT: Reinforcement Learning in Computer Systems by Learning From Demonstrations | Code | 0 |
| Examining Policy Entropy of Reinforcement Learning Agents for Personalization Tasks | Code | 0 |
Page 299 of 303

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |