SOTAVerified

Reinforcement Learning (RL)

Reinforcement learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
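
The interaction loop described above can be sketched with tabular Q-learning, one common RL algorithm among many. Everything in this example is an illustrative assumption rather than something taken from the papers listed here: the five-cell corridor environment, the `step`, `choose_action`, `train`, and `greedy_policy` helpers, the discount factor `gamma` used to weight long-term reward, and the learning rate and exploration settings.

```python
import random

N_STATES = 5          # hypothetical corridor with cells 0..4; cell 4 is the goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment: returns (next_state, reward, done)."""
    nxt = max(state - 1, 0) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    # Reward 1.0 only when the goal is reached, otherwise 0.0.
    return nxt, (1.0 if done else 0.0), done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy selection; ties between equal Q-values are broken randomly."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Run Q-learning and return the learned table q[state][action]."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q, state, epsilon, rng)
            nxt, reward, done = step(state, action)
            # Bootstrapped target: immediate reward plus discounted best future value.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


def greedy_policy(q):
    """Best action in each non-terminal state according to the learned table."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


q_table = train()
print(greedy_policy(q_table))
```

In this toy setting the agent receives feedback only at the goal, so the learned values have to propagate backward along the corridor before the greedy policy becomes useful. That is a small-scale version of the long-term reward problem described above.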

Papers

Showing 15051–15100 of 15113 papers

Title | Status | Hype
Back to Basics: Deep Reinforcement Learning in Traffic Signal Control | Code | 0
Contextualized Hybrid Ensemble Q-learning: Learning Fast with Control Priors | Code | 0
Learning Heuristics over Large Graphs via Deep Reinforcement Learning | Code | 0
HDDLGym: A Tool for Studying Multi-Agent Hierarchical Problems Defined in HDDL with OpenAI Gym | Code | 0
Contextual Imagined Goals for Self-Supervised Robotic Learning | Code | 0
Health-Informed Policy Gradients for Multi-Agent Reinforcement Learning | Code | 0
Health Text Simplification: An Annotated Corpus for Digestive Cancer Education and Novel Strategies for Reinforcement Learning | Code | 0
Context Meta-Reinforcement Learning via Neuromodulation | Code | 0
Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari | Code | 0
Infinite Time Horizon Safety of Bayesian Neural Networks | Code | 0
Self-supervised network distillation: an effective approach to exploration in sparse reward environments | Code | 0
An Actor-Critic Algorithm for Sequence Prediction | Code | 0
Exploration Conscious Reinforcement Learning Revisited | Code | 0
Influence-aware Memory Architectures for Deep Reinforcement Learning | Code | 0
Influence-Based Multi-Agent Exploration | Code | 0
Adversarial Environment Generation for Learning to Navigate the Web | Code | 0
Arachnophobia Exposure Therapy using Experience-driven Procedural Content Generation via Reinforcement Learning (EDPCGRL) | Code | 0
Exploration in Action Space | Code | 0
Context-Aware Visual Policy Network for Sequence-Level Image Captioning | Code | 0
Influencing Reinforcement Learning through Natural Language Guidance | Code | 0
Semantic RL with Action Grammars: Data-Efficient Learning of Hierarchical Task Abstractions | Code | 0
Learning how to Active Learn: A Deep Reinforcement Learning Approach | Code | 0
Backprop-Q: Generalized Backpropagation for Stochastic Computation Graphs | Code | 0
A Quadratic Actor Network for Model-Free Reinforcement Learning | Code | 0
Information-Directed Exploration for Deep Reinforcement Learning | Code | 0
APRIL: Interactively Learning to Summarise by Combining Active Preference Learning and Reinforcement Learning | Code | 0
Information-Driven Adaptive Sensing Based on Deep Reinforcement Learning | Code | 0
Learning How to Active Learn by Dreaming | Code | 0
Heuristics, Answer Set Programming and Markov Decision Process for Solving a Set of Spatial Puzzles | Code | 0
A Multilevel Reinforcement Learning Framework for PDE-based Control | Code | 0
A Multi-Document Coverage Reward for RELAXed Multi-Document Summarization | Code | 0
Constructing Non-Markovian Decision Process via History Aggregator | Code | 0
Exploration via Flow-Based Intrinsic Rewards | Code | 0
Exploration via Hindsight Goal Generation | Code | 0
Constraint Sampling Reinforcement Learning: Incorporating Expertise For Faster Learning | Code | 0
Modeling Human Exploration Through Resource-Rational Reinforcement Learning | Code | 0
Information State Embedding in Partially Observable Cooperative Multi-Agent Reinforcement Learning | Code | 0
Backpropagation through the Void: Optimizing control variates for black-box gradient estimation | Code | 0
Learning How to Actively Learn: A Deep Imitation Learning Approach | Code | 0
Exploratory Combinatorial Optimization with Reinforcement Learning | Code | 0
Accurate Uncertainties for Deep Learning Using Calibrated Regression | Code | 0
Exploratory Gradient Boosting for Reinforcement Learning in Complex Domains | Code | 0
Backplay: "Man muss immer umkehren" | Code | 0
A Multi-Agent Reinforcement Learning Framework for Off-Policy Evaluation in Two-sided Markets | Code | 0
Exploratory State Representation Learning | Code | 0
Constrained Reinforcement Learning for Safe Heat Pump Control | Code | 0
Explore and Exploit with Heterotic Line Bundle Models | Code | 0
LESSON: Learning to Integrate Exploration Strategies for Reinforcement Learning via an Option Framework | Code | 0
B2RL: An open-source Dataset for Building Batch Reinforcement Learning | Code | 0
Information-Theoretic State Variable Selection for Reinforcement Learning | Code | 0
Page 302 of 303

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | - | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | - | Unverified