SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment to maximize a cumulative reward signal. The agent interacts with the environment and learns by receiving feedback in the form of rewards or punishments for its actions. The goal of reinforcement learning is to find the optimal policy or decision-making strategy that maximizes the long-term reward.
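
As a rough illustration of the loop described above (act, receive a reward, update the policy), here is a minimal sketch using tabular Q-learning. Q-learning is just one standard algorithm, chosen here for brevity. The five-cell corridor environment, the `step`, `train`, and `greedy_policy` names, the discount factor `gamma`, and all hyperparameter values are assumptions made for this example and do not come from the page.

```python
import random

N_STATES = 5          # hypothetical corridor: cells 0..4, cell 4 is the goal
ACTIONS = (-1, +1)    # action index 0 moves left, index 1 moves right


def step(state, action):
    """Toy environment (assumed): reward 1.0 only when the goal is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            # Explore with probability epsilon, or when both values are tied.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Bootstrapped target: immediate reward plus discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Pick the higher-valued action in every non-terminal state."""
    return [ACTIONS[0] if row[0] > row[1] else ACTIONS[1] for row in q[:-1]]
```

Calling `greedy_policy(train())` returns one move per non-terminal cell. In this toy setting the learned policy should move right everywhere, since that is the only way to collect the reward.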

Papers

Showing 14351-14400 of 15113 papers

Title | Status | Hype
A Tree Search Algorithm for Sequence Labeling | Code | 0
Free-Lunch Saliency via Attention in Atari Agents | Code | 0
Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning | Code | 0
Learning Visual Servoing with Deep Features and Fitted Q-Iteration | Code | 0
Frequentist Regret Bounds for Randomized Least-Squares Value Iteration | Code | 0
Meta-learning Convolutional Neural Architectures for Multi-target Concrete Defect Classification with the COncrete DEfect BRidge IMage Dataset | Code | 0
Adaptive teachers for amortized samplers | Code | 0
Hyp-RL : Hyperparameter Optimization by Reinforcement Learning | Code | 0
DynamicLight: Two-Stage Dynamic Traffic Signal Timing | Code | 0
From Credit Assignment to Entropy Regularization: Two New Algorithms for Neural Sequence Prediction | Code | 0
Hysteresis-Based RL: Robustifying Reinforcement Learning-based Control Policies via Hybrid Control | Code | 0
Convergent and Efficient Deep Q Network Algorithm | Code | 0
Dynamic Measurement Scheduling for Event Forecasting using Deep RL | Code | 0
ChemGAN challenge for drug discovery: can AI reproduce natural chemical diversity? | Code | 0
Learning a model is paramount for sample efficiency in reinforcement learning control of PDEs | Code | 0
From Gameplay to Symbolic Reasoning: Learning SAT Solver Heuristics in the Style of Alpha(Go) Zero | Code | 0
Dynamic Network Reconfiguration for Entropy Maximization using Deep Reinforcement Learning | Code | 0
DRED: Zero-Shot Transfer in Reinforcement Learning via Data-Regularised Environment Design | Code | 0
A Tour of Reinforcement Learning: The View from Continuous Control | Code | 0
Identifiability and generalizability from multiple experts in Inverse Reinforcement Learning | Code | 0
Dynamic Observation Policies in Observation Cost-Sensitive Reinforcement Learning | Code | 0
From Images to Connections: Can DQN with GNNs learn the Strategic Game of Hex? | Code | 0
Identifiability and Generalizability in Constrained Inverse Reinforcement Learning | Code | 0
Learning Rate-Free Reinforcement Learning: A Case for Model Selection with Non-Stationary Objectives | Code | 0
A Threshold-based Scheme for Reinforcement Learning in Neural Networks | Code | 0
From Language to Programs: Bridging Reinforcement Learning and Maximum Marginal Likelihood | Code | 0
Active Policy Improvement from Multiple Black-box Oracles | Code | 0
An Investigation of Offline Reinforcement Learning in Factorisable Action Spaces | Code | 0
Learning to Navigate in Cities Without a Map | Code | 0
From Perceptions to Decisions: Wildfire Evacuation Decision Prediction with Behavioral Theory-informed LLMs | Code | 0
Deep Exploration via Bootstrapped DQN | Code | 0
Identifying optimal cycles in quantum thermal machines with reinforcement-learning | Code | 0
An investigation of model-free planning | Code | 0
Characterizing Attacks on Deep Reinforcement Learning | Code | 0
Deep PQR: Solving Inverse Reinforcement Learning using Anchor Actions | Code | 0
Challenging common bolus advisor for self-monitoring type-I diabetes patients using Reinforcement Learning | Code | 0
Classifying Ambiguous Identities in Hidden-Role Stochastic Games with Multi-Agent Reinforcement Learning | Code | 0
IGLU 2022: Interactive Grounded Language Understanding in a Collaborative Environment at NeurIPS 2022 | Code | 0
A Temporal Difference Method for Stochastic Continuous Dynamics | Code | 0
IGN : Implicit Generative Networks | Code | 0
Hierarchical Potential-based Reward Shaping from Task Specifications | Code | 0
Dynamics-aware Embeddings | Code | 0
Learning to Navigate in Complex Environments | Code | 0
A Systematization of the Wagner Framework: Graph Theory Conjectures and Reinforcement Learning | Code | 0
Bayesian Reinforcement Learning via Deep, Sparse Sampling | Code | 0
From Two-Dimensional to Three-Dimensional Environment with Q-Learning: Modeling Autonomous Navigation with Reinforcement Learning and no Libraries | Code | 0
IKEA Furniture Assembly Environment for Long-Horizon Complex Manipulation Tasks | Code | 0
Dynamic Update-to-Data Ratio: Minimizing World Model Overfitting | Code | 0
Invariant Transform Experience Replay: Data Augmentation for Deep Reinforcement Learning | Code | 0
Learning and Policy Search in Stochastic Dynamical Systems with Bayesian Neural Networks | Code | 0
Page 288 of 303

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified