SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
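
The interaction loop described above can be illustrated with a minimal sketch. It assumes a hypothetical five-state corridor environment and uses tabular Q-learning with epsilon-greedy exploration, which is only one of many RL algorithms. The environment, the function names (`step`, `train`, `greedy_policy`), and the hyperparameter values are illustrative assumptions and do not come from this page.

```python
import random

# Hypothetical toy environment (an assumption for illustration):
# a corridor of 5 states; the agent starts at state 0 and receives
# a reward of +1 when it reaches the last state.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
GOAL = N_STATES - 1


def step(state, action):
    """Environment dynamics: return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy action selection."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, or when both actions tie.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Bootstrapped target: immediate reward plus the discounted
            # value of the best next action (zero after termination).
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best action in each non-terminal state under the learned values."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    q_table = train()
    print(greedy_policy(q_table))
```

In this toy setting the learned greedy policy moves right in every non-terminal state, which is the strategy that maximizes the discounted long-term reward. The discount factor `gamma` controls how strongly future rewards count relative to immediate ones.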

Papers

Showing 13951–14000 of 15113 papers

| Title | Status | Hype |
| --- | --- | --- |
| Learning Local Search Heuristics for Boolean Satisfiability | Code | 0 |
| Faster Reinforcement Learning Using Active Simulators | Code | 0 |
| Deep Reinforcement Learning that Matters | Code | 0 |
| A Lyapunov-based Approach to Safe Reinforcement Learning | Code | 0 |
| Alpha-Mini: Minichess Agent with Deep Reinforcement Learning | Code | 0 |
| A Policy Gradient Primal-Dual Algorithm for Constrained MDPs with Uniform PAC Guarantees | Code | 0 |
| A policy gradient approach for Finite Horizon Constrained Markov Decision Processes | Code | 0 |
| Learning to Deceive Knowledge Graph Augmented Models via Targeted Perturbation | Code | 0 |
| Directly Forecasting Belief for Reinforcement Learning with Delays | Code | 0 |
| Deep Reinforcement Learning Radio Control and Signal Detection with KeRLym, a Gym RL Agent | Code | 0 |
| ALPaCA vs. GP-based Prior Learning: A Comparison between two Bayesian Meta-Learning Algorithms | Code | 0 |
| A Deep Reinforcement Learning Approach to Audio-Based Navigation in a Multi-Speaker Environment | Code | 0 |
| Deep Reinforcement Learning on a Budget: 3D Control and Reasoning Without a Supercomputer | Code | 0 |
| Direct Random Search for Fine Tuning of Deep Reinforcement Learning Policies | Code | 0 |
| Conditionally Elicitable Dynamic Risk Measures for Deep Reinforcement Learning | Code | 0 |
| Fast Rates for Maximum Entropy Exploration | Code | 0 |
| Langevin DQN | Code | 0 |
| Deep Reinforcement Learning of Region Proposal Networks for Object Detection | Code | 0 |
| Autonomous Navigation via Deep Reinforcement Learning for Resource Constraint Edge Nodes using Transfer Learning | Code | 0 |
| On the Correctness and Sample Complexity of Inverse Reinforcement Learning | Code | 0 |
| High-Throughput Distributed Reinforcement Learning via Adaptive Policy Synchronization | Code | 0 |
| Discount Factor as a Regularizer in Reinforcement Learning | Code | 0 |
| Autonomous Management of Energy-Harvesting IoT Nodes Using Deep Reinforcement Learning | Code | 0 |
| Highway Graph to Accelerate Reinforcement Learning | Code | 0 |
| Discourse Marker Augmented Network with Reinforcement Learning for Natural Language Inference | Code | 0 |
| DISCOVER: Automated Curricula for Sparse-Reward Reinforcement Learning | Code | 0 |
| Deep Reinforcement Learning of Marked Temporal Point Processes | Code | 0 |
| Deep Reinforcement Learning meets Graph Neural Networks: exploring a routing optimization use case | Code | 0 |
| A Low Latency Adaptive Coding Spiking Framework for Deep Reinforcement Learning | Code | 0 |
| Learning Low-Frequency Motion Control for Robust and Dynamic Robot Locomotion | Code | 0 |
| A Deep Reinforcement Learning Approach for Global Routing | Code | 0 |
| Language as an Abstraction for Hierarchical Deep Reinforcement Learning | Code | 0 |
| Autonomous Braking System via Deep Reinforcement Learning | Code | 0 |
| Conditional Computation in Neural Networks for faster models | Code | 0 |
| Deep reinforcement learning in World-Earth system models to discover sustainable management strategies | Code | 0 |
| Hindsight Credit Assignment | Code | 0 |
| Discovering General-Purpose Active Learning Strategies | Code | 0 |
| Deep Reinforcement Learning in Quantitative Algorithmic Trading: A Review | Code | 0 |
| Deep Reinforcement Learning in Large Discrete Action Spaces | Code | 0 |
| APO: Enhancing Reasoning Ability of MLLMs via Asymmetric Policy Optimization | Code | 0 |
| Learning to Describe for Predicting Zero-shot Drug-Drug Interactions | Code | 0 |
| Deep Reinforcement Learning for Synthesizing Functions in Higher-Order Logic | Code | 0 |
| Faults in Deep Reinforcement Learning Programs: A Taxonomy and A Detection Approach | Code | 0 |
| Automating Reinforcement Learning with Example-based Resets | Code | 0 |
| Deep Reinforcement Learning in Continuous Action Spaces: a Case Study in the Game of Simulated Curling | Code | 0 |
| Concurrent Meta Reinforcement Learning | Code | 0 |
| FCMNet: Full Communication Memory Net for Team-Level Cooperation in Multi-Agent Systems | Code | 0 |
| Hindsight Foresight Relabeling for Meta-Reinforcement Learning | Code | 0 |
| A Low-Cost Ethics Shaping Approach for Designing Reinforcement Learning Agents | Code | 0 |
| Concurrent Credit Assignment for Data-efficient Reinforcement Learning | Code | 0 |
Page 280 of 303

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |