SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent learns by interacting with the environment: each action yields feedback in the form of a reward or a penalty. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
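The interaction loop described above can be illustrated with a minimal sketch of tabular Q-learning. Everything here is an assumption made for illustration and is not taken from any paper listed on this page: the five-state chain environment, the function names `step`, `train`, and `greedy_policy`, and the hyperparameter values are all hypothetical.

```python
import random


def step(state, action, n_states=5):
    """Hypothetical chain environment (illustrative only).

    Action 1 moves right, action 0 moves left. Reaching the last
    state yields reward 1.0 and ends the episode.
    """
    if action == 1:
        next_state = min(state + 1, n_states - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == n_states - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, n_states=5, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    # q[state][action] estimates the discounted return of taking `action` in `state`.
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, or when both estimates are tied.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action, n_states)
            # Bootstrapped target: immediate reward plus discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Return the greedy action for every state (ties resolve to action 1)."""
    return [0 if left > right else 1 for left, right in q]


if __name__ == "__main__":
    q_table = train()
    print(greedy_policy(q_table))
```

In this toy setting the reward is received only at the right end of the chain, so the learned greedy policy should come to prefer moving right in every non-terminal state. The discount factor `gamma` is what turns the single terminal reward into a long-term objective for the earlier states.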

Papers

Showing 14201-14250 of 15113 papers

Title | Status | Hype
AdCraft: An Advanced Reinforcement Learning Benchmark Environment for Search Engine Marketing Optimization | Code | 0
Deep Reinforcement Learning for Cybersecurity Assessment of Wind Integrated Power Systems | Code | 0
Cold-Start Reinforcement Learning with Softmax Policy Gradient | Code | 0
CODEX: A Cluster-Based Method for Explainable Reinforcement Learning | Code | 0
COBRA: Data-Efficient Model-Based RL through Unsupervised Object Discovery and Curiosity-Driven Exploration | Code | 0
AdaStop: adaptive statistical testing for sound comparisons of Deep RL agents | Code | 0
AutoFS: Automated Feature Selection via Diversity-aware Interactive Reinforcement Learning | Code | 0
An Object-Oriented Representation for Efficient Reinforcement Learning | Code | 0
AutoBS: Autonomous Base Station Deployment with Reinforcement Learning and Digital Network Twins | Code | 0
CoaCor: Code Annotation for Code Retrieval with Reinforcement Learning | Code | 0
Weak Human Preference Supervision For Deep Reinforcement Learning | Code | 0
Learning to Fly via Deep Model-Based Reinforcement Learning | Code | 0
Learning on a Budget via Teacher Imitation | Code | 0
Learning To Follow Directions in Street View | Code | 0
A Hitchhiker's Guide to Statistical Comparisons of Reinforcement Learning Algorithms | Code | 0
Learning on One Mode: Addressing Multi-Modality in Offline Reinforcement Learning | Code | 0
DOM-Q-NET: Grounded RL on Structured Language | Code | 0
Learning to Follow Instructions in Text-Based Games | Code | 0
A Cramér Distance perspective on Quantile Regression based Distributional Reinforcement Learning | Code | 0
Deep Reinforcement Learning for Control of Probabilistic Boolean Networks | Code | 0
Deep Reinforcement Learning for Chinese Zero pronoun Resolution | Code | 0
Learning to Generalize for Sequential Decision Making | Code | 0
Coach-assisted Multi-Agent Reinforcement Learning Framework for Unexpected Crashed Agents | Code | 0
Hybrid Actor-Critic Reinforcement Learning in Parameterized Action Space | Code | 0
Jointly Learning Environments and Control Policies with Projected Stochastic Gradient Ascent | Code | 0
Accelerate Reinforcement Learning with PID Controllers in the Pendulum Simulations | Code | 0
Deep Reinforcement Learning for Autonomous Driving | Code | 0
CM3: Cooperative Multi-goal Multi-stage Multi-agent Reinforcement Learning | Code | 0
Flappy Hummingbird: An Open Source Dynamic Simulation of Flapping Wing Robots and Animals | Code | 0
DoorGym: A Scalable Door Opening Environment And Baseline Agent | Code | 0
Deep Reinforcement Learning for Conversational AI | Code | 0
Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning | Code | 0
DORA The Explorer: Directed Outreaching Reinforcement Action-Selection | Code | 0
DORA: Toward Policy Optimization for Task-oriented Dialogue System with Efficient Context | Code | 0
Adapting to Reward Progressivity via Spectral Reinforcement Learning | Code | 0
Learning Transferable Reward for Query Object Localization with Policy Adaptation | Code | 0
Dota 2 with Large Scale Deep Reinforcement Learning | Code | 0
Fleet Control using Coregionalized Gaussian Process Policy Iteration | Code | 0
Leveraging Fully Observable Policies for Learning under Partial Observability | Code | 0
Double Check Your State Before Trusting It: Confidence-Aware Bidirectional Offline Model-Based Imagination | Code | 0
Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Estimators for Reinforcement Learning | Code | 0
CLUTR: Curriculum Learning via Unsupervised Task Representation Learning | Code | 0
Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Gradient Estimators for Reinforcement Learning | Code | 0
Learning to solve the credit assignment problem | Code | 0
Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes | Code | 0
Flexible Option Learning | Code | 0
Double Successive Over-Relaxation Q-Learning with an Extension to Deep Reinforcement Learning | Code | 0
Doubly Inhomogeneous Reinforcement Learning | Code | 0
Deep Reinforcement Learning Based Parameter Control in Differential Evolution | Code | 0
Cloud Database Tuning with Reinforcement Learning | Code | 0
Page 285 of 303

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | - | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | - | Unverified