SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
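The interaction loop described above can be illustrated with a minimal sketch. It uses tabular Q-learning, chosen here only as one common RL method and not taken from any paper listed on this page. The five-state chain environment, the `step` function, and all hyperparameters are hypothetical choices made for illustration.

```python
import random

# Hypothetical toy environment: states 0..4 on a line, state 4 is terminal.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0  # reward only on reaching the goal
    return next_state, reward, done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                # Exploit, breaking ties between equal values at random.
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Bootstrap from the next state unless the episode has ended.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best-valued action for each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


q_table = train()
policy = greedy_policy(q_table)
print(policy)
```

Under these assumptions the learned greedy policy should choose action 1 (move right) in every non-terminal state, since that is the shortest route to the only reward. The discount factor `gamma` is what turns the single terminal reward into a long-term objective: states closer to the goal end up with higher values.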

Papers

Showing 3751–3775 of 15113 papers

Title | Status | Hype
Client Selection for Federated Policy Optimization with Environment Heterogeneity | Code | 0
Clickbait? Sensational Headline Generation with Auto-tuned Reinforcement Learning | Code | 0
Gap-Dependent Unsupervised Exploration for Reinforcement Learning | Code | 0
GAN Q-learning | Code | 0
GAC: A Deep Reinforcement Learning Model Toward User Incentivization in Unknown Social Networks | Code | 0
Classification with Costly Features using Deep Reinforcement Learning | Code | 0
Classification with Costly Features as a Sequential Decision-Making Problem | Code | 0
Fuzzy Logic Guided Reward Function Variation: An Oracle for Testing Reinforcement Learning Programs | Code | 0
Gaussian Processes for Data-Efficient Learning in Robotics and Control | Code | 0
Functional Acceleration for Policy Mirror Descent | Code | 0
Fully Parameterized Quantile Function for Distributional Reinforcement Learning | Code | 0
Reinforcement Learning Upside Down: Don't Predict Rewards -- Just Map Them to Actions | Code | 0
Fully Convolutional Network with Multi-Step Reinforcement Learning for Image Processing | Code | 0
Deep Reinforcement Learning from Hierarchical Preference Design | Code | 0
From Two-Dimensional to Three-Dimensional Environment with Q-Learning: Modeling Autonomous Navigation with Reinforcement Learning and no Libraries | Code | 0
Deep reinforcement learning from human preferences | Code | 0
Adaptive Power System Emergency Control using Deep Reinforcement Learning | Code | 0
Action Robust Reinforcement Learning and Applications in Continuous Control | Code | 0
From Perceptions to Decisions: Wildfire Evacuation Decision Prediction with Behavioral Theory-informed LLMs | Code | 0
Hierarchical Potential-based Reward Shaping from Task Specifications | Code | 0
CityFlow: A Multi-Agent Reinforcement Learning Environment for Large Scale City Traffic Scenario | Code | 0
A Policy Gradient Primal-Dual Algorithm for Constrained MDPs with Uniform PAC Guarantees | Code | 0
From Gameplay to Symbolic Reasoning: Learning SAT Solver Heuristics in the Style of Alpha(Go) Zero | Code | 0
Circular Microalgae-Based Carbon Control for Net Zero | Code | 0
A policy gradient approach for Finite Horizon Constrained Markov Decision Processes | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 |  | Unverified
2 | PPO | Mean Normalized Performance | 0.58 |  | Unverified