SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
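The interaction loop described above can be illustrated with a small sketch. Everything in it is an assumption made for illustration and does not come from this page or any listed paper: a hypothetical five-state corridor environment, tabular Q-learning as one possible learning rule, and arbitrarily chosen hyperparameters (GAMMA, ALPHA, EPSILON).

```python
import random

# All names and values below are illustrative assumptions.
N_STATES = 5     # hypothetical corridor: states 0..4, the goal is state 4
GAMMA = 0.9      # discount factor applied to future reward
ALPHA = 0.5      # learning rate
EPSILON = 0.2    # probability of taking a random (exploratory) action


def step(state, action):
    """Toy environment: action 1 moves right, action 0 moves left.

    Reaching the rightmost state gives reward 1.0 and ends the episode;
    every other transition gives reward 0.0.
    """
    move = 1 if action == 1 else -1
    next_state = max(0, min(N_STATES - 1, state + move))
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=200, seed=0):
    """Learn action values with tabular Q-learning and epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            if rng.random() < EPSILON:
                action = rng.randrange(2)
            else:
                # Greedy choice, breaking ties at random.
                best = max(q[state])
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # The target is the immediate reward plus the discounted
            # value of the best action in the next state.
            future = 0.0 if done else GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (reward + future - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Return the highest-valued action for each non-terminal state."""
    return [max((0, 1), key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

The reward arrives only at the goal, so the agent has to propagate value backward through earlier states. This is the sense in which the learned policy maximizes long-term rather than immediate reward.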

Papers

Showing 14001-14050 of 15113 papers

Title | Status | Hype
Concrete Dropout | Code | 0
Discrete Action On-Policy Learning with Action-Value Critic | Code | 0
Discrete and Continuous Action Representation for Practical RL in Video Games | Code | 0
Deep reinforcement learning from human preferences | Code | 0
Hindsight Learning for MDPs with Exogenous Inputs | Code | 0
Hindsight policy gradients | Code | 0
Learning to Perform Local Rewriting for Combinatorial Optimization | Code | 0
Feature-Attending Recurrent Modules for Generalization in Reinforcement Learning | Code | 0
Action Advising with Advice Imitation in Deep Reinforcement Learning | Code | 0
Logic-based Reward Shaping for Multi-Agent Reinforcement Learning | Code | 0
Discrete State-Action Abstraction via the Successor Representation | Code | 0
Hindsight Trust Region Policy Optimization | Code | 0
Discrete-to-Deep Supervised Policy Learning | Code | 0
Feature Control as Intrinsic Motivation for Hierarchical Reinforcement Learning | Code | 0
Deep Reinforcement Learning from Hierarchical Preference Design | Code | 0
H∞ Model-free Reinforcement Learning with Robust Stability Guarantee | Code | 0
Deep Reinforcement Learning framework for Autonomous Driving | Code | 0
Deep Reinforcement Learning for Vision-Based Robotic Grasping: A Simulated Comparative Evaluation of Off-Policy Methods | Code | 0
Learning to Discretize: Solving 1D Scalar Conservation Laws via Deep Reinforcement Learning | Code | 0
Hint assisted reinforcement learning: an application in radio astronomy | Code | 0
Disentangled (Un)Controllable Features | Code | 0
Learning robust control for LQR systems with multiplicative noise via policy gradient | Code | 0
Disentangling Abstraction from Statistical Pattern Matching in Human and Machine Learning | Code | 0
Automatic Goal Generation for Reinforcement Learning Agents | Code | 0
Federated Control with Hierarchical Multi-Agent Deep Reinforcement Learning | Code | 0
ComSD: Balancing Behavioral Quality and Diversity in Unsupervised Skill Discovery | Code | 0
APEX: Empowering LLMs with Physics-Based Task Planning for Real-time Insight | Code | 0
Computing the Feedback Capacity of Finite State Channels using Reinforcement Learning | Code | 0
Automatic Discovery of Interpretable Planning Strategies | Code | 0
Aligning an optical interferometer with beam divergence control and continuous action space | Code | 0
Language Model Alignment with Elastic Reset | Code | 0
A Lightweight Calibrated Simulation Enabling Efficient Offline Learning for Optimal Control of Real Buildings | Code | 0
Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees | Code | 0
Deep Reinforcement Learning for Unsupervised Video Summarization with Diversity-Representativeness Reward | Code | 0
Automatically Exposing Problems with Neural Dialog Models | Code | 0
Dissecting Long Reasoning Models: An Empirical Study | Code | 0
HOList: An Environment for Machine Learning of Higher-Order Theorem Proving | Code | 0
A learning gap between neuroscience and reinforcement learning | Code | 0
Deep Reinforcement Learning for Traffic Light Control in Vehicular Networks | Code | 0
Distance Weighted Supervised Learning for Offline Interaction Data | Code | 0
Distantly Supervised NER with Partial Annotation Learning and Reinforcement Learning | Code | 0
APES: a Python toolbox for simulating reinforcement learning environments | Code | 0
Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes (Technical Report) | Code | 0
Homogenization of Multi-agent Learning Dynamics in Finite-state Markov Games | Code | 0
Intelligent Traffic Light via Policy-based Deep Reinforcement Learning | Code | 0
Automated quantum programming via reinforcement learning for combinatorial optimization | Code | 0
Language Understanding for Text-based Games Using Deep Reinforcement Learning | Code | 0
Intelligent Trainer for Model-Based Reinforcement Learning | Code | 0
Deep reinforcement learning for time series: playing idealized trading games | Code | 0
Automated Proof of Polynomial Inequalities via Reinforcement Learning | Code | 0
Page 281 of 303

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | - | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | - | Unverified