SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
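
The interaction loop described above can be illustrated with a minimal sketch. It assumes a hypothetical four-state chain environment in which only reaching the last state is rewarded, and uses tabular Q-learning as one possible learning rule. The names `step`, `train`, and `greedy_policy`, as well as the discount factor and learning rate, are illustrative choices and do not come from any paper listed on this page.

```python
import random

N_STATES = 4          # states 0..3; state 3 is the terminal goal (toy assumption)
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GAMMA = 0.9           # discount factor (assumed value)
ALPHA = 0.5           # learning rate (assumed value)


def step(state, action):
    """Hypothetical toy environment: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0   # reward only when the goal is reached
    return next_state, reward, done


def train(episodes=500, max_steps=50, seed=0):
    """Tabular Q-learning with a uniformly random behavior policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = rng.choice(ACTIONS)                 # explore at random
            next_state, reward, done = step(state, action)
            # Bootstrapped target: immediate reward plus discounted best next value.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            if done:
                break
            state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

Calling `greedy_policy(train())` yields one action per non-terminal state. In this toy setting the learned policy should favor moving right, since that is the only way to collect the reward.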

Papers

Showing 876–900 of 15113 papers

Title | Status | Hype
Consistency Models as a Rich and Efficient Policy Class for Reinforcement Learning | Code | 1
Reliable Conditioning of Behavioral Cloning for Offline Reinforcement Learning | Code | 1
Zero-Shot Reinforcement Learning from Low Quality Data | Code | 1
Simplified Action Decoder for Deep Multi-Agent Reinforcement Learning | Code | 1
Constrained episodic reinforcement learning in concave-convex and knapsack settings | Code | 1
Direct Preference Optimization for Neural Machine Translation with Minimum Bayes Risk Decoding | Code | 1
FuRL: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning | Code | 1
DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction | Code | 1
Diminishing Return of Value Expansion Methods in Model-Based Reinforcement Learning | Code | 1
GAEA: Graph Augmentation for Equitable Access via Reinforcement Learning | Code | 1
Aligning Language Models with Human Preferences via a Bayesian Approach | Code | 1
A Closer Look at Advantage-Filtered Behavioral Cloning in High-Noise Datasets | Code | 1
Constrained Variational Policy Optimization for Safe Reinforcement Learning | Code | 1
Constructions in combinatorics via neural networks | Code | 1
Alleviating Matthew Effect of Offline Reinforcement Learning in Interactive Recommendation | Code | 1
Contention Window Optimization in IEEE 802.11ax Networks with Deep Reinforcement Learning | Code | 1
ALLSTEPS: Curriculum-driven Learning of Stepping Stone Skills | Code | 1
All You Need Is Supervised Learning: From Imitation Learning to Meta-RL With Upside Down RL | Code | 1
Context-aware Dynamics Model for Generalization in Model-Based Reinforcement Learning | Code | 1
Continual Model-Based Reinforcement Learning with Hypernetworks | Code | 1
Continual Backprop: Stochastic Gradient Descent with Persistent Randomness | Code | 1
Continual Learning with Gated Incremental Memories for sequential data processing | Code | 1
Converting Biomechanical Models from OpenSim to MuJoCo | Code | 1
Direct Behavior Specification via Constrained Reinforcement Learning | Code | 1
DISCOVER: Deep identification of symbolically concise open-form PDEs via enhanced reinforcement-learning | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | - | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | - | Unverified