SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal of reinforcement learning is to find the optimal policy, that is, the decision-making strategy that maximizes the long-term reward.
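The interaction loop described above can be sketched with tabular Q-learning, one common RL algorithm among many. Everything in this sketch is an assumption made for illustration and is not taken from any paper listed on this page: the five-state corridor environment, the `step`, `train`, and `greedy_policy` helpers, the discount factor `gamma`, the learning rate `alpha`, and the uniformly random exploration policy.

```python
import random

N_STATES = 5      # hypothetical corridor: states 0..4, state 4 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment: return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    # Reward 1.0 only when the terminal state is reached, 0.0 otherwise.
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn a Q-table from episodes collected with random actions."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Q-learning is off-policy, so a random behaviour policy suffices here.
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            # Bootstrapped target: immediate reward plus discounted best future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


q_table = train()
policy = greedy_policy(q_table)
```

In this toy setting the learned greedy policy should move right in every non-terminal state, since that is the shortest path to the only reward. Real RL problems typically need function approximation and more careful exploration than this sketch uses.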

Papers

Showing 3601–3625 of 15113 papers

Title | Status | Hype
Genes in Intelligent Agents | Code | 0
Generative Modelling of Stochastic Actions with Arbitrary Constraints in Reinforcement Learning | Code | 0
ALPaCA vs. GP-based Prior Learning: A Comparison between two Bayesian Meta-Learning Algorithms | Code | 0
Deep Reinforcement Learning for De-Novo Drug Design | Code | 0
Generative Planning for Temporally Coordinated Exploration in Reinforcement Learning | Code | 0
Neural-encoding Human Experts' Domain Knowledge to Warm Start Reinforcement Learning | Code | 0
Generative Adversarial User Model for Reinforcement Learning Based Recommendation System | Code | 0
Generative Question Refinement with Deep Reinforcement Learning in Retrieval-based QA System | Code | 0
GEP-PG: Decoupling Exploration and Exploitation in Deep Reinforcement Learning Algorithms | Code | 0
Collaborative Evolutionary Reinforcement Learning | Code | 0
Active Collection of Well-Being and Health Data in Mobile Devices | Code | 0
Generating Multi-type Temporal Sequences to Mitigate Class-imbalanced Problem | Code | 0
On the Convergence Theory of Debiased Model-Agnostic Meta-Reinforcement Learning | Code | 0
Provably Correct Optimization and Exploration with Non-linear Policies | Code | 0
Collaborative Deep Reinforcement Learning | Code | 0
Generating Classical Chinese Poems from Vernacular Chinese | Code | 0
A2PO: Towards Effective Offline Reinforcement Learning from an Advantage-aware Perspective | Code | 0
Generative Adversarial Network for Abstractive Text Summarization | Code | 0
Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment | Code | 0
Cold-Start Reinforcement Learning with Softmax Policy Gradient | Code | 0
Generalized Speedy Q-learning | Code | 0
Generalized Population-Based Training for Hyperparameter Optimization in Reinforcement Learning | Code | 0
General Policy Evaluation and Improvement by Learning to Identify Few But Crucial States | Code | 0
Generalized Adaptive Transfer Network: Enhancing Transfer Learning in Reinforcement Learning Across Domains | Code | 0
Generalization Tower Network: A Novel Deep Neural Network Architecture for Multi-Task Learning | Code | 0
Page 145 of 605

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified