SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
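To make the agent, environment, and reward loop described above concrete, here is a minimal sketch using tabular Q-learning on a toy five-state chain. It is not taken from any paper listed on this page. The environment, the function names (`step`, `discounted_return`, `q_learning`, `greedy_policy`), and the hyperparameter values are all assumptions chosen for illustration.

```python
import random

N_STATES = 5          # hypothetical chain: states 0..4, state 4 is the terminal goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic toy environment: reward 1.0 only on reaching the goal."""
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def discounted_return(rewards, gamma):
    """Cumulative reward with discount factor gamma: sum over t of gamma^t * r_t."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def q_learning(episodes=1000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)  # explore, or break ties randomly
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Bootstrapped target: immediate reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best action per non-terminal state under the learned Q-table."""
    return [0 if q[s][0] > q[s][1] else 1 for s in range(N_STATES - 1)]
```

For example, `discounted_return([0.0, 0.0, 0.0, 1.0], 0.9)` evaluates to about 0.729, the discounted value of reaching the goal in four steps. In this toy chain, `greedy_policy(q_learning())` is expected to select action 1 (move right) in every non-terminal state, since that is the policy that maximizes long-term reward here.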

Papers

Showing 12701-12750 of 15113 papers

Title | Status | Hype
Strongly-polynomial time and validation analysis of policy gradient methods | | 0
Structural Credit Assignment in Neural Networks using Reinforcement Learning | | 0
Structural Credit Assignment with Coordinated Exploration | | 0
Structural Return Maximization for Reinforcement Learning | | 0
Structural Similarity for Improved Transfer in Reinforcement Learning | | 0
Structure-aware reinforcement learning for node-overload protection in mobile edge computing | | 0
Structure-Aware Transformer Policy for Inhomogeneous Multi-Task Reinforcement Learning | | 0
Structured Dialogue Policy with Graph Neural Networks | | 0
Structured Graph Network for Constrained Robot Crowd Navigation with Low Fidelity Simulation | | 0
Structured Reinforcement Learning for Delay-Optimal Data Transmission in Dense mmWave Networks | | 0
Structured World Belief for Reinforcement Learning in POMDP | | 0
Structure-Enhanced Deep Reinforcement Learning for Optimal Transmission Scheduling | | 0
Structure in Deep Reinforcement Learning: A Survey and Open Problems | | 0
Structure Learning in Human Sequential Decision-Making | | 0
Structure Learning in Motor Control: A Deep Reinforcement Learning Model | | 0
Student/Teacher Advising through Reward Augmentation | | 0
Student-Teacher Curriculum Learning via Reinforcement Learning: Predicting Hospital Inpatient Admission Location | | 0
Stylistic Dialogue Generation via Information-Guided Reinforcement Learning Strategy | | 0
Sub-domain Modelling for Dialogue Management with Hierarchical Reinforcement Learning | | 0
Subgoal-based Reward Shaping to Improve Efficiency in Reinforcement Learning | | 0
Subgoal Discovery Using a Free Energy Paradigm and State Aggregations | | 0
Sub-Goal Trees -- a Framework for Goal-Based Reinforcement Learning | | 0
Relative Entropy Regularized Policy Iteration | Code | 0
Relay Policy Learning: Solving Long-Horizon Tasks via Imitation and Reinforcement Learning | Code | 0
Towards More Sample Efficiency in Reinforcement Learning with Data Augmentation | Code | 0
Sequential memory improves sample and memory efficiency in Episodic Control | Code | 0
Understanding Multi-Step Deep Reinforcement Learning: A Systematic Study of the DQN Target | Code | 0
Proper Value Equivalence | Code | 0
Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning | Code | 0
Sequential Recommendation for Optimizing Both Immediate Feedback and Long-term Retention | Code | 0
Meta-Reinforcement Learning via Buffering Graph Signatures for Live Video Streaming Events | Code | 0
On the Effectiveness of Offline RL for Dialogue Response Generation | Code | 0
Relational Graph Learning for Crowd Navigation | Code | 0
Relational Deep Reinforcement Learning | Code | 0
ReLAX: Reinforcement Learning Agent eXplainer for Arbitrary Predictive Models | Code | 0
Task-Oriented Language Grounding for Language Input with Multiple Sub-Goals of Non-Linear Order | Code | 0
Task-Oriented Query Reformulation with Reinforcement Learning | Code | 0
Task Phasing: Automated Curriculum Learning from Demonstrations | Code | 0
UNSAT Solver Synthesis via Monte Carlo Forest Search | Code | 0
Sequicity: Simplifying Task-oriented Dialogue Systems with Single Sequence-to-Sequence Architectures | Code | 0
Remember and Forget for Experience Replay | Code | 0
Value Iteration for Learning Concurrently Executable Robotic Control Tasks | Code | 0
Monolithic vs. hybrid controller for multi-objective Sim-to-Real learning | Code | 0
Value Iteration Networks | Code | 0
Renaissance Robot: Optimal Transport Policy Fusion for Learning Diverse Skills | Code | 0
Propagating Uncertainty in Reinforcement Learning via Wasserstein Barycenters | Code | 0
Model-based Offline Policy Optimization with Adversarial Network | Code | 0
Setting up a Reinforcement Learning Task with a Real-World Robot | Code | 0
Monitored Markov Decision Processes | Code | 0
TD3 with Reverse KL Regularizer for Offline Reinforcement Learning from Mixed Datasets | Code | 0
Page 255 of 303

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified