SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment to maximize a cumulative reward signal. The agent interacts with the environment and learns by receiving feedback in the form of rewards or punishments for its actions. The goal of reinforcement learning is to find the optimal policy or decision-making strategy that maximizes the long-term reward.
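The interaction loop described above can be illustrated with a minimal sketch. It assumes tabular Q-learning with epsilon-greedy exploration, which is only one of many RL algorithms and is not prescribed by this page. The five-state corridor environment, the function names (`step`, `train`, `greedy_policy`), and all hyperparameter values are hypothetical choices made for illustration.

```python
import random

N_STATES = 5          # hypothetical corridor: states 0..4, state 4 is terminal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment (an assumption): returns (next_state, reward, done)."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    # Reward of 1 only when the terminal state is reached, 0 otherwise.
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning; returns the learned action-value table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: pick a uniformly random action.
                action = rng.choice(ACTIONS)
            else:
                # Exploit: pick a best-known action, breaking ties randomly.
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Bootstrapped target: immediate reward plus discounted future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Greedy action for each non-terminal state under the learned values."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this toy setting the learned greedy policy moves right in every non-terminal state, since that is the only way to collect the reward. The discount factor `gamma` is what makes the agent weigh long-term reward rather than only the immediate one.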

Papers

Showing 10551-10600 of 15113 papers

Title | Status | Hype
Model-Based Reinforcement Learning with Value-Targeted Regression |  | 0
Robust Reinforcement Learning with Wasserstein Constraint |  | 0
Reinforcement learning and Bayesian data assimilation for model-informed precision dosing in oncology |  | 0
Temporal-Differential Learning in Continuous Environments |  | 0
PlanGAN: Model-based Planning With Sparse Rewards and Multiple Goals | Code | 1
A novel approach for multi-agent cooperative pursuit to capture grouped evaders |  | 0
Invariant Policy Optimization: Towards Stronger Generalization in Reinforcement Learning | Code | 1
Encoding formulas as deep networks: Reinforcement learning for zero-shot execution of LTL formulas | Code | 1
Acme: A Research Framework for Distributed Reinforcement Learning | Code | 1
Variational Reward Estimator Bottleneck: Learning Robust Reward Estimator for Multi-Domain Task-Oriented Dialog |  | 0
MM-KTD: Multiple Model Kalman Temporal Differences for Reinforcement Learning | Code | 0
Reinforcement Learning | Code | 0
Sim2Real for Peg-Hole Insertion with Eye-in-Hand Camera | Code | 1
AI-based Resource Allocation: Reinforcement Learning for Adaptive Auto-scaling in Serverless Environments |  | 0
Deep Reinforcement learning for real autonomous mobile robot navigation in indoor environments | Code | 1
Intelligent Residential Energy Management System using Deep Reinforcement Learning |  | 0
Domain Knowledge Integration By Gradient Matching For Sample-Efficient Reinforcement Learning |  | 0
Predicting Goal-directed Human Attention Using Inverse Reinforcement Learning | Code | 1
Revisiting Parameter Sharing in Multi-Agent Deep Reinforcement Learning | Code | 0
MOPO: Model-based Offline Policy Optimization | Code | 1
Anomaly Detection Under Controlled Sensing Using Actor-Critic Reinforcement Learning |  | 0
ALBA : Reinforcement Learning for Video Object Segmentation | Code | 0
Towards intervention-centric causal reasoning in learning agents |  | 0
Modeling Penetration Testing with Reinforcement Learning Using Capture-the-Flag Challenges: Trade-offs between Model-free Learning and A Priori Knowledge | Code | 1
Time-Variant Variational Transfer for Value Functions |  | 0
Efficient Use of heuristics for accelerating XCS-based Policy Learning in Markov Games |  | 0
A reinforcement learning approach to rare trajectory sampling | Code | 0
Active Measure Reinforcement Learning for Observation Cost Minimization |  | 0
Integrating LEO Satellite and UAV Relaying via Reinforcement Learning for Non-Terrestrial Networks |  | 0
Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model |  | 0
Deep Reinforcement Learning Based Power Allocation for D2D Network |  | 0
Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO | Code | 1
Meta-Reinforcement Learning for Trajectory Design in Wireless UAV Networks |  | 0
Optimization-driven Deep Reinforcement Learning for Robust Beamforming in IRS-assisted Wireless Communications |  | 0
Reinforcement Learning with Iterative Reasoning for Merging in Dense Traffic |  | 0
Policy Entropy for Out-of-Distribution Classification |  | 0
Gradient Monitored Reinforcement Learning |  | 0
Deep Learning Models for Automatic Summarization |  | 0
Dynamic Value Estimation for Single-Task Multi-Scene Reinforcement Learning |  | 0
Generator and Critic: A Deep Reinforcement Learning Approach for Slate Re-ranking in E-commerce |  | 0
GoChat: Goal-oriented Chatbots with Hierarchical Reinforcement Learning |  | 0
Automatic Discovery of Interpretable Planning Strategies | Code | 0
Model-free Reinforcement Learning for Stochastic Stackelberg Security Games |  | 0
Adaptive Reinforcement Learning through Evolving Self-Modifying Neural Networks |  | 0
Towards Automated Safety Coverage and Testing for Autonomous Vehicles with Reinforcement Learning |  | 0
Reinforcement learning with human advice: a survey |  | 0
Evaluating Generalisation in General Video Game Playing |  | 0
Q-NAV: NAV Setting Method based on Reinforcement Learning in Underwater Wireless Networks |  | 0
Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension |  | 0
Novel Policy Seeking with Constrained Optimization | Code | 0
Page 212 of 303

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 |  | Unverified
2 | PPO | Mean Normalized Performance | 0.58 |  | Unverified