SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment to maximize a cumulative reward signal. The agent interacts with the environment and learns by receiving feedback in the form of rewards or punishments for its actions. The goal of reinforcement learning is to find the optimal policy or decision-making strategy that maximizes the long-term reward.
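The interaction loop described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the five-state chain environment, the `step` function, and the constants `GAMMA` and `ALPHA` are all hypothetical choices made for the example, and tabular Q-learning with a uniformly random behaviour policy is just one of many ways to learn a policy.

```python
import random

# Hypothetical toy environment: a chain of states 0..4, where state 4 is the goal.
N_STATES = 5
ACTIONS = (0, 1)   # 0 = move left, 1 = move right
GAMMA = 0.9        # discount factor (assumed value)
ALPHA = 0.5        # learning rate (assumed value)


def step(state, action):
    """Environment dynamics: returns (next_state, reward, done)."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0   # reward only on reaching the goal
    return next_state, reward, done


def train(episodes=500, seed=0):
    """Tabular Q-learning; the agent explores with uniformly random actions."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            # Bootstrapped target: immediate reward plus discounted future value.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


def discounted_return(policy, max_steps=50):
    """Roll out a policy from state 0 and sum the discounted rewards."""
    state, total, discount = 0, 0.0, 1.0
    for _ in range(max_steps):
        state, reward, done = step(state, policy[state])
        total += discount * reward
        discount *= GAMMA
        if done:
            break
    return total


q_table = train()
policy = greedy_policy(q_table)
print(policy, discounted_return(policy))
```

In this toy setting the learned greedy policy should move right in every state, since that is the shortest path to the only reward. Real RL problems differ mainly in scale: the table is replaced by a function approximator and random exploration by a more careful strategy.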

Papers

Showing 8051–8100 of 15113 papers

Title | Status | Hype
Optimizing Industrial HVAC Systems with Hierarchical Reinforcement Learning | | 0
Optimizing Information Bottleneck in Reinforcement Learning: A Stein Variational Approach | | 0
Optimizing Job Allocation using Reinforcement Learning with Graph Neural Networks | | 0
Optimizing Load Scheduling in Power Grids Using Reinforcement Learning and Markov Decision Processes | | 0
Optimizing Low-Speed Autonomous Driving: A Reinforcement Learning Approach to Route Stability and Maximum Speed | | 0
Optimizing Market Making using Multi-Agent Reinforcement Learning | | 0
Optimizing Medical Treatment for Sepsis in Intensive Care: from Reinforcement Learning to Pre-Trial Evaluation | | 0
Optimizing Memory Mapping Using Deep Reinforcement Learning | | 0
Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning | | 0
Optimizing Multiagent Cooperation via Policy Evolution and Shared Experiences | | 0
Optimizing Navigation And Chemical Application in Precision Agriculture With Deep Reinforcement Learning And Conditional Action Tree | | 0
Optimizing Nitrogen Management with Deep Reinforcement Learning and Crop Simulations | | 0
Optimizing Novelty of Top-k Recommendations using Large Language Models and Reinforcement Learning | | 0
Optimizing Portfolio with Two-Sided Transactions and Lending: A Reinforcement Learning Framework | | 0
Optimizing Prompt Strategies for SAM: Advancing lesion Segmentation Across Diverse Medical Imaging Modalities | | 0
Optimizing Quantum Error Correction Codes with Reinforcement Learning | | 0
Optimizing Query Evaluations using Reinforcement Learning for Web Search | | 0
Optimizing Routerless Network-on-Chip Designs: An Innovative Learning-Based Framework | | 0
Optimizing Sensor Redundancy in Sequential Decision-Making Problems | | 0
Optimizing Sponsored Search Ranking Strategy by Deep Reinforcement Learning | | 0
Optimizing Taxi Carpool Policies via Reinforcement Learning and Spatio-Temporal Mining | | 0
Optimizing Tensor Network Contraction Using Reinforcement Learning | | 0
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning | | 0
Optimizing the Factual Correctness of a Summary: A Study of Summarizing Radiology Reports | | 0
Optimizing the Long-Term Average Reward for Continuing MDPs: A Technical Report | | 0
Optimizing the Long-Term Behaviour of Deep Reinforcement Learning for Pushing and Grasping | | 0
Optimizing Traffic Lights with Multi-agent Deep Reinforcement Learning and V2X communication | | 0
Optimizing Trajectories for Highway Driving with Offline Reinforcement Learning | | 0
Optimizing Wireless Discontinuous Reception via MAC Signaling Learning | | 0
Option Compatible Reward Inverse Reinforcement Learning | | 0
Option Discovery in Hierarchical Reinforcement Learning using Spatio-Temporal Clustering | | 0
Option Discovery Using LLM-guided Semantic Hierarchical Reinforcement Learning | | 0
Option Encoder: A Framework for Discovering a Policy Basis in Reinforcement Learning | | 0
Option Hedging with Risk Averse Reinforcement Learning | | 0
Options as responses: Grounding behavioural hierarchies in multi-agent RL | | 0
OPtions as REsponses: Grounding behavioural hierarchies in multi-agent reinforcement learning | | 0
Options Discovery with Budgeted Reinforcement Learning | | 0
OptLayer - Practical Constrained Optimization for Deep Reinforcement Learning in the Real World | | 0
Oracle-Efficient Reinforcement Learning for Max Value Ensembles | | 0
Oracle-free Reinforcement Learning in Mean-Field Games along a Single Sample Path | | 0
Oracle Inequalities for Model Selection in Offline Reinforcement Learning | | 0
Oracles & Followers: Stackelberg Equilibria in Deep Multi-Agent Reinforcement Learning | | 0
OrbitZoo: Multi-Agent Reinforcement Learning Environment for Orbital Dynamics | | 0
Ordering-Based Causal Discovery with Reinforcement Learning | | 0
Order-Optimal Instance-Dependent Bounds for Offline Reinforcement Learning with Preference Feedback | | 0
Organ localisation using supervised and semi supervised approaches combining reinforcement learning with imitation learning | | 0
Orthogonal Estimation of Wasserstein Distances | | 0
Orthogonal Policy Gradient and Autonomous Driving Application | | 0
OSS Mentor A framework for improving developers contributions via deep reinforcement learning | | 0
OTC: Optimal Tool Calls via Reinforcement Learning | | 0
Page 162 of 303

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified