SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
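
The interaction loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not taken from any listed paper: the five-state corridor environment, the names `step`, `train` and `greedy_policy`, the hyperparameters, and the choice of tabular Q-learning as one of many possible learning rules.

```python
import random

# Hypothetical toy environment (an assumption for illustration): a corridor of
# 5 states. The agent starts in state 0; reaching state 4 yields reward +1 and
# ends the episode. Every other transition yields reward 0.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon (or on ties), else act greedily.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Target: immediate reward plus discounted value of the next state.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Greedy action for each non-terminal state."""
    return [0 if left > right else 1 for left, right in q[:-1]]
```

Under these assumptions, `greedy_policy(train())` returns `[1, 1, 1, 1]`: the learned policy moves right in every non-terminal state, which is the strategy that maximizes the discounted long-term reward in this toy corridor.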

Papers

Showing 8676–8700 of 15113 papers

Title | Status | Hype
Optimizing Medical Treatment for Sepsis in Intensive Care: from Reinforcement Learning to Pre-Trial Evaluation | | 0
Optimizing Memory Mapping Using Deep Reinforcement Learning | | 0
Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning | | 0
Optimizing Multiagent Cooperation via Policy Evolution and Shared Experiences | | 0
Optimizing Navigation And Chemical Application in Precision Agriculture With Deep Reinforcement Learning And Conditional Action Tree | | 0
Optimizing Nitrogen Management with Deep Reinforcement Learning and Crop Simulations | | 0
Optimizing Novelty of Top-k Recommendations using Large Language Models and Reinforcement Learning | | 0
Optimizing Portfolio with Two-Sided Transactions and Lending: A Reinforcement Learning Framework | | 0
Optimizing Prompt Strategies for SAM: Advancing lesion Segmentation Across Diverse Medical Imaging Modalities | | 0
Optimizing Quantum Error Correction Codes with Reinforcement Learning | | 0
Optimizing Query Evaluations using Reinforcement Learning for Web Search | | 0
Optimizing Routerless Network-on-Chip Designs: An Innovative Learning-Based Framework | | 0
Optimizing Sensor Redundancy in Sequential Decision-Making Problems | | 0
Optimizing Sponsored Search Ranking Strategy by Deep Reinforcement Learning | | 0
Optimizing Taxi Carpool Policies via Reinforcement Learning and Spatio-Temporal Mining | | 0
Optimizing Tensor Network Contraction Using Reinforcement Learning | | 0
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning | | 0
Optimizing the Factual Correctness of a Summary: A Study of Summarizing Radiology Reports | | 0
Optimizing the Long-Term Average Reward for Continuing MDPs: A Technical Report | | 0
Optimizing the Long-Term Behaviour of Deep Reinforcement Learning for Pushing and Grasping | | 0
Optimizing Traffic Lights with Multi-agent Deep Reinforcement Learning and V2X communication | | 0
Optimizing Trajectories for Highway Driving with Offline Reinforcement Learning | | 0
Optimizing Wireless Discontinuous Reception via MAC Signaling Learning | | 0
Option Compatible Reward Inverse Reinforcement Learning | | 0
Option Discovery in Hierarchical Reinforcement Learning using Spatio-Temporal Clustering | | 0
Page 348 of 605

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified