SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback on its actions, given as rewards or penalties. The goal of reinforcement learning is to find the optimal policy, that is, the decision-making strategy that maximizes the expected long-term reward.
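The interaction loop described above can be illustrated with a small sketch. It uses tabular Q-learning, which is only one of many RL methods, chosen here for brevity. The five-cell corridor environment, the function names (`step`, `discounted_return`, `train`, `greedy_policy`), and all hyperparameter values are assumptions made for this example and do not come from any paper listed on this page.

```python
import random

N_STATES = 5          # hypothetical corridor, cells 0..4; cell 4 is the goal
ACTIONS = (0, 1)      # 0 = step left, 1 = step right


def step(state, action):
    """Toy environment: return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def discounted_return(rewards, gamma):
    """Cumulative reward of one episode, discounted by gamma per step."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values Q[state][action] from reward feedback."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            # Epsilon-greedy: mostly exploit, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q-learning target: immediate reward plus discounted best future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Policy that picks the highest-valued action in each non-goal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

Calling `greedy_policy(train())` yields the learned policy for the four non-goal cells, and `discounted_return` shows how a reward that arrives later contributes less to the cumulative signal the agent maximizes.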

Papers

Showing 5776–5800 of 15113 papers

| Title | Status | Hype |
|---|---|---|
| Scaling Distributed Multi-task Reinforcement Learning with Experience Sharing | | 0 |
| Scaling Goal-based Exploration via Pruning Proto-goals | | 0 |
| Scaling Intelligent Agents in Combat Simulations for Wargaming | | 0 |
| Scaling Laws for Reward Model Overoptimization | | 0 |
| Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms | | 0 |
| Scaling laws for single-agent reinforcement learning | | 0 |
| Scaling Marginalized Importance Sampling to High-Dimensional State-Spaces via State Abstraction | | 0 |
| Scaling Multi Agent Reinforcement Learning for Underwater Acoustic Tracking via Autonomous Vehicles | | 0 |
| Scaling Offline RL via Efficient and Expressive Shortcut Models | | 0 |
| Scaling RL to Long Videos | | 0 |
| Scaling shared model governance via model splitting | | 0 |
| Scaling Test-Time Compute Without Verification or RL is Suboptimal | | 0 |
| Scaling Up Multiagent Reinforcement Learning for Robotic Systems: Learn an Adaptive Sparse Communication Graph | | 0 |
| Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training | | 0 |
| Scaling Up Robust MDPs by Reinforcement Learning | | 0 |
| Scaling Value Iteration Networks to 5000 Layers for Extreme Long-Term Planning | | 0 |
| SCC: an efficient deep reinforcement learning agent mastering the game of StarCraft II | | 0 |
| SCC-rFMQ Learning in Cooperative Markov Games with Continuous Actions | | 0 |
| Scenario-Assisted Deep Reinforcement Learning | | 0 |
| Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks | | 0 |
| Scenic4RL: Programmatic Modeling and Generation of Reinforcement Learning Environments | | 0 |
| Schedule Earth Observation satellites with Deep Reinforcement Learning | | 0 |
| ScheduleNet: Learn to Solve MinMax mTSP Using Reinforcement Learning with Delayed Reward | | 0 |
| Scheduling and Power Control for Wireless Multicast Systems via Deep Reinforcement Learning | | 0 |
| Scheduling Inference Workloads on Distributed Edge Clusters with Reinforcement Learning | | 0 |
Page 232 of 605

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |