SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
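To make the agent, environment, and reward loop concrete, the following is a minimal sketch using tabular Q-learning, one of many RL algorithms and not tied to any paper listed here. The five-state chain environment, the `step` and `train` functions, and all hyperparameter values are hypothetical choices made purely for illustration.

```python
import random

N_STATES = 5      # states 0..4; state 4 is terminal (assumed toy setup)
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Hypothetical toy environment: a chain with reward 1.0 at the right end."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def greedy_action(q, state, rng):
    """Pick the highest-valued action, breaking ties at random."""
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, otherwise act greedily.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy_action(q, state, rng)
            next_state, reward, done = step(state, action)
            # Bootstrapped target: immediate reward plus discounted future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy policy for the non-terminal states.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1, 1], i.e. always move right toward the reward
```

Here the policy is simply the greedy action in each state of the learned table. The discount factor `gamma` is what makes the agent value long-term reward rather than only the immediate one.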

Papers

Showing 4426–4450 of 15113 papers

| Title | Status | Hype |
| --- | --- | --- |
| SAPIENT: Mastering Multi-turn Conversational Recommendation with Strategic Planning and Monte Carlo Tree Search | Code | 0 |
| Meta Reinforcement Learning with Finite Training Tasks -- a Density Estimation Approach | Code | 0 |
| Near-Optimal Representation Learning for Hierarchical Reinforcement Learning | Code | 0 |
| Using machine learning to inform harvest control rule design in complex fishery settings | Code | 0 |
| Reinforcement learning based process optimization and strategy development in conventional tunneling | Code | 0 |
| MDP environments for the OpenAI Gym | Code | 0 |
| Using Natural Language and Program Abstractions to Instill Human Inductive Biases in Machines | Code | 0 |
| Using Natural Language for Reward Shaping in Reinforcement Learning | Code | 0 |
| Using Offline Data to Speed Up Reinforcement Learning in Procedurally Generated Environments | Code | 0 |
| SATURN: SAT-based Reinforcement Learning to Unleash Language Model Reasoning | Code | 0 |
| Using reinforcement learning to find an optimal set of features | Code | 0 |
| Using reinforcement learning to improve drone-based inference of greenhouse gas fluxes | Code | 0 |
| Using reinforcement learning to learn how to play text-based games | Code | 0 |
| Scalable agent alignment via reward modeling: a research direction | Code | 0 |
| MDPGT: Momentum-based Decentralized Policy Gradient Tracking | Code | 0 |
| Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning | Code | 0 |
| Policy-GNN: Aggregation Optimization for Graph Neural Networks | Code | 0 |
| Using State Predictions for Value Regularization in Curiosity Driven Deep Reinforcement Learning | Code | 0 |
| USPR: Learning a Unified Solver for Profiled Routing | Code | 0 |
| Multi-Agent Common Knowledge Reinforcement Learning | Code | 0 |
| Decomposition Methods with Deep Corrections for Reinforcement Learning | Code | 0 |
| Scalable Coordinated Exploration in Concurrent Reinforcement Learning | Code | 0 |
| Reinforcement Learning Based Temporal Logic Control with Maximum Probabilistic Satisfaction | Code | 0 |
| Reinforcement Learning-based Token Pruning in Vision Transformers: A Markov Game Approach | Code | 0 |
| Scalable Evaluation of Online Facilitation Strategies via Synthetic Simulation of Discussions | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |