SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
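The interaction loop and the cumulative reward described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Corridor` environment, the `always_right` policy, and the discount factor `gamma` are hypothetical and do not come from this page or from any particular library. The sketch shows how an episode's reward is collected and summed; it does not implement a learning algorithm.

```python
def discounted_return(rewards, gamma=0.99):
    """Cumulative reward, with each later reward scaled by one more factor of gamma."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


class Corridor:
    """Hypothetical 1-D environment: start in cell 0, reward 1.0 on reaching the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 1 moves right, any other action moves left; position is clamped.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Let the policy act until the episode ends; return the list of rewards received."""
    state = env.reset()
    rewards = []
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env.step(action)
        rewards.append(reward)
        if done:
            break
    return rewards


def always_right(state):
    """Hypothetical fixed policy: always choose action 1 (move right)."""
    return 1


rewards = run_episode(Corridor(), always_right)
print(rewards)                                # [0.0, 0.0, 0.0, 1.0]
print(discounted_return(rewards, gamma=0.9))  # about 0.729, i.e. 0.9 ** 3
```

An RL algorithm would replace the fixed `always_right` policy with one that is updated from the observed rewards, so that the return it achieves increases over training.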

Papers

Showing 6801–6825 of 15113 papers

Title | Status | Hype
Tractable Representations for Convergent Approximation of Distributional HJB Equations | | 0
TradeR: Practical Deep Hierarchical Reinforcement Learning for Trade Execution | | 0
Trading the Twitter Sentiment with Reinforcement Learning | | 0
Traffic Co-Simulation Framework Empowered by Infrastructure Camera Sensing and Reinforcement Learning | | 0
Traffic Light Control Using Deep Policy-Gradient and Value-Function Based Reinforcement Learning | | 0
Traffic Management of Autonomous Vehicles using Policy Based Deep Reinforcement Learning and Intelligent Routing | | 0
Traffic Signal Control with Communicative Deep Reinforcement Learning Agents: a Case Study | | 0
Train a snake with reinforcement learning algorithms | | 0
Training a Constrained Natural Media Painting Agent using Reinforcement Learning | | 0
Training and Evaluation of Deep Policies using Reinforcement Learning and Generative Models | | 0
Training Characteristic Functions with Reinforcement Learning: XAI-methods play Connect Four | | 0
Training Diffusion Models Towards Diverse Image Generation with Reinforcement Learning | | 0
Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference | | 0
Training in Task Space to Speed Up and Guide Reinforcement Learning | | 0
Training Language Models to Critique With Multi-agent Feedback | | 0
Training Large Language Models to Reason via EM Policy Gradient | | 0
Training Larger Networks for Deep Reinforcement Learning | | 0
Training like Playing: A Reinforcement Learning And Knowledge Graph-based framework for building Automatic Consultation System in Medical Field | | 0
LeDex: Training LLMs to Better Self-Debug and Explain Code | | 0
Training Reinforcement Learning Agents and Humans With Difficulty-Conditioned Generators | | 0
Training Value-Aligned Reinforcement Learning Agents Using a Normative Prior | | 0
Trajectory-based Learning for Ball-in-Maze Games | | 0
Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning | | 0
Trajectory Data Suffices for Statistically Efficient Learning in Offline RL with Linear q^π-Realizability and Concentrability | | 0
Trajectory First: A Curriculum for Discovering Diverse Policies | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified