SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find a policy, a decision-making strategy mapping states to actions, that maximizes long-term reward.
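The loop described above (act, receive reward, improve the policy) can be sketched with tabular Q-learning. The chain environment, hyperparameters, and helper names below are illustrative assumptions, not from any paper listed on this page:

```python
import random

# A tiny deterministic chain environment (hypothetical): states 0..4,
# action 0 moves left, action 1 moves right; reaching state 4 yields reward 1.
N_STATES, GOAL = 5, 4

def step(state, action):
    """Apply an action; return (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: learn action values that maximize cumulative reward."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit the current estimate, sometimes explore.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step(state, action)
            # Q-learning update: move the estimate toward reward + discounted max.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q

q = train()
# The greedy policy extracted from q should walk right toward the goal.
policy = [0 if q[s][0] > q[s][1] else 1 for s in range(N_STATES)]
```

After training, the greedy policy chooses "move right" in every non-terminal state, i.e. the long-term-reward-maximizing strategy for this toy chain.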

Papers

Showing 5301–5325 of 15113 papers

Title | Status | Hype
Reward Design for Driver Repositioning Using Multi-Agent Reinforcement Learning | | 0
Reward Design in Cooperative Multi-agent Reinforcement Learning for Packet Routing | | 0
Reward-Directed Score-Based Diffusion Models via q-Learning | | 0
Reward Estimation via State Prediction | | 0
Reward Fine-Tuning Two-Step Diffusion Models via Learning Differentiable Latent-Space Surrogate Reward | | 0
Reward-Free Attacks in Multi-Agent Reinforcement Learning | | 0
Reward-Free Exploration for Reinforcement Learning | | 0
Reward-Free Model-Based Reinforcement Learning with Linear Function Approximation | | 0
Reward-Free Policy Space Compression for Reinforcement Learning | | 0
Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov Decision Processes | | 0
Reward Function and Initial Values: Better Choices for Accelerated Goal-Directed Reinforcement Learning | | 0
Reward Function Optimization of a Deep Reinforcement Learning Collision Avoidance System | | 0
Reward Gaming in Conditional Text Generation | | 0
Task Aware Dreamer for Task Generalization in Reinforcement Learning | | 0
Rewarding Doubt: A Reinforcement Learning Approach to Confidence Calibration of Large Language Models | | 0
Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning | | 0
Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning | | 0
Rewarding Semantic Similarity under Optimized Alignments for AMR-to-Text Generation | | 0
Rewarding Smatch: Transition-Based AMR Parsing with Reinforcement Learning | | 0
Rewarding What Matters: Step-by-Step Reinforcement Learning for Task-Oriented Dialogue | | 0
Reward is enough for convex MDPs | | 0
Reward Is Enough: LLMs Are In-Context Reinforcement Learners | | 0
Reward is not enough: can we liberate AI from the reinforcement learning paradigm? | | 0
Reward Learning from Suboptimal Demonstrations with Applications in Surgical Electrocautery | | 0
Reward Learning using Structural Motifs in Inverse Reinforcement Learning | | 0
Page 213 of 605

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified