SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of positive or negative rewards for its actions. The goal of reinforcement learning is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term reward.
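The interaction loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `ChainEnv` corridor environment, the `train` function, and every hyperparameter value are hypothetical and chosen only for this example, and tabular Q-learning is used as one simple way to learn a policy from rewards.

```python
import random


class ChainEnv:
    """Hypothetical 5-state corridor: start in state 0, goal is the last state."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action 0 moves left, action 1 moves right; walls clamp the position
        delta = 1 if action == 1 else -1
        self.state = min(max(self.state + delta, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        # +1 at the goal, a small penalty on every other step
        reward = 1.0 if done else -0.01
        return self.state, reward, done


def train(episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (illustrative values)."""
    rng = random.Random(seed)
    env = ChainEnv()
    q = [[0.0, 0.0] for _ in range(env.n_states)]  # q[state][action]
    for _ in range(episodes):
        state = env.reset()
        for _ in range(100):  # cap the episode length
            # The agent picks an action: explore sometimes, otherwise act greedily
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = max((0, 1), key=lambda a: q[state][a])
            # The environment responds with the next state and a reward
            next_state, reward, done = env.step(action)
            # Move the estimate toward reward + discounted best future value
            future = 0.0 if done else gamma * max(q[next_state])
            q[state][action] += alpha * (reward + future - q[state][action])
            state = next_state
            if done:
                break
    return q


if __name__ == "__main__":
    q_table = train()
    # Greedy policy per non-terminal state: 0 = left, 1 = right
    print([max((0, 1), key=lambda a: q_table[s][a]) for s in range(4)])
```

The learned table `q` plays the role of the policy here: acting greedily with respect to it selects, in each state, the action with the highest estimated long-term reward.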

Papers

Showing 11751–11800 of 15113 papers

| Title | Status | Hype |
| --- | --- | --- |
| Reward Gaming in Conditional Text Generation | | 0 |
| Task Aware Dreamer for Task Generalization in Reinforcement Learning | | 0 |
| Rewarding Doubt: A Reinforcement Learning Approach to Confidence Calibration of Large Language Models | | 0 |
| Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning | | 0 |
| Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning | | 0 |
| Rewarding Semantic Similarity under Optimized Alignments for AMR-to-Text Generation | | 0 |
| Rewarding Smatch: Transition-Based AMR Parsing with Reinforcement Learning | | 0 |
| Rewarding What Matters: Step-by-Step Reinforcement Learning for Task-Oriented Dialogue | | 0 |
| Reward is enough for convex MDPs | | 0 |
| Reward Is Enough: LLMs Are In-Context Reinforcement Learners | | 0 |
| Reward is not enough: can we liberate AI from the reinforcement learning paradigm? | | 0 |
| Reward Learning from Suboptimal Demonstrations with Applications in Surgical Electrocautery | | 0 |
| Reward Learning using Structural Motifs in Inverse Reinforcement Learning | | 0 |
| Rewardless Open-Ended Learning (ROEL) | | 0 |
| Reward Machine Inference for Robotic Manipulation | | 0 |
| Reward (Mis)design for Autonomous Driving | | 0 |
| Reward Poisoning Attacks on Offline Multi-Agent Reinforcement Learning | | 0 |
| Reward Poisoning in Reinforcement Learning: Attacks Against Unknown Learners in Unknown Environments | | 0 |
| Reward prediction for representation learning and reward shaping | | 0 |
| Reward-Predictive Clustering | | 0 |
| STIR^2: Reward Relabelling for combined Reinforcement and Imitation Learning on sparse-reward tasks | | 0 |
| Reward-Respecting Subtasks for Model-Based Reinforcement Learning | | 0 |
| Rewards Encoding Environment Dynamics Improves Preference-based Reinforcement Learning | | 0 |
| Reward Shaping for Reinforcement Learning with Omega-Regular Objectives | | 0 |
| Reward Shaping for User Satisfaction in a REINFORCE Recommender | | 0 |
| Reward Shaping via Diffusion Process in Reinforcement Learning | | 0 |
| Reward Shaping via Meta-Learning | | 0 |
| Reward Shaping with Dynamic Trajectory Aggregation | | 0 |
| Reward Shaping with Subgoals for Social Navigation | | 0 |
| RewardsOfSum: Exploring Reinforcement Learning Rewards for Summarisation | | 0 |
| Rewards with Negative Examples for Reinforced Topic-Focused Abstractive Summarization | | 0 |
| Reward Tampering Problems and Solutions in Reinforcement Learning: A Causal Influence Diagram Perspective | | 0 |
| Reward Training Wheels: Adaptive Auxiliary Rewards for Robotics Reinforcement Learning | | 0 |
| REX: Rapid Exploration and eXploitation for AI Agents | | 0 |
| ReZero: Enhancing LLM search ability by trying one-more-time | | 0 |
| RIDM: Reinforced Inverse Dynamics Modeling for Learning from a Single Observed Demonstration | | 0 |
| Riemannian Stochastic Gradient Method for Nested Composition Optimization | | 0 |
| RILe: Reinforced Imitation Learning | | 0 |
| Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs | | 0 |
| RIS-assisted UAV Communications for IoT with Wireless Power Transfer Using Deep Reinforcement Learning | | 0 |
| RISCLESS: A Reinforcement Learning Strategy to Exploit Unused Cloud Resources | | 0 |
| Risk-Averse Bayes-Adaptive Reinforcement Learning | | 0 |
| Risk-Averse Learning by Temporal Difference Methods | | 0 |
| Risk-averse policies for natural gas futures trading using distributional reinforcement learning | | 0 |
| Risk-Averse Reinforcement Learning via Dynamic Time-Consistent Risk Measures | | 0 |
| Risk Averse Robust Adversarial Reinforcement Learning | | 0 |
| Risk Averse Value Expansion for Sample Efficient and Robust Policy Learning | | 0 |
| Risk Aware and Multi-Objective Decision Making with Distributional Monte Carlo Tree Search | | 0 |
| Risk-Aware Reinforcement Learning through Optimal Transport Theory | | 0 |
| Risk-Aware Safe Reinforcement Learning for Control of Stochastic Linear Systems | | 0 |
Page 236 of 303

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |