SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find the optimal policy, that is, the decision-making strategy that maximizes long-term reward.
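The agent-environment loop described above can be illustrated with a minimal sketch. It assumes a hypothetical five-state corridor environment (`ChainEnv`) and uses tabular Q-learning as one simple example of a learning rule. The class, the function names, and the hyperparameter values are all illustrative assumptions and are not taken from any paper listed on this page.

```python
import random


class ChainEnv:
    """Hypothetical corridor: start in state 0, reward 1.0 on reaching the last state."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 0 moves left, action 1 moves right; walls clamp the position.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (illustrative settings)."""
    rng = random.Random(seed)
    env = ChainEnv()
    q = [[0.0, 0.0] for _ in range(env.n_states)]
    for _ in range(episodes):
        state = env.reset()
        done = False
        steps = 0
        while not done and steps < 100:  # cap episode length
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                best = max(q[state])
                # Exploit, breaking ties between equal values at random.
                action = rng.choice([a for a in range(2) if q[state][a] == best])
            next_state, reward, done = env.step(action)
            # Bootstrapped target: immediate reward plus discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            steps += 1
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each state."""
    return [max(range(2), key=lambda a: q[s][a]) for s in range(len(q))]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this sketch the reward arrives only at the end of the corridor, so the agent has to propagate value backwards through the discount factor `gamma` before the greedy policy prefers moving right from every non-terminal state.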

Papers

Showing 14551-14600 of 15113 papers

Title | Status | Hype
Learning by Playing - Solving Sparse Reward Tasks from Scratch | Code | 0
Angrier Birds: Bayesian reinforcement learning | Code | 0
DCUR: Data Curriculum for Teaching via Samples with Reinforcement Learning | Code | 0
CAGES: Cost-Aware Gradient Entropy Search for Efficient Local Multi-Fidelity Bayesian Optimization | Code | 0
Implicit Quantile Networks for Distributional Reinforcement Learning | Code | 0
Data Valuation using Reinforcement Learning | Code | 0
A Snapshot of Influence: A Local Data Attribution Framework for Online Reinforcement Learning | Code | 0
Ask the Right Questions: Active Question Reformulation with Reinforcement Learning | Code | 0
Efficient Reinforcement Learning for StarCraft by Abstract Forward Models and Transfer Learning | Code | 0
Live in the Moment: Learning Dynamics Model Adapted to Evolving Policy | Code | 0
General Policy Evaluation and Improvement by Learning to Identify Few But Crucial States | Code | 0
General policy mapping: online continual reinforcement learning inspired on the insect brain | Code | 0
C-3PO: Cyclic-Three-Phase Optimization for Human-Robot Motion Retargeting based on Reinforcement Learning | Code | 0
Adaptive Risk-Aware Bidding with Budget Constraint in Display Advertising | Code | 0
Efficient Reward Poisoning Attacks on Online Deep Reinforcement Learning | Code | 0
Efficient Ridesharing Dispatch Using Multi-Agent Reinforcement Learning | Code | 0
Data sharing games | Code | 0
A Generative User Simulator with GPT-based Architecture and Goal State Tracking for Reinforced Multi-Domain Dialog Systems | Code | 0
Building Persona Consistent Dialogue Agents with Offline Reinforcement Learning | Code | 0
Importance Prioritized Policy Distillation | Code | 0
Bridging the Gap in Vision Language Models in Identifying Unsafe Concepts Across Modalities | Code | 0
Depth Self-Optimized Learning Toward Data Science | Code | 0
Generating Classical Chinese Poems from Vernacular Chinese | Code | 0
Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage | Code | 0
Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement Learning | Code | 0
Efficient Sparse-Reward Goal-Conditioned Reinforcement Learning with a High Replay Ratio and Regularization | Code | 0
An Evaluation Study of Intrinsic Motivation Techniques applied to Reinforcement Learning over Hard Exploration Environments | Code | 0
A neurally plausible model learns successor representations in partially observable environments | Code | 0
Skynet: A Top Deep RL Agent in the Inaugural Pommerman Team Competition | Code | 0
Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control | Code | 0
Learning Scheduling Algorithms for Data Processing Clusters | Code | 0
Brick Tic-Tac-Toe: Exploring the Generalizability of AlphaZero to Novel Test Environments | Code | 0
Efficient time stepping for numerical integration using reinforcement learning | Code | 0
Efficient Transformer-based Hyper-parameter Optimization for Resource-constrained IoT Environments | Code | 0
Generating Multi-type Temporal Sequences to Mitigate Class-imbalanced Problem | Code | 0
A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation | Code | 0
Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning | Code | 0
Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening | Code | 0
Data-Efficient Hierarchical Reinforcement Learning | Code | 0
Bregman Gradient Policy Optimization | Code | 0
Learning Complex Teamwork Tasks Using a Given Sub-task Decomposition | Code | 0
Data driven approach towards more efficient Newton-Raphson power flow calculation for distribution grids | Code | 0
Data center cooling using model-predictive control | Code | 0
Data Assimilation in Chaotic Systems Using Deep Reinforcement Learning | Code | 0
Learning Conformal Abstention Policies for Adaptive Risk Management in Large Language and Vision-Language Models | Code | 0
Ego-Pose Estimation and Forecasting as Real-Time PD Control | Code | 0
Adaptive Reward Design for Reinforcement Learning | Code | 0
DARLR: Dual-Agent Offline Reinforcement Learning for Recommender Systems with Dynamic Reward | Code | 0
Learning Self-Correctable Policies and Value Functions from Demonstrations with Negative Sampling | Code | 0
BRAC+: Improved Behavior Regularized Actor Critic for Offline Reinforcement Learning | Code | 0
Page 292 of 303

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified