SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal of reinforcement learning is to find an optimal policy: a decision-making strategy that maximizes the expected long-term reward.
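The interaction loop described above can be illustrated with a minimal sketch. It assumes a hypothetical five-state chain environment (the `step` function, the state layout, and all hyperparameters are invented for illustration) and uses tabular Q-learning, which is only one of many RL algorithms and is not tied to any paper listed on this page.

```python
import random

N_STATES = 5          # states 0..4; state 4 is terminal (hypothetical toy setup)
ACTIONS = (-1, +1)    # index 0 moves left, index 1 moves right


def step(state, action):
    """Toy environment: return (next_state, reward, done) for one move."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0   # reward only on reaching the goal
    return next_state, reward, done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)            # explore
            else:
                best = max(q[state])            # exploit, random tie-break
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Bootstrapped target: reward plus discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best action (-1 or +1) for each non-terminal state."""
    return [ACTIONS[max(range(2), key=lambda i: q[s][i])]
            for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

Under these assumptions, the learned greedy policy should move right in every non-terminal state, since that is the shortest route to the only reward. The discount factor `gamma` is what makes the agent prefer reaching the reward sooner rather than later.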

Papers

Showing 2051–2075 of 15113 papers

Title | Status | Hype
Diffusion Actor-Critic with Entropy Regulator | Code | 2
Efficient Recurrent Off-Policy RL Requires a Context-Encoder-Specific Learning Rate | Code | 1
Cross-Domain Policy Adaptation by Capturing Representation Mismatch | Code | 1
Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search | Code | 1
Blood Glucose Control Via Pre-trained Counterfactual Invertible Neural Networks |  | 0
Efficiently Training Deep-Learning Parametric Policies using Lagrangian Duality |  | 0
PEAC: Unsupervised Pre-training for Cross-Embodiment Reinforcement Learning | Code | 1
Exclusively Penalized Q-learning for Offline Reinforcement Learning |  | 0
Offline Reinforcement Learning from Datasets with Structured Non-Stationarity | Code | 0
AGILE: A Novel Reinforcement Learning Framework of LLM Agents | Code | 2
Which Experiences Are Influential for RL Agents? Efficiently Estimating The Influence of Experiences | Code | 0
A finite time analysis of distributed Q-learning |  | 0
Multi-turn Reinforcement Learning from Preference Human Feedback | Code | 1
Policy Gradient Methods for Risk-Sensitive Distributional Reinforcement Learning with Provable Convergence |  | 0
Variational Delayed Policy Optimization | Code | 0
Maximum Entropy Reinforcement Learning via Energy-Based Normalizing Flow | Code | 1
Autonomous Algorithm for Training Autonomous Vehicles with Minimal Human Intervention |  | 0
Learning to sample fibers for goodness-of-fit testing |  | 0
Leader Reward for POMO-Based Neural Combinatorial Optimization |  | 0
Lusifer: LLM-based User SImulated Feedback Environment for online Recommender systems | Code | 0
Large Language Models (LLMs) Assisted Wireless Network Deployment in Urban Settings |  | 0
Knowledge Graph Reasoning with Self-supervised Reinforcement Learning | Code | 1
HighwayLLM: Decision-Making and Navigation in Highway Driving with RL-Informed Language Model |  | 0
Multi-Agent Reinforcement Learning with Hierarchical Coordination for Emergency Responder Stationing |  | 0
CausalPlayground: Addressing Data-Generation Requirements in Cutting-Edge Causality Research | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 |  | Unverified
2 | PPO | Mean Normalized Performance | 0.58 |  | Unverified