SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from the feedback it receives, in the form of rewards or penalties, for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
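As a rough illustration of the agent-environment loop described above, here is a minimal sketch using tabular Q-learning on a toy five-state chain. Both the environment and the choice of Q-learning are assumptions made for illustration and do not come from any paper listed on this page. The names `step`, `train`, and `greedy_policy`, and all hyperparameter values, are hypothetical.

```python
import random

N_STATES = 5      # hypothetical toy chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment dynamics: return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    # Reward of 1 only when the terminal state is reached, otherwise 0.
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (illustrative values)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _episode in range(episodes):
        state = 0
        for _t in range(200):  # cap on episode length
            # Explore with probability epsilon, or when both actions look equal.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Bootstrapped target: immediate reward plus discounted future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Pick the higher-valued action in each non-terminal state."""
    return [0 if q[s][0] > q[s][1] else 1 for s in range(N_STATES - 1)]


q_table = train()
policy = greedy_policy(q_table)
print(policy)
```

In this toy setting the learned greedy policy should choose "right" in every non-terminal state, since that is the only way to reach the rewarding terminal state. Methods such as those in the benchmark table on this page address the same objective with function approximation rather than a lookup table.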

Papers

Showing 13401–13450 of 15113 papers

Title | Status | Hype
Safe Reinforcement Learning in Black-Box Environments via Adaptive Shielding | Code | 0
RecoGym: A Reinforcement Learning Environment for the problem of Product Recommendation in Online Advertising | Code | 0
Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning | Code | 0
Safe Reinforcement Learning of Control-Affine Systems with Vertex Networks | Code | 0
StepCountJITAI: simulation environment for RL with application to physical activity adaptive intervention | Code | 0
Parameter-free Reduction of the Estimation Bias in Deep Reinforcement Learning for Deterministic Policy Gradients | Code | 0
Noisy Natural Gradient as Variational Inference | Code | 0
ReCCoVER: Detecting Causal Confusion for Explainable Reinforcement Learning | Code | 0
Safe Reinforcement Learning Using Black-Box Reachability Analysis | Code | 0
StepSearch: Igniting LLMs Search Ability via Step-Wise Proximal Policy Optimization | Code | 0
Noise-Resilient Symbolic Regression with Dynamic Gating Reinforcement Learning | Code | 0
Next-Best-View Estimation based on Deep Reinforcement Learning for Active Object Classification | Code | 0
reBandit: Random Effects based Online RL algorithm for Reducing Cannabis Use | Code | 0
Safe Reinforcement Learning with Nonlinear Dynamics via Model Predictive Shielding | Code | 0
Safe Reinforcement Learning via Probabilistic Logic Shields | Code | 0
STL-Based Synthesis of Feedback Controllers Using Reinforcement Learning | Code | 0
Safe Reinforcement Learning via Shielding | Code | 0
Parameter-Based Value Functions | Code | 0
VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Gudied Iterative Policy Optimization | Code | 0
Stochastic Answer Networks for Machine Reading Comprehension | Code | 0
Time-R1: Towards Comprehensive Temporal Reasoning in LLMs | Code | 0
Missingness as Stability: Understanding the Structure of Missingness in Longitudinal EHR data and its Impact on Reinforcement Learning in Healthcare | Code | 0
Newton-type Methods for Minimax Optimization | Code | 0
Newsvendor Model with Deep Reinforcement Learning | Code | 0
Meta-Inverse Reinforcement Learning with Probabilistic Context Variables | Code | 0
Upside-Down Reinforcement Learning Can Diverge in Stochastic Environments With Episodic Resets | Code | 0
Mirror Descent Search and its Acceleration | Code | 0
Urban Driving with Multi-Objective Deep Reinforcement Learning | Code | 0
Neuro-symbolic Natural Logic with Introspective Revision for Natural Language Inference | Code | 0
Safe Reinforcement Learning with Scene Decomposition for Navigating Complex Urban Environments | Code | 0
Marginal Policy Gradients: A Unified Family of Estimators for Bounded Action Spaces with Applications | Code | 0
Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model | Code | 0
TinyQMIX: Distributed Access Control for mMTC via Multi-agent Reinforcement Learning | Code | 0
Safer Reinforcement Learning through Transferable Instinct Networks | Code | 0
Neuro-Symbolic Approaches for Text-Based Policy Learning | Code | 0
Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models | Code | 0
Stochastic Neural Networks for Hierarchical Reinforcement Learning | Code | 0
Stochastic optimal well control in subsurface reservoirs using reinforcement learning | Code | 0
Parameter and Computation Efficient Transfer Learning for Vision-Language Pre-trained Models | Code | 0
Neuronal Circuit Policies | Code | 0
Neurogenetic Programming Framework for Explainable Reinforcement Learning | Code | 0
Multi-Agent Reinforcement Learning for Visibility-based Persistent Monitoring | Code | 0
TreeC: a method to generate interpretable energy management systems using a metaheuristic algorithm | Code | 0
TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning | Code | 0
Multiagent Reinforcement Learning based Energy Beamforming Control | Code | 0
Multi-Agent Reinforcement Learning: A Report on Challenges and Approaches | Code | 0
Two-step reinforcement learning for model-free redesign of nonlinear optimal regulator | Code | 0
Model-free reinforcement learning with noisy actions for automated experimental control in optics | Code | 0
Reasoning and Generalization in RL: A Tool Use Perspective | Code | 0
Reasoning about Counterfactuals to Improve Human Inverse Reinforcement Learning | Code | 0
Page 269 of 303

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | - | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | - | Unverified