SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment to maximize a cumulative reward signal. The agent interacts with the environment and learns by receiving feedback in the form of rewards or punishments for its actions. The goal of reinforcement learning is to find the optimal policy or decision-making strategy that maximizes the long-term reward.
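The interaction loop described above can be sketched with a toy example. The corridor environment, the hyperparameters, and the choice of tabular Q-learning below are all illustrative assumptions, not something this page prescribes. Q-learning is only one of many RL algorithms. The agent starts in cell 0, receives a reward of 1 only on reaching the last cell, and updates its value estimates from that feedback.

```python
import random

N_STATES = 5          # hypothetical corridor: cells 0..4, cell 4 is the goal
ACTIONS = (0, 1)      # 0 = step left, 1 = step right

def step(state, action):
    """Toy environment: return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (assumed settings)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)          # explore
            else:
                best = max(q[state])                  # exploit, ties broken at random
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Move the estimate toward reward plus discounted future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = train()
# Greedy action for each non-terminal cell after training.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

The discount factor `gamma` is what makes the objective a long-term one: a reward several steps away still raises the value of the actions that lead toward it, only by a smaller amount.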

Papers

Showing 12451–12500 of 15113 papers

| Title | Status | Hype |
|---|---|---|
| Finite-time Analysis of Approximate Policy Iteration for the Linear Quadratic Regulator | | 0 |
| Effective Medical Test Suggestions Using Deep Reinforcement Learning | | 0 |
| Defining Admissible Rewards for High Confidence Policy Evaluation | | 0 |
| Towards Finding Longer Proofs | Code | 0 |
| Reinforcement Learning and Adaptive Sampling for Optimized DNN Compilation | Code | 0 |
| Variance Reduction for Evolution Strategies via Structured Control Variates | | 0 |
| Advantage Amplification in Slowly Evolving Latent-State Environments | | 0 |
| Linear interpolation gives better gradients than Gaussian smoothing in derivative-free optimization | | 0 |
| An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient | | 0 |
| Reinforcement Learning with Policy Mixture Model for Temporal Point Processes Clustering | | 0 |
| Reinforcement Learning for Slate-based Recommender Systems: A Tractable Decomposition and Practical Methodology | | 0 |
| CopyCAT: Taking Control of Neural Policies with Constant Attacks | | 0 |
| Switching Linear Dynamics for Variational Bayes Filtering | | 0 |
| On the Generalization Gap in Reparameterizable Reinforcement Learning | | 0 |
| Learning robust control for LQR systems with multiplicative noise via policy gradient | Code | 0 |
| Conditions on Features for Temporal Difference-Like Methods to Converge | | 0 |
| Coordinated Exploration via Intrinsic Rewards for Multi-Agent Reinforcement Learning | Code | 1 |
| Beyond Exponentially Discounted Sum: Automatic Learning of Return Function | | 0 |
| A General Markov Decision Process Framework for Directly Learning Optimal Control Policies | | 0 |
| Generation of Policy-Level Explanations for Reinforcement Learning | | 0 |
| Interactive Teaching Algorithms for Inverse Reinforcement Learning | | 0 |
| Snooping Attacks on Deep Reinforcement Learning | Code | 1 |
| Hypothesis-Driven Skill Discovery for Hierarchical Deep Reinforcement Learning | | 0 |
| Finite-Sample Analysis of Nonlinear Stochastic Approximation with Applications in Reinforcement Learning | Code | 0 |
| Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies | Code | 0 |
| Near-optimal Optimistic Reinforcement Learning using Empirical Bernstein Inequalities | | 0 |
| Policy Search by Target Distribution Learning for Continuous Control | | 0 |
| SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards | Code | 1 |
| Explainable Reinforcement Learning Through a Causal Lens | Code | 0 |
| Learning to Discretize: Solving 1D Scalar Conservation Laws via Deep Reinforcement Learning | Code | 0 |
| Disentangling Dynamics and Returns: Value Function Decomposition with Future Prediction | | 0 |
| AgentGraph: Towards Universal Dialogue Management with Structured Deep Reinforcement Learning | | 0 |
| Interactive Differentiable Simulation | Code | 2 |
| Selective Transfer with Reinforced Transfer Network for Partial Domain Adaptation | | 0 |
| Variational Bayes: A report on approaches and applications | | 0 |
| Prioritized Sequence Experience Replay | | 0 |
| A Kernel Loss for Solving the Bellman Equation | Code | 0 |
| Transferable Cost-Aware Security Policy Implementation for Malware Detection Using Deep Reinforcement Learning | | 0 |
| Learning to Reason in Large Theories without Imitation | | 0 |
| Adversarial Policies: Attacking Deep Reinforcement Learning | Code | 1 |
| Composing Task-Agnostic Policies with Deep Reinforcement Learning | | 0 |
| Safe Reinforcement Learning with Nonlinear Dynamics via Model Predictive Shielding | Code | 0 |
| RL4health: Crowdsourcing Reinforcement Learning for Knee Replacement Pathway Optimization | | 0 |
| MQLV: Optimal Policy of Money Management in Retail Banking with Q-Learning | | 0 |
| Exploration via Flow-Based Intrinsic Rewards | Code | 0 |
| InfoRL: Interpretable Reinforcement Learning using Information Maximization | | 0 |
| A Dual Reinforcement Learning Framework for Unsupervised Text Style Transfer | Code | 0 |
| A Micro-Objective Perspective of Reinforcement Learning | | 0 |
| Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound | | 0 |
| Adaptive Symmetric Reward Noising for Reinforcement Learning | Code | 0 |
Page 250 of 303

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |