
Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from the feedback it receives, in the form of rewards or penalties for its actions. The goal of reinforcement learning is to find an optimal policy, i.e. a decision-making strategy that maximizes the expected long-term (cumulative) reward.
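As an illustration of the agent-environment loop described above, here is a minimal sketch using tabular Q-learning. Q-learning is only one of many RL algorithms and is chosen here purely for illustration. The five-state "chain" environment and the names `N_STATES`, `step`, `q_learning` and `greedy_policy` are hypothetical. The agent starts in state 0 and receives a reward of 1 only when it reaches the last state. It improves its action-value estimates from that reward feedback and then reads off a greedy policy.

```python
import random

N_STATES = 5        # hypothetical chain: states 0..4, state 4 is the goal
ACTIONS = (0, 1)    # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    if next_state == N_STATES - 1:
        return next_state, 1.0, True   # reward only on reaching the goal
    return next_state, 0.0, False


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon (or on ties), otherwise act greedily.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # TD target r + gamma * max_a' Q(s', a'); no bootstrapping at the goal.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best action in each non-terminal state under the learned Q-values."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

With the default settings, `greedy_policy(q_learning())` should return `[1, 1, 1, 1]`, which is "always move right". Because of the discount factor `gamma < 1`, the shortest path to the reward has the highest value.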

Papers

Showing 12601–12650 of 15113 papers

Title | Status | Hype
Don't Forget Your Teacher: A Corrective Reinforcement Learning Framework |  | 0
Effective Medical Test Suggestions Using Deep Reinforcement Learning |  | 0
Combating the Compounding-Error Problem with a Multi-step Model |  | 0
Advantage Amplification in Slowly Evolving Latent-State Environments |  | 0
An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient |  | 0
Linear interpolation gives better gradients than Gaussian smoothing in derivative-free optimization |  | 0
Reinforcement Learning with Policy Mixture Model for Temporal Point Processes Clustering |  | 0
CopyCAT: Taking Control of Neural Policies with Constant Attacks |  | 0
On the Generalization Gap in Reparameterizable Reinforcement Learning |  | 0
Reinforcement Learning for Slate-based Recommender Systems: A Tractable Decomposition and Practical Methodology |  | 0
Switching Linear Dynamics for Variational Bayes Filtering |  | 0
Variance Reduction for Evolution Strategies via Structured Control Variates |  | 0
A General Markov Decision Process Framework for Directly Learning Optimal Control Policies |  | 0
Conditions on Features for Temporal Difference-Like Methods to Converge |  | 0
Beyond Exponentially Discounted Sum: Automatic Learning of Return Function |  | 0
Generation of Policy-Level Explanations for Reinforcement Learning |  | 0
Interactive Teaching Algorithms for Inverse Reinforcement Learning |  | 0
Learning robust control for LQR systems with multiplicative noise via policy gradient | Code | 0
Hypothesis-Driven Skill Discovery for Hierarchical Deep Reinforcement Learning |  | 0
Finite-Sample Analysis of Nonlinear Stochastic Approximation with Applications in Reinforcement Learning | Code | 0
Explainable Reinforcement Learning Through a Causal Lens | Code | 0
Disentangling Dynamics and Returns: Value Function Decomposition with Future Prediction |  | 0
AgentGraph: Towards Universal Dialogue Management with Structured Deep Reinforcement Learning |  | 0
Learning to Discretize: Solving 1D Scalar Conservation Laws via Deep Reinforcement Learning | Code | 0
Policy Search by Target Distribution Learning for Continuous Control |  | 0
Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies | Code | 0
Near-optimal Optimistic Reinforcement Learning using Empirical Bernstein Inequalities |  | 0
Selective Transfer with Reinforced Transfer Network for Partial Domain Adaptation |  | 0
Variational Bayes: A report on approaches and applications |  | 0
Safe Reinforcement Learning with Nonlinear Dynamics via Model Predictive Shielding | Code | 0
Composing Task-Agnostic Policies with Deep Reinforcement Learning |  | 0
Prioritized Sequence Experience Replay |  | 0
Learning to Reason in Large Theories without Imitation |  | 0
Transferable Cost-Aware Security Policy Implementation for Malware Detection Using Deep Reinforcement Learning |  | 0
A Kernel Loss for Solving the Bellman Equation | Code | 0
Exploration via Flow-Based Intrinsic Rewards | Code | 0
A Dual Reinforcement Learning Framework for Unsupervised Text Style Transfer | Code | 0
Adaptive Symmetric Reward Noising for Reinforcement Learning | Code | 0
RL4health: Crowdsourcing Reinforcement Learning for Knee Replacement Pathway Optimization |  | 0
Automatic Machine Learning by Pipeline Synthesis using Model-Based Reinforcement Learning and a Grammar |  | 0
MQLV: Optimal Policy of Money Management in Retail Banking with Q-Learning |  | 0
InfoRL: Interpretable Reinforcement Learning using Information Maximization |  | 0
Continual Reinforcement Learning in 3D Non-stationary Environments | Code | 0
A Micro-Objective Perspective of Reinforcement Learning |  | 0
Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound |  | 0
PAC Guarantees for Cooperative Multi-Agent Reinforcement Learning with Restricted Communication |  | 0
Population-based Global Optimisation Methods for Learning Long-term Dependencies with RNNs |  | 0
Multi-hop Reading Comprehension via Deep Reinforcement Learning based Document Traversal | Code | 0
Recurrent Value Functions |  | 0
Hierarchical Reinforcement Learning for Concurrent Discovery of Compound and Composable Policies | Code | 0
Page 253 of 303

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 |  | Unverified
2 | PPO | Mean Normalized Performance | 0.58 |  | Unverified