SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term reward.
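
The interaction loop described above can be sketched in a few lines of Python. This is an illustrative toy only: the `Corridor` environment, its reward values, and the function names are hypothetical and do not come from any paper listed on this page. Tabular Q-learning is used here as one common learning rule among many, chosen only because it is short.

```python
import random


class Corridor:
    """Hypothetical toy environment: states 0..length-1, goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action 0 moves left, action 1 moves right; walls clamp the position
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else -0.1  # reward at the goal, small penalty otherwise
        return self.state, reward, done


def discounted_return(rewards, gamma=0.9):
    """Cumulative reward of one episode: sum over t of gamma**t * rewards[t]."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def train(env, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (assumed hyperparameters)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(env.length)]
    for _ in range(episodes):
        state, done, steps = env.reset(), False, 0
        while not done and steps < 100:  # step cap keeps early episodes bounded
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                action = 0 if q[state][0] > q[state][1] else 1  # exploit
            next_state, reward, done = env.step(action)
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            steps += 1
    return q


def greedy_policy(q):
    """Policy that picks the highest-valued action in every state."""
    return [0 if left > right else 1 for left, right in q]


def run_episode(env, policy, max_steps=100):
    """Roll out a fixed policy and collect the rewards it receives."""
    state, done, rewards = env.reset(), False, []
    while not done and len(rewards) < max_steps:
        state, reward, done = env.step(policy[state])
        rewards.append(reward)
    return rewards


if __name__ == "__main__":
    env = Corridor()
    policy = greedy_policy(train(env))
    print(policy, discounted_return(run_episode(env, policy)))
```

In this sketch the policy is just a list mapping each state to an action, and the "long-term reward" is the discounted return of an episode. Real benchmarks use far richer environments and function approximation instead of a table.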

Papers

Showing 12751-12800 of 15113 papers

Title | Status | Hype
A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning | Code | 0
On the Design of Safe Continual RL Methods for Control of Nonlinear Systems | Code | 0
On the Challenges of using Reinforcement Learning in Precision Drug Dosing: Delay and Prolongedness of Action Effects | Code | 0
Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification | Code | 0
TD or not TD: Analyzing the Role of Temporal Differencing in Deep Reinforcement Learning | Code | 0
TD-Regularized Actor-Critic Methods | Code | 0
ReInform: Selecting paths with reinforcement learning for contextualized link prediction | Code | 0
Neural-encoding Human Experts' Domain Knowledge to Warm Start Reinforcement Learning | Code | 0
On the calibration of compartmental epidemiological models | Code | 0
Replication of Impedance Identification Experiments on a Reinforcement-Learning-Controlled Digital Twin of Human Elbows | Code | 0
Teach Biped Robots to Walk via Gait Principles and Reinforcement Learning with Adversarial Critics | Code | 0
Reinforcement Replaces Supervision: Query focused Summarization using Deep Reinforcement Learning | Code | 0
Project proposal: A modular reinforcement learning based automated theorem prover | Code | 0
SFV: Reinforcement Learning of Physical Skills from Videos | Code | 0
Understanding the Evolution of Linear Regions in Deep Reinforcement Learning | Code | 0
Shapechanger: Environments for Transfer Learning | Code | 0
On Solving the 2-Dimensional Greedy Shooter Problem for UAVs | Code | 0
Which Experiences Are Influential for RL Agents? Efficiently Estimating The Influence of Experiences | Code | 0
Shaping Advice in Deep Multi-Agent Reinforcement Learning | Code | 0
Shaping Advice in Deep Reinforcement Learning | Code | 0
On Practical Reinforcement Learning: Provable Robustness, Scalability, and Statistical Efficiency | Code | 0
Representation Learning for Grounded Spatial Reasoning | Code | 0
Teaching a Machine to Read Maps with Deep Reinforcement Learning | Code | 0
Towards Optimally Decentralized Multi-Robot Collision Avoidance via Deep Reinforcement Learning | Code | 0
Teaching Embodied Reinforcement Learning Agents: Informativeness and Diversity of Language Use | Code | 0
Shapley Machine: A Game-Theoretic Framework for N-Agent Ad Hoc Teamwork | Code | 0
Shared Autonomy via Deep Reinforcement Learning | Code | 0
Progressive Neural Architecture Search | Code | 0
Understanding the impact of entropy on policy optimization | Code | 0
Reinforcement Learning with Unsupervised Auxiliary Tasks | Code | 0
Towards optimized actions in critical situations of soccer games with deep reinforcement learning | Code | 0
Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control | Code | 0
Which Experiences Are Influential for Your Agent? Policy Iteration with Turn-over Dropout | Code | 0
TEAC: Intergrating Trust Region and Max Entropy Actor Critic for Continuous Control | Code | 0
Combining Reinforcement Learning and Tensor Networks, with an Application to Dynamical Large Deviations | Code | 0
Probing the Robustness of Trained Metrics for Conversational Dialogue Systems | Code | 0
On-Policy Trust Region Policy Optimisation with Replay Buffers | Code | 0
TeaMs-RL: Teaching LLMs to Generate Better Instruction Datasets via Reinforcement Learning | Code | 0
Reinforcement Learning with Success Induced Task Prioritization | Code | 0
Probabilistic Mixture-of-Experts for Efficient Deep Reinforcement Learning | Code | 0
Reset-free Trial-and-Error Learning for Robot Damage Recovery | Code | 0
Value Prediction Network | Code | 0
Shortest Edit Path Crossover: A Theory-driven Solution to the Permutation Problem in Evolutionary Neural Architecture Search | Code | 0
Residual Learning and Context Encoding for Adaptive Offline-to-Online Reinforcement Learning | Code | 0
Probabilistic Counterexample Guidance for Safer Reinforcement Learning (Extended Version) | Code | 0
Residual Loss Prediction: Reinforcement Learning With No Incremental Feedback | Code | 0
Residual Policy Learning | Code | 0
What Did You Think Would Happen? Explaining Agent Behaviour Through Intended Outcomes | Code | 0
Bridging the Sim-to-Real Gap from the Information Bottleneck Perspective | Code | 0
Understanding the Safety Requirements for Learning-based Power Systems Operations | Code | 0
Page 256 of 303

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified