SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term reward.
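The interaction loop described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the five-state corridor environment, the function names (`step`, `discounted_return`, `train`, `greedy_policy`), and all hyperparameter values are assumptions made for this example, and tabular Q-learning is used only as one simple way to learn a policy from reward feedback.

```python
import random

N_STATES = 5        # hypothetical corridor: states 0..4, goal at state 4
ACTIONS = (0, 1)    # 0 = step left, 1 = step right


def step(state, action):
    """Toy environment dynamics: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def discounted_return(rewards, gamma):
    """Cumulative reward: r_0 + gamma * r_1 + gamma^2 * r_2 + ..."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (illustrative values)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(1000):  # step cap so an episode always terminates
            best = max(q[state])
            greedy = [a for a in ACTIONS if q[state][a] == best]
            # Explore with probability epsilon, otherwise act greedily
            # (ties between equal Q-values are broken at random).
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = rng.choice(greedy)
            next_state, reward, done = step(state, action)
            # Feedback from the environment updates the value estimate.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Action with the highest estimated value in each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

Calling `greedy_policy(train())` returns one action per non-terminal state. In this toy corridor the only reward lies at the right end, so the learned policy should choose action `1` (right) everywhere. `discounted_return` shows how a sequence of rewards is folded into the single long-term quantity the agent tries to maximize.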

Papers

Showing 5551–5575 of 15113 papers

| Title | Status | Hype |
|---|---|---|
| Designing Rewards for Fast Learning | | 0 |
| Reinforcement Learning with a Terminator | Code | 0 |
| Non-Markovian Reward Modelling from Trajectory Labels via Interpretable Multiple Instance Learning | Code | 0 |
| Quantum Multi-Armed Bandits and Stochastic Linear Bandits Enjoy Logarithmic Regrets | | 0 |
| Residual Q-Networks for Value Function Factorizing in Multi-Agent Reinforcement Learning | | 0 |
| Stock Trading Optimization through Model-based Reinforcement Learning with Resistance Support Relative Strength | | 0 |
| Multi-Agent Reinforcement Learning is a Sequence Modeling Problem | Code | 2 |
| SEREN: Knowing When to Explore and When to Exploit | | 0 |
| RLx2: Training a Sparse Deep Reinforcement Learning Model from Scratch | Code | 1 |
| Learning Open Domain Multi-hop Search Using Reinforcement Learning | | 0 |
| Efficient Reward Poisoning Attacks on Online Deep Reinforcement Learning | Code | 0 |
| GraMeR: Graph Meta Reinforcement Learning for Multi-Objective Influence Maximization | | 0 |
| Learning Security Strategies through Game Play and Optimal Stopping | | 0 |
| Provable Benefits of Representational Transfer in Reinforcement Learning | Code | 1 |
| On the Robustness of Safe Reinforcement Learning under Observational Perturbations | Code | 1 |
| Frustratingly Easy Regularization on Representation Can Boost Deep Reinforcement Learning | | 0 |
| Reinforcement Learning for Branch-and-Bound Optimisation using Retrospective Trajectories | Code | 1 |
| Task-Agnostic Continual Reinforcement Learning: Gaining Insights and Overcoming Challenges | Code | 1 |
| Survival Analysis on Structured Data using Deep Reinforcement Learning | | 0 |
| Multi-Source Transfer Learning for Deep Model-Based Reinforcement Learning | | 0 |
| Tutorial on Course-of-Action (COA) Attack Search Methods in Computer Networks | | 0 |
| Off-Beat Multi-Agent Reinforcement Learning | | 0 |
| Non-Markovian policies occupancy measures | | 0 |
| Why So Pessimistic? Estimating Uncertainties for Offline RL through Ensembles, and Why Their Independence Matters | | 0 |
| Provably Sample-Efficient RL with Side Information about Latent Dynamics | | 0 |
Page 223 of 605

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |