SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment to maximize a cumulative reward signal. The agent interacts with the environment and learns by receiving feedback in the form of rewards or punishments for its actions. The goal of reinforcement learning is to find the optimal policy or decision-making strategy that maximizes the long-term reward.
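The interaction loop described above can be sketched in a few lines. This is a minimal illustration only, not taken from any paper listed on this page: it assumes a hypothetical five-state corridor environment (`step`), tabular Q-learning as the learning rule, and arbitrary hyperparameter values (`GAMMA`, `ALPHA`, `EPSILON`) chosen for the example.

```python
import random

# Hypothetical toy task: states 0..4 on a line, state 4 is the goal.
N_STATES = 5
ACTIONS = (-1, +1)   # move left, move right
GAMMA = 0.9          # discount factor (assumed value)
ALPHA = 0.5          # learning rate (assumed value)
EPSILON = 0.1        # exploration probability (assumed value)


def step(state, action):
    """Environment: apply the action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0   # reward only on reaching the goal
    return next_state, reward, done


def train(episodes=500, seed=0):
    """Agent: learn action values from reward feedback with tabular Q-learning."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection, breaking ties at random.
            if rng.random() < EPSILON:
                a = rng.randrange(len(ACTIONS))
            else:
                best = max(q[state])
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward plus discounted future value.
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Policy: the highest-valued action in each non-terminal state."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: q[s][i])]
            for s in range(N_STATES - 1)]


q_table = train()
policy = greedy_policy(q_table)
print(policy)
```

In this toy setting the learned greedy policy moves right in every non-terminal state, which is the strategy that maximizes the discounted long-term reward here.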

Papers

Showing 3801-3825 of 15113 papers

| Title | Status | Hype |
|---|---|---|
| APO: Enhancing Reasoning Ability of MLLMs via Asymmetric Policy Optimization | Code | 0 |
| From Two-Dimensional to Three-Dimensional Environment with Q-Learning: Modeling Autonomous Navigation with Reinforcement Learning and no Libraries | Code | 0 |
| Imitating from auxiliary imperfect demonstrations via Adversarial Density Weighted Regression | Code | 0 |
| Deep Reinforcement Learning that Matters | Code | 0 |
| Hierarchical Potential-based Reward Shaping from Task Specifications | Code | 0 |
| Boosting Exploration in Multi-Task Reinforcement Learning using Adversarial Networks | Code | 0 |
| Fully Convolutional Network with Multi-Step Reinforcement Learning for Image Processing | Code | 0 |
| From Perceptions to Decisions: Wildfire Evacuation Decision Prediction with Behavioral Theory-informed LLMs | Code | 0 |
| CHEQ-ing the Box: Safe Variable Impedance Learning for Robotic Polishing | Code | 0 |
| APEX: Empowering LLMs with Physics-Based Task Planning for Real-time Insight | Code | 0 |
| From Language to Programs: Bridging Reinforcement Learning and Maximum Marginal Likelihood | Code | 0 |
| APES: a Python toolbox for simulating reinforcement learning environments | Code | 0 |
| Deep Reinforcement Learning with a Combinatorial Action Space for Predicting Popular Reddit Threads | Code | 0 |
| ChemGAN challenge for drug discovery: can AI reproduce natural chemical diversity? | Code | 0 |
| Frequentist Regret Bounds for Randomized Least-Squares Value Iteration | Code | 0 |
| Deep Reinforcement Learning with a Natural Language Action Space | Code | 0 |
| From Credit Assignment to Entropy Regularization: Two New Algorithms for Neural Sequence Prediction | Code | 0 |
| From Gameplay to Symbolic Reasoning: Learning SAT Solver Heuristics in the Style of Alpha(Go) Zero | Code | 0 |
| FREED++: Improving RL Agents for Fragment-Based Molecule Generation by Thorough Reproduction | Code | 0 |
| Adjust Planning Strategies to Accommodate Reinforcement Learning Agents | Code | 0 |
| Free energy-based reinforcement learning using a quantum processor | Code | 0 |
| Action Priors for Large Action Spaces in Robotics | Code | 0 |
| Constrained Policy Optimization with Explicit Behavior Density for Offline Reinforcement Learning | Code | 0 |
| Free-Lunch Saliency via Attention in Atari Agents | Code | 0 |
| From Images to Connections: Can DQN with GNNs learn the Strategic Game of Hex? | Code | 0 |
Page 153 of 605

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |