SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from the feedback it receives, in the form of rewards or penalties for its actions. The goal of reinforcement learning is to find an optimal policy, that is, a decision-making strategy that maximizes the long-term reward.
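As a rough illustration of the interaction loop and the cumulative reward described above, here is a minimal Python sketch. Everything in it is a hypothetical assumption made for illustration and does not come from this page: the five-cell corridor environment, the names `step`, `run_episode`, and `discounted_return`, the step cap, and the discount factor `gamma`. It compares two fixed policies by their return and does not implement any learning algorithm.

```python
from typing import Callable, List, Tuple

GOAL = 4        # hypothetical goal state of a 5-cell corridor (states 0..4)
MAX_STEPS = 20  # cap so a policy that never reaches the goal still terminates


def step(state: int, action: int) -> Tuple[int, float, bool]:
    """Toy environment: action is -1 (left) or +1 (right); reward 1.0 on reaching GOAL."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def run_episode(policy: Callable[[int], int]) -> List[float]:
    """Let the agent act in the environment and collect the rewards it receives."""
    state = 0
    rewards: List[float] = []
    for _ in range(MAX_STEPS):
        state, reward, done = step(state, policy(state))
        rewards.append(reward)
        if done:
            break
    return rewards


def discounted_return(rewards: List[float], gamma: float = 0.9) -> float:
    """Cumulative reward: sum over t of gamma**t * r_t (gamma is an assumed discount factor)."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def always_right(state: int) -> int:
    return +1


def always_left(state: int) -> int:
    return -1


if __name__ == "__main__":
    for name, policy in [("always_right", always_right), ("always_left", always_left)]:
        print(name, discounted_return(run_episode(policy)))
```

In this toy setting the policy that always moves right reaches the goal and earns a positive return, while the policy that always moves left never does, so the former is the better decision-making strategy under this reward signal.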

Papers

Showing 13301–13350 of 15113 papers

Title | Status | Hype
Run, skeleton, run: skeletal model in a physics-based simulation | Code | 0
Unsupervised Reinforcement Learning in Multiple Environments | Code | 0
PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration | Code | 0
MMaDA: Multimodal Large Diffusion Language Models | Code | 0
Unsupervised Representation Learning in Deep Reinforcement Learning: A Review | Code | 0
S2P: State-conditioned Image Synthesis for Data Augmentation in Offline Reinforcement Learning | Code | 0
Paying Attention to Function Words | Code | 0
Multi-hop Reading Comprehension via Deep Reinforcement Learning based Document Traversal | Code | 0
Regularized Anderson Acceleration for Off-Policy Deep Reinforcement Learning | Code | 0
The Value of Planning for Infinite-Horizon Model Predictive Control | Code | 0
StarCraft II: A New Challenge for Reinforcement Learning | Code | 0
Regularization Matters in Policy Optimization | Code | 0
Unsupervised Reward Shaping for a Robotic Sequential Picking Task from Visual Observations in a Logistics Scenario | Code | 0
Normalization Enhances Generalization in Visual Reinforcement Learning | Code | 0
Safe and Balanced: A Framework for Constrained Multi-Objective Reinforcement Learning | Code | 0
Safe and Efficient Off-Policy Reinforcement Learning | Code | 0
StarCraft Micromanagement with Reinforcement Learning and Curriculum Transfer Learning | Code | 0
Safe and Robust Experience Sharing for Deterministic Policy Gradient Algorithms | Code | 0
PathNet: Evolution Channels Gradient Descent in Super Neural Networks | Code | 0
Safe and Sample-efficient Reinforcement Learning for Clustered Dynamic Environments | Code | 0
STARLING: Self-supervised Training of Text-based Reinforcement Learning Agent with Large Language Models | Code | 0
Safe Chance Constrained Reinforcement Learning for Batch Process Control | Code | 0
STAR-R1: Spacial TrAnsformation Reasoning by Reinforcing Multimodal LLMs | Code | 0
Safe Continuous Control with Constrained Model-Based Policy Optimization | Code | 0
Verifiable and Compositional Reinforcement Learning Systems | Code | 0
Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice | Code | 0
No Press Diplomacy: Modeling Multi-Agent Gameplay | Code | 0
MAD: A Magnitude And Direction Policy Parametrization for Stability Constrained Reinforcement Learning | Code | 0
Unsupervised Task Clustering for Multi-Task Reinforcement Learning | Code | 0
Thinking Fast and Right: Balancing Accuracy and Reasoning Length with Adaptive Rewards | Code | 0
Non-zero-sum Game Control for Multi-vehicle Driving via Reinforcement Learning | Code | 0
Unsupervised Attention Mechanism across Neural Network Layers | Code | 0
Non-Stationary Markov Decision Processes, a Worst-Case Approach using Model-Based Reinforcement Learning | Code | 0
Mixture-of-Variational-Experts for Continual Learning | Code | 0
Regret Minimization for Reinforcement Learning with Vectorial Feedback and Complex Objectives | Code | 0
Think-J: Learning to Think for Generative LLM-as-a-Judge | Code | 0
Non-Markovian Reward Modelling from Trajectory Labels via Interpretable Multiple Instance Learning | Code | 0
Regret Minimization for Partially Observable Deep Reinforcement Learning | Code | 0
Regret Minimization Experience Replay in Off-Policy Reinforcement Learning | Code | 0
Safe, Efficient, and Comfortable Velocity Control based on Reinforcement Learning for Autonomous Driving | Code | 0
Nonlinear Inverse Reinforcement Learning with Gaussian Processes | Code | 0
Reinforcement learning with non-ergodic reward increments: robustness via ergodicity transformations | Code | 0
DOPE: Doubly Optimistic and Pessimistic Exploration for Safe Reinforcement Learning | Code | 0
Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models | Code | 0
Kernel-Based Reinforcement Learning: A Finite-Time Analysis | Code | 0
Partially Observable Residual Reinforcement Learning for PV-Inverter-Based Voltage Control in Distribution Grids | Code | 0
Park: An Open Platform for Learning-Augmented Computer Systems | Code | 0
Stateful active facilitator: Coordination and Environmental Heterogeneity in Cooperative Multi-Agent Reinforcement Learning | Code | 0
State of the Art Control of Atari Games Using Shallow Reinforcement Learning | Code | 0
Safe Exploration Method for Reinforcement Learning under Existence of Disturbance | Code | 0
Page 267 of 303

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified