
Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards (positive or negative) for its actions. The goal of reinforcement learning is to find an optimal policy, i.e., a decision-making strategy that maximizes the expected long-term (cumulative) reward.
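The interaction loop described above (act, observe a reward, update value estimates, improve the policy) can be illustrated with tabular Q-learning, one classic RL algorithm chosen here purely as an example. The corridor environment, its size, and the hyperparameters (`alpha`, `gamma`, `epsilon`) are hypothetical choices for this sketch and are not taken from any of the papers listed below.

```python
import random

# Hypothetical toy environment: a 1-D corridor of N_STATES cells.
# The agent starts in cell 0; reaching the last cell yields reward +1 and ends the episode.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def greedy(q_row, rng):
    """Pick a highest-valued action, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: learn action values purely from reward feedback."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: usually exploit current estimates, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q[state], rng)
            next_state, reward, done = step(state, action)
            # Target: immediate reward plus discounted value of the best next action.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = q_learning()
# Greedy policy read off the learned values for each non-terminal state.
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1, 1]: move right in every non-terminal state
```

Because the discount factor `gamma` is below 1, reaching the goal sooner is worth more, so the learned policy moves right everywhere; the reward signal alone, with no explicit instructions, is enough to recover this strategy.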

Papers

Showing 13501–13550 of 15113 papers

Title | Status | Hype
Sample Efficient Model-free Reinforcement Learning from LTL Specifications with Optimality Guarantees | Code | 0
Structural Design Through Reinforcement Learning | Code | 0
Toward Causal-Aware RL: State-Wise Action-Refined Temporal Difference | Code | 0
Sample Efficient Policy Gradient Methods with Recursive Variance Reduction | Code | 0
Toward Collaborative Reinforcement Learning Agents that Communicate Through Text-Based Natural Language | Code | 0
Structure and randomness in planning and reinforcement learning | Code | 0
Rate-Splitting for Intelligent Reflecting Surface-Aided Multiuser VR Streaming | Code | 0
Identifying Expert Behavior in Offline Training Datasets Improves Behavioral Cloning of Robotic Manipulation Policies | Code | 0
Ranking Sentences for Extractive Summarization with Reinforcement Learning | Code | 0
When to Ask for Help: Proactive Interventions in Autonomous Reinforcement Learning | Code | 0
Structured Control Nets for Deep Reinforcement Learning | Code | 0
Trust, but verify: model-based exploration in sparse reward environments | Code | 0
Structured Fusion Networks for Dialog | Code | 0
Neural Map: Structured Memory for Deep Reinforcement Learning | Code | 0
Oralytics Reinforcement Learning Algorithm | Code | 0
Neural Lyapunov Function Approximation with Self-Supervised Reinforcement Learning | Code | 0
Structured Variational Learning of Bayesian Neural Networks with Horseshoe Priors | Code | 0
Mining-Gym: A Configurable RL Benchmarking Environment for Truck Dispatch Scheduling | Code | 0
Trust Region-Guided Proximal Policy Optimization | Code | 0
Deep Reinforcement Learning Methods for Structure-Guided Processing Path Optimization | Code | 0
Multi-Agent Deep Reinforcement Learning for Dynamic Power Allocation in Wireless Networks | Code | 0
Trust-Region Twisted Policy Improvement | Code | 0
Ranking Policy Gradient | Code | 0
Minimax Regret Bounds for Reinforcement Learning | Code | 0
Structure Mapping for Transferability of Causal Models | Code | 0
Sample-Efficient Reinforcement Learning with Maximum Entropy Mellowmax Episodic Control | Code | 0
Student-Initiated Action Advising via Advice Novelty | Code | 0
Sample-Efficient Reinforcement Learning with Temporal Logic Objectives: Leveraging the Task Specification to Guide Exploration | Code | 0
Minimax-Bayes Reinforcement Learning | Code | 0
Toward Policy Explanations for Multi-Agent Reinforcement Learning | Code | 0
OptionGAN: Learning Joint Reward-Policy Options using Generative Adversarial Inverse Reinforcement Learning | Code | 0
Ranking Policy Decisions | Code | 0
Sampling Attacks on Meta Reinforcement Learning: A Minimax Formulation and Complexity Analysis | Code | 0
Better-than-Demonstrator Imitation Learning via Automatically-Ranked Demonstrations | Code | 0
Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization | Code | 0
Zero-shot cross-modal transfer of Reinforcement Learning policies through a Global Workspace | Code | 0
Constrained Reinforcement Learning using Distributional Representation for Trustworthy Quadrotor UAV Tracking Control | Code | 0
Option Discovery in the Absence of Rewards with Manifold Analysis | Code | 0
LS3: Latent Space Safe Sets for Long-Horizon Visuomotor Control of Sparse Reward Iterative Tasks | Code | 0
Model-free Quantum Gate Design and Calibration using Deep Reinforcement Learning | Code | 0
SAPIENT: Mastering Multi-turn Conversational Recommendation with Strategic Planning and Monte Carlo Tree Search | Code | 0
Towards a Common Implementation of Reinforcement Learning for Multiple Robotic Tasks | Code | 0
Multi-agent Cooperative Games Using Belief Map Assisted Training | Code | 0
Random Projection in Neural Episodic Control | Code | 0
Neural Logic Reinforcement Learning | Code | 0
Multi-Agent Connected Autonomous Driving using Deep Reinforcement Learning | Code | 0
Randomized Prior Functions for Deep Reinforcement Learning | Code | 0
Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation | Code | 0
SATURN: SAT-based Reinforcement Learning to Unleash Language Model Reasoning | Code | 0
Weakly Supervised Reinforcement Learning for Autonomous Highway Driving via Virtual Safety Cages | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 |  | Unverified
2 | PPO | Mean Normalized Performance | 0.58 |  | Unverified
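The table does not define "Mean Normalized Performance" or name the benchmark. A common convention, assumed here and not confirmed by this page, is to min-max normalize each environment's return against fixed reference returns, (R - R_min) / (R_max - R_min), and then average across environments. The environment names and reference values below are hypothetical placeholders, not data behind the scores above.

```python
from statistics import mean


def normalized(score, r_min, r_max):
    """Min-max normalize one return against assumed reference returns."""
    return (score - r_min) / (r_max - r_min)


def mean_normalized_performance(scores, refs):
    """Average the normalized returns over all environments (assumed convention)."""
    return mean(normalized(scores[env], *refs[env]) for env in scores)


# Hypothetical (min, max) reference returns and agent returns per environment.
refs = {"env_a": (0.0, 10.0), "env_b": (5.0, 25.0)}
scores = {"env_a": 7.0, "env_b": 15.0}

result = mean_normalized_performance(scores, refs)
print(result)  # 0.7 and 0.5 normalized, averaged to 0.6
```

Under this convention a value of 1.0 means matching the reference maximum in every environment, which is why scores such as 0.76 and 0.58 are comparable across environments with very different raw return scales.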