SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback, given as rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
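The agent-environment loop described above can be sketched in a few lines. The example below is only an illustration under stated assumptions: the five-state corridor environment, the `step`, `train`, and `greedy_policy` helpers, and all hyperparameter values are hypothetical and do not come from any listed paper. It uses tabular Q-learning, one common RL algorithm, chosen here for brevity.

```python
import random

# Hypothetical toy environment: a corridor of 5 states (0..4).
# The agent starts in state 0; reaching state 4 ends the episode with reward 1.
N_STATES = 5
ACTIONS = (-1, +1)  # index 0 moves left, index 1 moves right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (illustrative values)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore: pick a random action
            else:
                best = max(q[state])  # exploit: pick a best-valued action
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Feedback from the environment: bootstrap from the next state's value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best action per non-terminal state according to the learned values."""
    return [ACTIONS[max(range(2), key=lambda i: q[s][i])]
            for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this sketch the learned policy moves right in every non-terminal state, which is the strategy that maximizes discounted long-term reward in the toy corridor.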

Papers

Showing 6401-6450 of 15113 papers

Title | Status | Hype
Diagnosing Reinforcement Learning for Traffic Signal Control | | 0
Dialog Action-Aware Transformer for Dialog Policy Learning | | 0
Dialogue Evaluation with Offline Reinforcement Learning | | 0
Dialogue manager domain adaptation using Gaussian process reinforcement learning | | 0
Dialogue Shaping: Empowering Agents through NPC Interaction | | 0
DiBB: Distributing Black-Box Optimization | | 0
Dichotomy of Control: Separating What You Can Control from What You Cannot | | 0
Diff-DAC: Distributed Actor-Critic for Average Multitask Deep Reinforcement Learning | | 0
Difference of Convex Functions Programming Applied to Control with Expert Data | | 0
Difference of Convex Functions Programming for Reinforcement Learning | | 0
Difference Rewards Policy Gradients | | 0
Differentiable Arbitrating in Zero-sum Markov Games | | 0
Differentiable Discrete Event Simulation for Queuing Network Control | | 0
Differentiable Logic Machines | | 0
Differentiable Physics Models for Real-world Offline Model-based Reinforcement Learning | | 0
Differentiable Quantum Architecture Search in Asynchronous Quantum Reinforcement Learning | | 0
Differentially Private Exploration in Reinforcement Learning with Linear Representation | | 0
Differentially Private Deep Model-Based Reinforcement Learning | | 0
Differentially Private Policy Evaluation | | 0
Differentially Private Reinforcement Learning with Linear Function Approximation | | 0
Differential Variable Speed Limits Control for Freeway Recurrent Bottlenecks via Deep Reinforcement learning | | 0
Differentiated Federated Reinforcement Learning Based Traffic Offloading on Space-Air-Ground Integrated Networks | | 0
DiffPoGAN: Diffusion Policies with Generative Adversarial Networks for Offline Reinforcement Learning | | 0
DiffSkill: Skill Abstraction from Differentiable Physics for Deformable Object Manipulations with Tools | | 0
DiffStitch: Boosting Offline Reinforcement Learning with Diffusion-based Trajectory Stitching | | 0
Diffused Task-Agnostic Milestone Planner | | 0
Diffusion-Based Offline RL for Improved Decision-Making in Augmented ARC Task | | 0
Diffusion-based Reinforcement Learning for Dynamic UAV-assisted Vehicle Twins Migration in Vehicular Metaverses | | 0
Diffusion Models for Smarter UAVs: Decision-Making and Modeling | | 0
Diffusion Policies for Out-of-Distribution Generalization in Offline Reinforcement Learning | | 0
Diffusion Self-Weighted Guidance for Offline Reinforcement Learning | | 0
Diffusion Spectral Representation for Reinforcement Learning | | 0
Digital Human Interactive Recommendation Decision-Making Based on Reinforcement Learning | | 0
Digital Twin-Assisted Data-Driven Optimization for Reliable Edge Caching in Wireless Networks | | 0
Digital Twin-Assisted Efficient Reinforcement Learning for Edge Task Scheduling | | 0
Digital Twin Assisted Risk-Aware Sleep Mode Management Using Deep Q-Networks | | 0
Digital Twin for Autonomous Guided Vehicles based on Integrated Sensing and Communications | | 0
Digital Twin-Native AI-Driven Service Architecture for Industrial Networks | | 0
DiGrad: Multi-Task Reinforcement Learning with Shared Actions | | 0
Diluted Near-Optimal Expert Demonstrations for Guiding Dialogue Stochastic Policy Optimisation | | 0
DIMBA: Discretely Masked Black-Box Attack in Single Object Tracking | | 0
Dimension-Free Rates for Natural Policy Gradient in Multi-Agent Reinforcement Learning | | 0
DINASTI: Dialogues with a Negotiating Appointment Setting Interface | | 0
DIP-R1: Deep Inspection and Perception with RL Looking Through and Understanding Complex Scenes | | 0
DIP-RL: Demonstration-Inferred Preference Learning in Minecraft | | 0
Direct and indirect reinforcement learning | | 0
Directed Exploration for Reinforcement Learning | | 0
Directed Exploration in PAC Model-Free Reinforcement Learning | | 0
Directed Policy Gradient for Safe Reinforcement Learning with Human Advice | | 0
Revisiting a Design Choice in Gradient Temporal Difference Learning | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified