SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
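The interaction loop described above can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the `Corridor` environment, the two fixed policies, and the discount factor `gamma` are all hypothetical choices made for the example. In particular, discounting is just one common way to define "cumulative reward".

```python
from typing import Callable, List, Tuple


class Corridor:
    """Hypothetical toy environment: states 0..n-1, reward 1.0 on reaching the last state."""

    def __init__(self, n: int = 4) -> None:
        self.n = n
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool]:
        # action 1 moves right, any other action moves left; walls clip the move
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n - 1)
        done = self.state == self.n - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def run_episode(env: Corridor, policy: Callable[[int], int], max_steps: int = 20) -> List[float]:
    """Let the policy act until the episode ends; return the rewards received."""
    rewards: List[float] = []
    state = env.reset()
    for _ in range(max_steps):
        state, reward, done = env.step(policy(state))
        rewards.append(reward)
        if done:
            break
    return rewards


def discounted_return(rewards: List[float], gamma: float = 0.9) -> float:
    """Cumulative reward with discounting (gamma is an assumed hyperparameter)."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def always_right(state: int) -> int:
    return 1


def always_left(state: int) -> int:
    return 0


if __name__ == "__main__":
    for name, policy in [("always_right", always_right), ("always_left", always_left)]:
        rewards = run_episode(Corridor(4), policy)
        print(name, discounted_return(rewards))
```

Comparing the two fixed policies shows what "maximizing long-term reward" means here: the policy that reaches the goal earns a positive return, while the other earns none. A learning algorithm would search for such a policy rather than have it hard-coded.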

Papers

Showing 12651–12700 of 15113 papers

Title | Status | Hype
Steering Your Diffusion Policy with Latent Space Reinforcement Learning | | 0
Stein Variational Goal Generation for adaptive Exploration in Multi-Goal Reinforcement Learning | | 0
Stein Variational Policy Gradient | | 0
Stepping Out of the Shadows: Reinforcement Learning in Shadow Mode | | 0
Step-wise Adaptive Integration of Supervised Fine-tuning and Reinforcement Learning for Task-Specific LLMs | | 0
Stigmergic Independent Reinforcement Learning for Multi-Agent Collaboration | | 0
Stochastically Dominant Distributional Reinforcement Learning | | 0
Stochastic Approximation of Gaussian Free Energy for Risk-Sensitive Reinforcement Learning | | 0
Stochastic Approximation with Markov Noise: Analysis and applications in reinforcement learning | | 0
Stochastic Constraint Programming as Reinforcement Learning | | 0
Stochastic convex optimization for provably efficient apprenticeship learning | | 0
Stochastic evolution in populations of ideas | | 0
Stochastic Gradient Descent with Dependent Data for Offline Reinforcement Learning | | 0
Black-box Optimizer with Implicit Natural Gradient | | 0
Stochastic Intervention for Causal Inference via Reinforcement Learning | | 0
Stochastic Inverse Reinforcement Learning | | 0
Stochastic Inverse Reinforcement Learning | | 0
Stochastic Learning Approach to Binary Optimization for Optimal Design of Experiments | | 0
Stochastic Lipschitz Q-Learning | | 0
Stochastic Primal-Dual Methods and Sample Complexity of Reinforcement Learning | | 0
Stochastic Q-learning for Large Discrete Action Spaces | | 0
Stochastic Reinforcement Learning | | 0
Stochastic Second-Order Methods Improve Best-Known Sample Complexity of SGD for Gradient-Dominated Function | | 0
Stochastic Variance Reduction for Deep Q-learning | | 0
Stochastic Variance Reduction for Policy Gradient Estimation | | 0
Stochastic Variance Reduction Methods for Policy Evaluation | | 0
Stock market microstructure inference via multi-agent reinforcement learning | | 0
Stock Trading Optimization through Model-based Reinforcement Learning with Resistance Support Relative Strength | | 0
Model Based Reinforcement Learning with Non-Gaussian Environment Dynamics and its Application to Portfolio Optimization | | 0
Stop Regressing: Training Value Functions via Classification for Scalable Deep RL | | 0
Storage Efficient and Dynamic Flexible Runtime Channel Pruning via Deep Reinforcement Learning | | 0
Story Shaping: Teaching Agents Human-like Behavior with Stories | | 0
Straight to the point: reinforcement learning for user guidance in ultrasound | | 0
Strategically Linked Decisions in Long-Term Planning and Reinforcement Learning | | 0
Strategically-timed State-Observation Attacks on Deep Reinforcement Learning Agents | | 0
Strategic bidding in freight transport using deep reinforcement learning | | 0
Strategic Maneuver and Disruption with Reinforcement Learning Approaches for Multi-Agent Coordination | | 0
Optimizing Trading Strategies in Quantitative Markets using Multi-Agent Reinforcement Learning | | 0
Strategies for Using Proximal Policy Optimization in Mobile Puzzle Games | | 0
Strategising template-guided needle placement for MR-targeted prostate biopsy | | 0
Strategy and Benchmark for Converting Deep Q-Networks to Event-Driven Spiking Neural Networks | | 0
Stratified Experience Replay: Correcting Multiplicity Bias in Off-Policy Reinforcement Learning | | 0
Stratified Expert Cloning with Adaptive Selection for User Retention in Large-Scale Recommender Systems | | 0
Stratospheric Aerosol Injection as a Deep Reinforcement Learning Problem | | 0
Streaming Linear System Identification with Reverse Experience Replay | | 0
Streaming Traffic Flow Prediction Based on Continuous Reinforcement Learning | | 0
StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation | | 0
Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning | | 0
S-TRIGGER: Continual State Representation Learning via Self-Triggered Generative Replay | | 0
Striving for Simplicity in Off-Policy Deep Reinforcement Learning | | 0
Page 254 of 303

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified