SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term reward.
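As a rough illustration of this agent-environment loop, the sketch below uses tabular Q-learning, one common RL algorithm chosen here only as an example. The five-cell corridor environment, the function names (`step`, `choose_action`, `train`), and all hyperparameter values are hypothetical and are not taken from any paper listed on this page.

```python
import random

N_STATES = 5          # hypothetical corridor with cells 0..4; cell 4 is the goal
ACTIONS = (0, 1)      # 0 = step left, 1 = step right


def step(state, action):
    """Toy environment: return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    # Reward of 1 only on reaching the goal, 0 otherwise.
    return next_state, (1.0 if done else 0.0), done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy policy with random tie-breaking among best actions."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Run Q-learning and return the learned table q[state][action]."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Bootstrapped target: immediate reward plus discounted
            # value of the best action in the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy action in each non-terminal cell after training.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a])
                 for s in range(N_STATES - 1)]
print(greedy_policy)
```

In this toy setting the learned greedy policy should step right in every cell, since that is the shortest route to the only rewarded state. The discount factor `gamma` is what turns "long-term reward" into a concrete quantity: rewards that arrive later count for less.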

Papers

Showing 1501–1550 of 15113 papers

Title | Status | Hype
A Reinforcement Learning Engine with Reduced Action and State Space for Scalable Cyber-Physical Optimal Response | - | 0
Improved Off-policy Reinforcement Learning in Biological Sequence Design | Code | 0
DeepLTL: Learning to Efficiently Satisfy Complex LTL Specifications for Multi-Task RL | - | 0
Improving Portfolio Optimization Results with Bandit Networks | Code | 0
Spatial-aware decision-making with ring attractors in reinforcement learning systems | - | 0
Predictive Coding for Decision Transformer | Code | 1
Mitigating Adversarial Perturbations for Deep Reinforcement Learning via Vector Quantization | Code | 1
CLoSD: Closing the Loop between Simulation and Diffusion for multi-task character control | Code | 3
Solving Reach-Avoid-Stay Problems Using Deep Deterministic Policy Gradients | - | 0
Efficient Residual Learning with Mixture-of-Experts for Universal Dexterous Grasping | - | 0
ReLIC: A Recipe for 64k Steps of In-Context Reinforcement Learning for Embodied AI | Code | 1
Learning Emergence of Interaction Patterns across Independent RL Agents in Multi-Agent Environments | - | 0
Cross-Embodiment Dexterous Grasping with Reinforcement Learning | - | 0
End-to-end Driving in High-Interaction Traffic Scenarios with Reinforcement Learning | - | 0
Dual Active Learning for Reinforcement Learning from Human Feedback | - | 0
Beyond Expected Returns: A Policy Gradient Algorithm for Cumulative Prospect Theoretic Reinforcement Learning | - | 0
The Smart Buildings Control Suite: A Diverse Open Source Benchmark to Evaluate and Scale HVAC Control Policies for Sustainability | - | 0
ComaDICE: Offline Cooperative Multi-Agent Reinforcement Learning with Stationary Distribution Shift Regularization | - | 0
Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL | - | 0
LLM-Augmented Symbolic Reinforcement Learning with Landmark-Based Task Decomposition | - | 0
Adaptive teachers for amortized samplers | Code | 0
Sampling from Energy-based Policies using Diffusion | - | 0
VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment | Code | 2
Sparse Autoencoders Reveal Temporal Difference Learning in Large Language Models | - | 0
Bellman Diffusion: Generative Modeling as Learning a Linear Operator in the Distribution Space | - | 0
Scalable Reinforcement Learning-based Neural Architecture Search | - | 0
PreND: Enhancing Intrinsic Motivation in Reinforcement Learning through Pre-trained Network Distillation | - | 0
Absolute State-wise Constrained Policy Optimization: High-Probability State-wise Constraints Satisfaction | - | 0
Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining | Code | 1
Upper and Lower Bounds for Distributionally Robust Off-Dynamics Reinforcement Learning | - | 0
Task-agnostic Pre-training and Task-guided Fine-tuning for Versatile Diffusion Planner | - | 0
Personalisation via Dynamic Policy Fusion | - | 0
Focus On What Matters: Separated Models For Visual-Based RL Generalization | - | 0
Analysis on Riemann Hypothesis with Cross Entropy Optimization and Reasoning | - | 0
Constrained Reinforcement Learning for Safe Heat Pump Control | Code | 0
Grounded Curriculum Learning | - | 0
Generalizing Consistency Policy to Visual RL with Prioritized Proximal Experience Regularization | - | 0
Learning to Bridge the Gap: Efficient Novelty Recovery with Planning and Reinforcement Learning | - | 0
Strongly-polynomial time and validation analysis of policy gradient methods | - | 0
Climate Adaptation with Reinforcement Learning: Experiments with Flooding and Transportation in Copenhagen | Code | 0
ARLBench: Flexible and Efficient Benchmarking for Hyperparameter Optimization in Reinforcement Learning | Code | 1
Enhancing Spectrum Efficiency in 6G Satellite Networks: A GAIL-Powered Policy Learning via Asynchronous Federated Inverse Reinforcement Learning | - | 0
TemporalPaD: a reinforcement-learning framework for temporal feature representation and dimension reduction | - | 0
CurricuLLM: Automatic Task Curricula Design for Learning Complex Robot Skills using Large Language Models | Code | 1
Cost-Aware Dynamic Cloud Workflow Scheduling using Self-Attention and Evolutionary Reinforcement Learning | - | 0
Optimizing Downlink C-NOMA Transmission with Movable Antennas: A DDPG-based Approach | - | 0
DMC-VB: A Benchmark for Representation Learning for Control with Visual Distractors | Code | 1
LoopSR: Looping Sim-and-Real for Lifelong Policy Adaptation of Legged Robots | - | 0
Autoregressive Multi-trait Essay Scoring via Reinforcement Learning with Scoring-aware Multiple Rewards | - | 0
Asynchronous Fractional Multi-Agent Deep Reinforcement Learning for Age-Minimal Mobile Edge Computing | - | 0
Page 31 of 303

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | - | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | - | Unverified