SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes the long-term reward.
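As a rough illustration of the agent-environment loop described above, here is a minimal sketch using tabular Q-learning, which is one of many RL algorithms and is not tied to any paper listed on this page. The five-state corridor environment, the `step`, `train`, and `greedy_policy` helpers, and all hyperparameter values are assumptions made up for this example.

```python
import random

N_STATES = 5        # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (-1, +1)  # action index 0 moves left, index 1 moves right


def step(state, action):
    """Toy environment: reward 1.0 on reaching the goal, 0.0 otherwise."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values Q[state][action] from interaction alone."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore: random action
            else:
                # exploit: best known action, ties broken at random
                best = max(q[state])
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Bootstrapped target: immediate reward plus discounted future value
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Action with the highest learned value in each non-terminal state."""
    return [ACTIONS[max(range(2), key=lambda i: q[s][i])]
            for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this toy setting the learned greedy policy moves right in every state, since that is the shortest path to the only reward. The discount factor `gamma` is what makes the agent prefer reaching the goal sooner, which is the "long-term reward" trade-off in miniature.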

Papers

Showing 501-550 of 15113 papers

| Title | Status | Hype |
|---|---|---|
| StablePrompt: Automatic Prompt Tuning using Reinforcement Learning for Large Language Models | Code | 1 |
| Retrieval-Augmented Decision Transformer: External Memory for In-context RL | Code | 1 |
| Coevolving with the Other You: Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning | Code | 1 |
| GreenLight-Gym: Reinforcement learning benchmark environment for control of greenhouse production systems | Code | 1 |
| Predictive Coding for Decision Transformer | Code | 1 |
| Mitigating Adversarial Perturbations for Deep Reinforcement Learning via Vector Quantization | Code | 1 |
| ReLIC: A Recipe for 64k Steps of In-Context Reinforcement Learning for Embodied AI | Code | 1 |
| Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining | Code | 1 |
| CurricuLLM: Automatic Task Curricula Design for Learning Complex Robot Skills using Large Language Models | Code | 1 |
| ARLBench: Flexible and Efficient Benchmarking for Hyperparameter Optimization in Reinforcement Learning | Code | 1 |
| DMC-VB: A Benchmark for Representation Learning for Control with Visual Distractors | Code | 1 |
| Reinforcement Learning-based Model Predictive Control for Greenhouse Climate Control | Code | 1 |
| Leveraging Symmetry to Accelerate Learning of Trajectory Tracking Controllers for Free-Flying Robotic Systems | Code | 1 |
| Enhancing RL Safety with Counterfactual LLM Reasoning | Code | 1 |
| AnyBipe: An End-to-End Framework for Training and Deploying Bipedal Robots Guided by Large Language Models | Code | 1 |
| Traffic expertise meets residual RL: Knowledge-informed model-based residual reinforcement learning for CAV trajectory control | Code | 1 |
| What makes math problems hard for reinforcement learning: a case study | Code | 1 |
| Control-Informed Reinforcement Learning for Chemical Processes | Code | 1 |
| Mitigating Information Loss in Tree-Based Reinforcement Learning via Direct Optimization | Code | 1 |
| Fine-tuning LLMs for Autonomous Spacecraft Control: A Case Study Using Kerbal Space Program | Code | 1 |
| Integrating Saliency Ranking and Reinforcement Learning for Enhanced Object Detection | Code | 1 |
| Listwise Reward Estimation for Offline Preference-based Reinforcement Learning | Code | 1 |
| Model-Based Transfer Learning for Contextual Reinforcement Learning | Code | 1 |
| RELIEF: Reinforcement Learning Empowered Graph Feature Prompt Tuning | Code | 1 |
| Visual Grounding for Object-Level Generalization in Reinforcement Learning | Code | 1 |
| Collision Probability Distribution Estimation via Temporal Difference Learning | Code | 1 |
| Reinforcement Learning Pair Trading: A Dynamic Scaling approach | Code | 1 |
| OASIS: Conditional Distribution Shaping for Offline Safe Reinforcement Learning | Code | 1 |
| Learning Goal-Conditioned Representations for Language Reward Models | Code | 1 |
| Variable-Agnostic Causal Exploration for Reinforcement Learning | Code | 1 |
| Chip Placement with Diffusion Models | Code | 1 |
| Energy-Guided Diffusion Sampling for Offline-to-Online Reinforcement Learning | Code | 1 |
| Reinforcement Learning in High-frequency Market Making | Code | 1 |
| A Benchmark Environment for Offline Reinforcement Learning in Racing Games | Code | 1 |
| Transductive Active Learning with Application to Safe Bayesian Optimization | Code | 1 |
| Can Learned Optimization Make Reinforcement Learning Less Difficult? | Code | 1 |
| Stranger Danger! Identifying and Avoiding Unpredictable Pedestrians in RL-based Social Robot Navigation | Code | 1 |
| Hindsight Preference Learning for Offline Preference-based Reinforcement Learning | Code | 1 |
| RobocupGym: A challenging continuous control benchmark in Robocup | Code | 1 |
| PUZZLES: A Benchmark for Neural Algorithmic Reasoning | Code | 1 |
| Memory-Enhanced Neural Solvers for Efficient Adaptation in Combinatorial Optimization | Code | 1 |
| Soft-QMIX: Integrating Maximum Entropy For Monotonic Value Function Factorization | Code | 1 |
| RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold | Code | 1 |
| Discovering Minimal Reinforcement Learning Environments | Code | 1 |
| Investigating Pre-Training Objectives for Generalization in Vision-Based Reinforcement Learning | Code | 1 |
| ICU-Sepsis: A Benchmark MDP Built from Real Medical Data | Code | 1 |
| HackAtari: Atari Learning Environments for Robust and Continual Reinforcement Learning | Code | 1 |
| Strategically Conservative Q-Learning | Code | 1 |
| Fine-Grained Causal Dynamics Learning with Quantization for Improving Robustness in Reinforcement Learning | Code | 1 |
| CommonPower: A Framework for Safe Data-Driven Smart Grid Control | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PPG | Mean Normalized Performance | 0.76 | - | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | - | Unverified |