SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
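
The interaction loop described above can be sketched with tabular Q-learning on a toy problem. Everything in this example is an illustrative assumption rather than something taken from the papers listed here: the five-state chain environment, the `step`, `train`, and `greedy_policy` helpers, and the hyperparameter values are all hypothetical, and Q-learning is only one of many ways to learn a policy.

```python
import random

N_STATES = 5      # hypothetical chain of states 0..4; state 4 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment: reward 1.0 only when the last state is reached."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn action values Q[state][action] from interaction alone."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore sometimes, and break ties randomly.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best known action for each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


q_table = train()
policy = greedy_policy(q_table)
print(policy)
```

In this sketch the reward arrives only at the end of the chain, so the discount factor `gamma` is what makes earlier states prefer the action that leads toward it; the learned greedy policy is the "decision-making strategy" in the definition above.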

Papers

Showing 14251-14300 of 15113 papers

Title | Status | Hype
A User Simulator for Task-Completion Dialogues | Code | 0
Deep Reinforcement Learning-based Exploration of Web Applications | Code | 0
Deep Reinforcement Learning: An Overview | Code | 0
Flight Controller Synthesis Via Deep Reinforcement Learning | Code | 0
DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models | Code | 0
Entropy Regularized Reinforcement Learning Using Large Deviation Theory | Code | 0
A Unified Framework for Alternating Offline Model Training and Policy Learning | Code | 0
Clipped-Objective Policy Gradients for Pessimistic Policy Optimization | Code | 0
DQN-TAMER: Human-in-the-Loop Reinforcement Learning with Intractable Feedback | Code | 0
Hybridising Reinforcement Learning and Heuristics for Hierarchical Directed Arc Routing Problems | Code | 0
Climate Adaptation with Reinforcement Learning: Experiments with Flooding and Transportation in Copenhagen | Code | 0
Annealing Optimization for Progressive Learning with Stochastic Approximation | Code | 0
Deep Reinforcement Learning Algorithm for Dynamic Pricing of Express Lanes with Multiple Access Locations | Code | 0
Deep Quality-Value (DQV) Learning | Code | 0
Client Selection for Federated Policy Optimization with Environment Heterogeneity | Code | 0
Hybrid Latent Reasoning via Reinforcement Learning | Code | 0
Clickbait? Sensational Headline Generation with Auto-tuned Reinforcement Learning | Code | 0
Air Learning: A Deep Reinforcement Learning Gym for Autonomous Aerial Robot Visual Navigation | Code | 0
DeepQTest: Testing Autonomous Driving Systems with Reinforcement Learning and Real-world Weather Data | Code | 0
A Hitchhiker's Guide to Statistical Comparisons of Reinforcement Learning Algorithms | Code | 0
Deep Q-learning from Demonstrations | Code | 0
Deep Q learning for fooling neural networks | Code | 0
Augmenting Replay in World Models for Continual Reinforcement Learning | Code | 0
Learning to Evolve | Code | 0
Local and Global Explanations of Agent Behavior: Integrating Strategy Summaries with Saliency Maps | Code | 0
Long Short-Term Memory for Spatial Encoding in Multi-Agent Path Planning | Code | 0
Deep Q-Learning based Reinforcement Learning Approach for Network Intrusion Detection | Code | 0
DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning | Code | 0
Learning Phase Competition for Traffic Signal Control | Code | 0
Detecting Rewards Deterioration in Episodic Reinforcement Learning | Code | 0
A Hierarchical Framework for Relation Extraction with Reinforcement Learning | Code | 0
DRiLLS: Deep Reinforcement Learning for Logic Synthesis | Code | 0
Adaptive Traffic Control with Deep Reinforcement Learning: Towards State-of-the-art and Beyond | Code | 0
A Semi-Supervised Approach for Low-Resourced Text Generation | Code | 0
Deep Ordinal Reinforcement Learning | Code | 0
An Investigation of Time Reversal Symmetry in Reinforcement Learning | Code | 0
Learning Actionable Representations with Goal-Conditioned Policies | Code | 0
Model-free optimization of power/efficiency tradeoffs in quantum thermal machines using reinforcement learning | Code | 0
Hybrid Reinforcement Learning with Expert State Sequences | Code | 0
Driving in Dense Traffic with Model-Free Reinforcement Learning | Code | 0
Foresee then Evaluate: Decomposing Value Estimation with Latent Future Prediction | Code | 0
Hybrid Reward Architecture for Reinforcement Learning | Code | 0
Driving Reinforcement Learning with Models | Code | 0
Interval timing in deep reinforcement learning agents | Code | 0
Classification with Costly Features using Deep Reinforcement Learning | Code | 0
Deep Object-Centric Representations for Generalizable Robot Learning | Code | 0
Classification with Costly Features as a Sequential Decision-Making Problem | Code | 0
FORLORN: A Framework for Comparing Offline Methods and Reinforcement Learning for Optimization of RAN Parameters | Code | 0
AdaCtrl: Towards Adaptive and Controllable Reasoning via Difficulty-Aware Budgeting | Code | 0
Accelerated Reinforcement Learning for Sentence Generation by Vocabulary Prediction | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified