SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal of reinforcement learning is to find an optimal policy: a decision-making strategy that maximizes the expected long-term reward.
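The agent-environment loop described above can be illustrated with tabular Q-learning, one standard RL algorithm among many. This is a minimal sketch under stated assumptions: the five-state "corridor" environment, the `step`, `q_learning` and `greedy_policy` helpers, the discount factor `gamma` used to define the cumulative reward, and all hyperparameters are hypothetical choices made for illustration only.

```python
import random

N_STATES = 5          # hypothetical corridor: states 0..4, state 4 is the terminal goal
ACTIONS = (-1, +1)    # action 0 moves left, action 1 moves right

def step(state, action):
    """Hypothetical toy environment: reward 1.0 only when the goal is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (illustrative settings)."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, otherwise act greedily (random tie-break).
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                best = max(q[state])
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # TD target: immediate reward plus discounted value of the best next action.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

def greedy_policy(q):
    """Index of the highest-valued action in each non-terminal state."""
    return [max(range(len(ACTIONS)), key=lambda i: row[i]) for row in q[:-1]]

if __name__ == "__main__":
    q = q_learning()
    print([ACTIONS[a] for a in greedy_policy(q)])
```

Run as a script, this should print `[1, 1, 1, 1]`: the learned policy moves right in every non-terminal state, because reaching the goal sooner yields a larger discounted return.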

Papers

Showing 12551-12600 of 15113 papers

Title | Status | Hype
Playing the lottery with rewards and multiple languages: lottery tickets in RL and NLP | | 0
Risk-Sensitive Compact Decision Trees for Autonomous Execution in Presence of Simulated Market Response | | 0
Probabilistic hypergraph grammars for efficient molecular optimization | | 0
Reinforcement Learning When All Actions are Not Always Available | Code | 0
Measurement-based Online Available Bandwidth Estimation employing Reinforcement Learning | | 0
Exploration with Unreliable Intrinsic Reward in Multi-Agent Reinforcement Learning | | 0
Deep Q-Learning for Directed Acyclic Graph Generation | | 0
Continuous Control for Automated Lane Change Behavior Based on Deep Deterministic Policy Gradient Algorithm | | 0
Autonomous Reinforcement Learning of Multiple Interrelated Tasks | | 0
Off-Policy Evaluation via Off-Policy Classification | | 0
Simultaneous Translation with Flexible Policy via Restricted Imitation Learning | | 0
On-board Deep Q-Network for UAV-assisted Online Power Transfer and Data Collection | | 0
Reinforcement Learning with Low-Complexity Liquid State Machines | Code | 0
Options as responses: Grounding behavioural hierarchies in multi-agent RL | | 0
Posterior Variance Analysis of Gaussian Processes with Application to Average Learning Curves | | 0
Robust exploration in linear quadratic reinforcement learning | Code | 0
Sequential Triggers for Watermarking of Deep Reinforcement Learning Policies | | 0
Reconstruct and Represent Video Contents for Captioning via Reinforcement Learning | | 0
RL-Based Method for Benchmarking the Adversarial Resilience and Robustness of Deep Reinforcement Learning Policies | | 0
Proximal Reliability Optimization for Reinforcement Learning | | 0
Adversarial Exploitation of Policy Imitation | | 0
Learning to solve the credit assignment problem | Code | 0
Load Balancing for Ultra-Dense Networks: A Deep Reinforcement Learning Based Approach | | 0
Decentralized Deep Reinforcement Learning for Delay-Power Tradeoff in Vehicular Communications | | 0
A Semi-Supervised Approach for Low-Resourced Text Generation | Code | 0
Deep Reinforcement Learning Architecture for Continuous Power Allocation in High Throughput Satellites | | 0
Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning | Code | 0
On the Correctness and Sample Complexity of Inverse Reinforcement Learning | Code | 0
Air Learning: A Deep Reinforcement Learning Gym for Autonomous Aerial Robot Visual Navigation | Code | 0
Learner-aware Teaching: Inverse Reinforcement Learning with Preferences and Constraints | | 0
Automated Video Game Testing Using Synthetic and Human-Like Agents | | 0
An Empirical Study on Hyperparameters and their Interdependence for RL Generalization | | 0
The Principle of Unchanged Optimality in Reinforcement Learning Generalization | | 0
Enhanced Bayesian Compression via Deep Reinforcement Learning | | 0
Exploiting Noisy Data in Distant Supervision Relation Classification | | 0
Language-Driven Temporal Activity Localization: A Semantic Matching Reinforcement Learning Model | | 0
Harnessing Reinforcement Learning for Neural Motion Planning | Code | 0
Decision-Making in Reinforcement Learning | | 0
Safety Augmented Value Estimation from Demonstrations (SAVED): Safe Deep Model-Based RL for Sparse Cost Robotic Tasks | Code | 0
Attentional Policies for Cross-Context Multi-Agent Reinforcement Learning | | 0
Interval timing in deep reinforcement learning agents | Code | 0
Sequence Modeling of Temporal Credit Assignment for Episodic Reinforcement Learning | Code | 0
Rewarding Smatch: Transition-Based AMR Parsing with Reinforcement Learning | | 0
Reinforcement Learning Experience Reuse with Policy Residual Representation | | 0
Towards Finding Longer Proofs | Code | 0
Reinforcement Learning and Adaptive Sampling for Optimized DNN Compilation | Code | 0
Reinforcement Learning for Mean Field Game | | 0
On Value Functions and the Agent-Environment Boundary | | 0
Finite-time Analysis of Approximate Policy Iteration for the Linear Quadratic Regulator | | 0
Defining Admissible Rewards for High Confidence Policy Evaluation | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified