SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment to maximize a cumulative reward signal. The agent interacts with the environment and learns by receiving feedback in the form of rewards or punishments for its actions. The goal of reinforcement learning is to find the optimal policy or decision-making strategy that maximizes the long-term reward.
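
The interaction loop described above can be illustrated with a minimal sketch. It assumes a hypothetical five-state corridor environment and uses tabular Q-learning with epsilon-greedy exploration as one common way to learn a policy; the environment, the function names (`step`, `train`, `greedy_policy`), and the hyperparameters are all illustrative assumptions, not taken from any paper listed here.

```python
import random

N_STATES = 5        # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (-1, +1)  # index 0 moves left, index 1 moves right


def step(state, action):
    """Toy environment: reward 1.0 on reaching the last state, else 0.0."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: learn action values from reward feedback."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon (or on ties), else act greedily.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Bootstrapped target: immediate reward plus discounted future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Pick the action with the highest learned value in each state."""
    return [ACTIONS[0] if row[0] > row[1] else ACTIONS[1] for row in q]
```

Calling `greedy_policy(train())` returns one action per state; in this toy setup the learned policy moves right in every non-terminal state, since that is the only route to the reward.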

Papers

Showing 13851-13900 of 15113 papers

Title | Status | Hype
Deep Communicating Agents for Abstractive Summarization | - | 0
World Models | Code | 1
Scalable photonic reinforcement learning by time-division multiplexing of laser chaos | - | 0
Autonomous Ramp Merge Maneuver Based on Reinforcement Learning with Continuous Action Space | - | 0
The Importance of Constraint Smoothness for Parameter Estimation in Computational Cognitive Modeling | - | 0
Accelerating Learning in Constructive Predictive Frameworks with the Successor Representation | - | 0
Deep Reinforcement Learning with Model Learning and Monte Carlo Tree Search in Minecraft | - | 0
DOP: Deep Optimistic Planning with Approximate Value Function Evaluation | - | 0
Learning State Representations for Query Optimization with Deep Reinforcement Learning | - | 0
Neuronal Circuit Policies | Code | 0
Look Before You Leap: Bridging Model-Free and Model-Based Reinforcement Learning for Planned-Ahead Vision-and-Language Navigation | Code | 0
End-to-End Video Captioning with Multitask Reinforcement Learning | Code | 0
Learning Robotic Assembly from CAD | - | 0
Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines | - | 0
Natural Gradient Deep Q-learning | - | 0
Meta Reinforcement Learning with Latent Variable Gaussian Processes | - | 0
Optimizing Sponsored Search Ranking Strategy by Deep Reinforcement Learning | - | 0
Composable Deep Reinforcement Learning for Robotic Manipulation | Code | 0
Simple random search provides a competitive approach to reinforcement learning | Code | 1
Setting up a Reinforcement Learning Task with a Real-World Robot | Code | 0
Automated Curriculum Learning by Rewarding Temporally Rare Events | Code | 0
Rearrangement with Nonprehensile Manipulation Using Deep Reinforcement Learning | - | 0
Neural Text Generation: Past, Present and Beyond | - | 0
Measurement-based adaptation protocol with quantum reinforcement learning | - | 0
Automated Speed and Lane Change Decision Making using Deep Reinforcement Learning | - | 0
Imitation Learning with Concurrent Actions in 3D Games | - | 0
Hierarchical Reinforcement Learning: Approximating Optimal Discounted TSP Using Local Policies | - | 0
Learning to Explore with Meta-Policy Gradient | - | 0
Active Reinforcement Learning with Monte-Carlo Tree Search | - | 0
Policy Search in Continuous Action Domains: an Overview | - | 0
Soft-Robust Actor-Critic Policy-Gradient | - | 0
Deep reinforcement learning for time series: playing idealized trading games | Code | 0
Kickstarting Deep Reinforcement Learning | - | 0
Variance Networks: When Expectation Does Not Meet Your Expectations | Code | 0
SA-IGA: A Multiagent Reinforcement Learning Method Towards Socially Optimal Outcomes | - | 0
Feudal Reinforcement Learning for Dialogue Management in Large Domains | - | 0
A Multi-Objective Deep Reinforcement Learning Framework | - | 0
DeepCAS: A Deep Reinforcement Learning Algorithm for Control-Aware Scheduling | - | 0
A Brandom-ian view of Reinforcement Learning towards strong-AI | - | 0
Accelerated Methods for Deep Reinforcement Learning | Code | 2
Extracting Action Sequences from Texts Based on Deep Reinforcement Learning | - | 0
Intent-aware Multi-agent Reinforcement Learning | - | 0
Personalized Exposure Control Using Adaptive Metering and Reinforcement Learning | - | 0
Synthesizing Neural Network Controllers with Probabilistic Model based Reinforcement Learning | Code | 0
Smoothed Action Value Functions for Learning Gaussian Policies | - | 0
Variance-Aware Regret Bounds for Undiscounted Reinforcement Learning in MDPs | - | 0
OIL: Observational Imitation Learning | - | 0
Some Considerations on Learning to Explore via Meta-Reinforcement Learning | Code | 0
Model-Free Control for Distributed Stream Data Processing using Deep Reinforcement Learning | - | 0
Distributed Prioritized Experience Replay | Code | 3

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | - | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | - | Unverified