SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term reward.
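The interaction loop described above can be sketched with tabular Q-learning, chosen here only as one illustrative algorithm and not taken from this page. The five-cell corridor environment, the function names (`step`, `choose_action`, `train_q_learning`, `greedy_policy`), and the hyperparameter values are all hypothetical choices made for this sketch.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal (hypothetical toy task)
ACTIONS = (0, 1)      # 0 = step left, 1 = step right


def step(state, action):
    """Environment: move along the corridor; reward 1.0 only on reaching the goal."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, else act greedily (ties broken at random)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train_q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1,
                     seed=0, max_steps=1000):
    """Learn action values Q[state][action] from reward feedback (assumed settings)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Bootstrapped target: immediate reward plus discounted best future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Policy: in each non-terminal cell, pick the action with the highest learned value."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

Calling `greedy_policy(train_q_learning())` returns one action per non-terminal cell. In this toy task the learned policy should favor stepping right toward the goal, since that is the only source of reward.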

Papers

Showing 14201-14250 of 15113 papers

Title | Status | Hype
Transferring Agent Behaviors from Videos via Motion GANs | | 0
Posterior Sampling for Large Scale Reinforcement Learning | | 0
Teaching a Machine to Read Maps with Deep Reinforcement Learning | Code | 0
Deep Reinforcement Learning for Multi-Resource Multi-Machine Job Scheduling | | 0
Classification with Costly Features using Deep Reinforcement Learning | Code | 0
Leave no Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning | Code | 0
Neural Network Based Reinforcement Learning for Audio-Visual Gaze Control in Human-Robot Interaction | | 0
Run, skeleton, run: skeletal model in a physics-based simulation | Code | 0
Hindsight policy gradients | Code | 0
Finding Efficient Swimming Strategies in a Three Dimensional Chaotic Flow by Reinforcement Learning | | 0
Costate-focused models for reinforcement learning | | 0
BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems | | 0
Markov Decision Processes with Continuous Side Information | | 0
Variational Adaptive-Newton Method for Explorative Learning | | 0
Saliency-based Sequential Image Attention with Multiset Prediction | | 0
Reinforcement Learning in a large scale photonic Recurrent Neural Network | | 0
Classical Structured Prediction Losses for Sequence to Sequence Learning | | 0
Loss Functions for Multiset Prediction | | 0
Accelerated Method for Stochastic Composition Optimization with Nonsmooth Regularization | | 0
Applications of Deep Learning and Reinforcement Learning to Biological Data | | 0
Reinforcement Learning of Speech Recognition System Based on Policy Gradient and Hypothesis Selection | | 0
Towards the Use of Deep Reinforcement Learning with Global Policy For Query-based Extractive Summarisation | Code | 0
Worm-level Control through Search-based Reinforcement Learning | | 0
An Empirical Analysis of Multiple-Turn Reasoning Strategies in Reading Comprehension Tasks | | 0
Energy Storage Arbitrage in Real-Time Markets via Reinforcement Learning | | 0
LatentPoison - Adversarial Attacks On The Latent Space | Code | 0
Can Deep Reinforcement Learning Solve Erdos-Selfridge-Spencer Games? | Code | 0
Double Q(σ) and Q(σ, λ): Unifying Reinforcement Learning Control Algorithms | | 0
Composing Meta-Policies for Autonomous Driving Using Hierarchical Deep Reinforcement Learning | | 0
Policy Optimization by Genetic Distillation | | 0
A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning | | 0
Adaptive coordination of working-memory and reinforcement learning in non-human primates performing a trial-and-error problem solving task | Code | 0
Automatic Text Summarization Using Reinforcement Learning with Embedding Features | | 0
Intelligent Parameter Tuning in Optimization-based Iterative CT Reconstruction via Deep Reinforcement Learning | | 0
Learning to Diagnose: Assimilating Clinical Narratives using Deep Reinforcement Learning | | 0
Acquiring Target Stacking Skills by Goal-Parameterized Deep Reinforcement Learning | | 0
Paraphrase Generation with Deep Reinforcement Learning | | 0
Regret Minimization for Partially Observable Deep Reinforcement Learning | Code | 0
TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning | Code | 0
Visualizing and Understanding Atari Agents | Code | 0
Backpropagation through the Void: Optimizing control variates for black-box gradient estimation | Code | 0
Automata-Guided Hierarchical Reinforcement Learning for Skill Composition | | 0
Artificial Intelligence as Structural Estimation: Economic Interpretations of Deep Blue, Bonanza, and AlphaGo | | 0
Exponential improvements for quantum-accessible reinforcement learning | | 0
Action-depedent Control Variates for Policy Optimization via Stein's Identity | Code | 0
Predicting Head Movement in Panoramic Video: A Deep Reinforcement Learning Approach | Code | 0
Unifying Value Iteration, Advantage Learning, and Dynamic Policy Programming | | 0
Sequence-to-Sequence ASR Optimization via Reinforcement Learning | | 0
Diff-DAC: Distributed Actor-Critic for Average Multitask Deep Reinforcement Learning | | 0
Inverse Reinforcement Learning Under Noisy Observations | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified